Work on DEFINE command.

author Ben Pfaff <blp@cs.stanford.edu>

Tue, 23 Mar 2021 14:14:48 +0000 (07:14 -0700)

committer Ben Pfaff <blp@cs.stanford.edu>

Sun, 27 Jun 2021 18:30:40 +0000 (11:30 -0700)
diff --git a/doc/flow-control.texi b/doc/flow-control.texi
index c0931f1bcedca679601d48545da52e0e9f904fab..a3f39b6ccb8c40390b0b5126ec453e711fd9ddd4 100644
--- a/doc/flow-control.texi
+++ b/doc/flow-control.texi
@@ -20,6 +20,7 @@ looping, and flow of control.
  
  @menu
  * BREAK::                       Exit a loop.
+* DEFINE::                      Define a macro.
  * DO IF::                       Conditionally execute a block of code.
  * DO REPEAT::                   Textually repeat a code block.
  * LOOP::                        Repeat a block of code.
@@ -39,6 +40,768 @@ BREAK.
  @cmd{BREAK} is allowed only inside @cmd{LOOP}@dots{}@cmd{END LOOP}.
  @xref{LOOP}, for more details.
  
+@node DEFINE
+@section DEFINE
+@vindex DEFINE
+@cindex macro
+
+@subsection Overview
+
+@display
+@t{DEFINE} @i{macro_name}@t{(}@r{[}@i{argument}@r{[}@t{/}@i{argument}@r{]@dots{}]}@t{)}
+@dots{}@i{body}@dots{}
+@t{!ENDDEFINE.}
+@end display
+
+Each @i{argument} takes the following form:
+@display
+@r{@{}@i{arg_name} @t{=},@t{!POSITIONAL}@r{@}}
+@r{[}@t{!DEFAULT(}@i{default}@t{)}@r{]}
+@r{[}@t{!NOEXPAND}@r{]}
+@r{@{}@t{!TOKENS(}@i{count}@t{)},@t{!CHAREND('}@i{token}@t{')},@t{!ENCLOSE('}@i{start}@t{','}@i{end}@t{')},@t{!CMDEND}@}
+@end display
+
+The following directives may be used within @i{body}:
+@example
+!OFFEXPAND
+!ONEXPAND
+@end example
+
+The following functions may be used within the body:
+@display
+@t{!BLANKS(}@i{count}@t{)}
+@t{!CONCAT(}@i{arg}@dots{}@t{)}
+@t{!EVAL(}@i{arg}@t{)}
+@t{!HEAD(}@i{arg}@t{)}
+@t{!INDEX(}@i{haystack}@t{,} @i{needle}@t{)}
+@t{!LENGTH(}@i{arg}@t{)}
+@t{!NULL}
+@t{!QUOTE(}@i{arg}@t{)}
+@t{!SUBSTR(}@i{arg}@t{,} @i{start}[@t{,} @i{count}]@t{)}
+@t{!TAIL(}@i{arg}@t{)}
+@t{!UNQUOTE(}@i{arg}@t{)}
+@t{!UPCASE(}@i{arg}@t{)}
+@end display
+
+The body may also include the following constructs:
+@display
+@t{!IF (}@i{condition}@t{) !THEN} @i{true-expansion} @t{!IFEND}
+@t{!IF (}@i{condition}@t{) !THEN} @i{true-expansion} @t{!ELSE} @i{false-expansion} @t{!IFEND}
+
+@t{!DO} @i{!var} @t{=} @i{start} @t{!TO} @i{end} [@t{!BY} @i{step}]
+  @i{body}
+@t{!DOEND}
+@t{!DO} @i{!var} @t{!IN} @t{(}@i{expression}@t{)}
+  @i{body}
+@t{!DOEND}
+
+@t{!LET} @i{!var} @t{=} @i{expression}
+@end display
+
+@subsection Introduction
+
+The DEFINE command creates a @dfn{macro}, which is a name for a
+fragment of PSPP syntax called the macro's @dfn{body}.  Following the
+DEFINE command, syntax may @dfn{call} the macro by name any number of
+times.  Each call substitutes, or @dfn{expands}, the macro's body in
+place of the call, as if the body had been written in its place.
+
+The following syntax defines a macro named @code{!vars} that expands
+to the variable names @code{v1 v2 v3}.  The macro's name begins with
+@samp{!}, which is optional for macro names.  The @code{()} following
+the macro name are required:
+
+@example
+DEFINE !vars()
+v1 v2 v3
+!ENDDEFINE.
+@end example
+
+Here are two ways that @code{!vars} might be called given the
+preceding definition:
+
+@example
+DESCRIPTIVES !vars.
+FREQUENCIES /VARIABLES=!vars.
+@end example
+
+With macro expansion, the above calls are equivalent to the following:
+
+@example
+DESCRIPTIVES v1 v2 v3.
+FREQUENCIES /VARIABLES=v1 v2 v3.
+@end example
+
+The @code{!vars} macro expands to a fixed body.  Macros may have more
+sophisticated contents:
+
+@itemize @bullet
+@item
+Macro @dfn{arguments} that are substituted into the body whenever they
+are named.  The values of a macro's arguments are specified each time
+it is called.  @xref{Macro Arguments}.
+
+@item
+Macro @dfn{functions}, expanded when the macro is called.  @xref{Macro
+Functions}.
+
+@item
+@code{!IF} constructs, for conditional expansion.  @xref{Macro
+Conditional Expansion}.
+
+@item
+Two forms of @code{!DO} construct, for looping over a numerical range
+or a collection of tokens.  @xref{Macro Loops}.
+
+@item
+@code{!LET} constructs, for assigning to macro variables.  @xref{Macro
+Variable Assignment}.
+@end itemize
+
+Many identifiers associated with macros begin with @samp{!}, a
+character not normally allowed in identifiers.  These identifiers are
+reserved only for use with macros, which helps keep them from being
+confused with other kinds of identifiers.
+
+The following sections provide more details on macro syntax and
+semantics.
+
+@node Macro Bodies
+@subsection Macro Bodies
+
+As previously shown, a macro body may contain a fragment of a PSPP
+command (such as a variable name).  A macro body may also contain full
+PSPP commands.  In the latter case, the macro body should also contain
+the command terminators.
+
+Most PSPP commands may occur within a macro.  The @code{DEFINE}
+command itself is one exception, because the inner @code{!ENDDEFINE}
+ends the outer macro definition.  For compatibility, @code{BEGIN
+DATA}@dots{}@code{END DATA.} should not be used within a macro.
+
+The body of a macro may call another macro.  The following shows one
+way that could work:
+
+@example
+DEFINE !commands()
+DESCRIPTIVES !vars.
+FREQUENCIES /VARIABLES=!vars.
+!ENDDEFINE.
+
+DEFINE !vars() v1 v2 v3 !ENDDEFINE.
+!commands
+
+* We can redefine the variables macro to analyze different variables:
+DEFINE !vars() v4 v5 !ENDDEFINE.
+!commands
+@end example
+
+The @code{!commands} macro would be easier to use if it took the
+variables to analyze as an argument rather than through another macro.
+The following section shows how to do that.
+
+@node Macro Arguments
+@subsection Macro Arguments
+
+This section explains how to use macro arguments.  As an initial
+example, the following syntax defines a macro named @code{!analyze}
+that takes all the syntax up to the first command terminator as an
+argument:
+
+@example
+DEFINE !analyze(!POSITIONAL !CMDEND)
+DESCRIPTIVES !1.
+FREQUENCIES /VARIABLES=!1.
+!ENDDEFINE.
+@end example
+
+@noindent When @code{!analyze} is called, it expands to a pair of analysis
+commands with each @code{!1} in the body replaced by the argument.
+That is, these calls:
+
+@example
+!analyze v1 v2 v3.
+!analyze v4 v5.
+@end example
+
+@noindent act like the following:
+
+@example
+DESCRIPTIVES v1 v2 v3.
+FREQUENCIES /VARIABLES=v1 v2 v3.
+DESCRIPTIVES v4 v5.
+FREQUENCIES /VARIABLES=v4 v5.
+@end example
+
+Macros may take any number of arguments, described within the
+parentheses in the DEFINE command.  Arguments come in two varieties
+based on how their values are specified when the macro is called:
+
+@itemize @bullet
+@item
+A @dfn{positional} argument has a required value that follows the
+macro's name.  Use the @code{!POSITIONAL} keyword to declare a
+positional argument.
+
+When a macro is called, every positional argument must be given a
+value in the same order as the definition.
+
+References to a positional argument in a macro body are numbered:
+@code{!1} is the first positional argument, @code{!2} the second, and
+so on.  In addition, @code{!*} expands to all of the positional
+arguments' values, separated by spaces.
+
+The following example uses a positional argument:
+
+@example
+DEFINE !analyze(!POSITIONAL !CMDEND)
+DESCRIPTIVES !1.
+FREQUENCIES /VARIABLES=!1.
+!ENDDEFINE.
+
+!analyze v1 v2 v3.
+!analyze v4 v5.
+@end example
+
+@item
+A @dfn{keyword} argument has a name.  In the macro call, its value is
+specified with the syntax @code{@i{name}=@i{value}}.  The names allow
+keyword argument values to take any order in the call, and even to be
+omitted.  When one is omitted, a default value is used: either the
+value specified in @code{!DEFAULT(@i{value})}, or an empty value
+otherwise.
+
+In declarations and calls, a keyword argument's name may not begin with
+@samp{!}, but references to it in the macro body do start with a
+leading @samp{!}.
+
+The following example uses a keyword argument that defaults to ALL if
+the argument is not assigned a value:
+
+@example
+DEFINE !analyze_kw(vars=!DEFAULT(ALL) !CMDEND)
+DESCRIPTIVES !vars.
+FREQUENCIES /VARIABLES=!vars.
+!ENDDEFINE.
+
+!analyze_kw vars=v1 v2 v3.  /* Analyze specified variables.
+!analyze_kw.                /* Analyze all variables.
+@end example
+@end itemize
+
+If a macro has both positional and keyword arguments, then the
+positional arguments must come first in the DEFINE command, and their
+values also come first in macro calls.
+
+Each argument declaration specifies the form of its value:
+
+@table @code
+@item !TOKENS(@i{count})
+Exactly @var{count} tokens, e.g.@: @code{!TOKENS(1)} for a single
+token.  Each identifier, number, quoted string, operator, or
+punctuator is a token.  @xref{Tokens}, for a complete definition.
+
+The following variant of @code{!analyze_kw} accepts only a single
+variable name (or @code{ALL}) as its argument:
+
+@example
+DEFINE !analyze_one_var(!POSITIONAL !TOKENS(1))
+DESCRIPTIVES !1.
+FREQUENCIES /VARIABLES=!1.
+!ENDDEFINE.
+
+!analyze_one_var v1.
+@end example
+
+@item !CHAREND('@var{token}')
+Any number of tokens up to @var{token}, which should be an operator or
+punctuator token such as @samp{/} or @samp{+}.  The @var{token} does
+not become part of the value.
+
+With the following variant of @code{!analyze_kw}, the variables must
+be followed by @samp{/}:
+
+@example
+DEFINE !analyze_parens(vars=!CHAREND('/'))
+DESCRIPTIVES !vars.
+FREQUENCIES /VARIABLES=!vars.
+!ENDDEFINE.
+
+!analyze_parens vars=v1 v2 v3/.
+@end example
+
+@item !ENCLOSE('@var{start}','@var{end}')
+Any number of tokens enclosed between @var{start} and @var{end}, which
+should each be operator or punctuator tokens.  For example, use
+@code{!ENCLOSE('(',')')} for a value enclosed within parentheses.
+(Such a value could never have right parentheses inside it, even
+paired with left parentheses.)  The start and end tokens are not part
+of the value.
+
+With the following variant of @code{!analyze_kw}, the variables must
+be specified within parentheses:
+
+@example
+DEFINE !analyze_parens(vars=!ENCLOSE('(',')'))
+DESCRIPTIVES !vars.
+FREQUENCIES /VARIABLES=!vars.
+!ENDDEFINE.
+
+!analyze_parens vars=(v1 v2 v3).
+@end example
+
+@item !CMDEND
+Any number of tokens up to the end of the command.  This should be
+used only for the last positional parameter, since it consumes all of
+the tokens in the command calling the macro.
+
+The following variant of @code{!analyze_kw} takes all the variable
+names up to the end of the command as its argument:
+
+@example
+DEFINE !analyze_kw(vars=!CMDEND)
+DESCRIPTIVES !vars.
+FREQUENCIES /VARIABLES=!vars.
+!ENDDEFINE.
+
+!analyze_kw vars=v1 v2 v3.
+@end example
+@end table
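+
+To illustrate how positional and keyword arguments can be combined,
+consider the following sketch.  The macro name @code{!analyze_mixed}
+and its @code{stats} argument are hypothetical, invented only for this
+example.  The positional argument is declared first, followed by a
+keyword argument with a default value:
+
+@example
+* Hypothetical macro, for illustration only.
+DEFINE !analyze_mixed(!POSITIONAL !TOKENS(1) /stats=!DEFAULT(MEAN) !CMDEND)
+DESCRIPTIVES !1 /STATISTICS=!stats.
+!ENDDEFINE.
+
+!analyze_mixed v1 stats=MEAN STDDEV.
+!analyze_mixed v1.
+@end example
+
+@noindent Under the rules described above, these calls should act like
+the following:
+
+@example
+DESCRIPTIVES v1 /STATISTICS=MEAN STDDEV.
+DESCRIPTIVES v1 /STATISTICS=MEAN.
+@end example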
+
+By default, when an argument's value contains a macro call, the call
+is expanded each time the argument appears in the macro's body.  The
+@code{!NOEXPAND} keyword in an argument declaration suppresses this
+expansion.  @xref{Controlling Macro Expansion}.
+
+@node Controlling Macro Expansion
+@subsection Controlling Macro Expansion
+
+Multiple factors control whether macro calls are expanded in different
+situations.  At the highest level, @code{SET MEXPAND} controls whether
+macro calls are expanded.  By default, it is enabled.  @xref{SET
+MEXPAND}, for details.
+
+A macro body may contain macro calls.  By default, these are expanded.
+If a macro body contains @code{!OFFEXPAND} or @code{!ONEXPAND}
+directives, then @code{!OFFEXPAND} disables expansion of macro calls
+until the following @code{!ONEXPAND}.
+
+A macro argument's value may contain a macro call.  These macro calls
+are expanded, unless the argument was declared with the
+@code{!NOEXPAND} keyword.
+
+The argument to a macro function is a special context that does not
+expand macro calls.  For example, if @code{!vars} is the name of a
+macro, then @code{!LENGTH(!vars)} expands to 5, as does
+@code{!LENGTH(!1)} if positional argument 1 has value @code{!vars}.
+To expand macros in these cases, use the @code{!EVAL} macro function,
+e.g.@: @code{!LENGTH(!EVAL(!vars))} or @code{!LENGTH(!EVAL(!1))}.
+@xref{Macro Functions}, for details.
+
+These rules apply to macro calls, not to uses of macro functions and
+macro arguments within a macro body, which are always expanded.
+
+@node Macro Functions
+@subsection Macro Functions
+
+Macro bodies may manipulate syntax using macro functions.  Macro
+functions accept tokens as arguments and expand to sequences of
+characters.
+
+The arguments to macro functions have a restricted form.  They may
+only be a single token (such as an identifier or a string), a macro
+argument, or a call to a macro function.  Thus, the following are
+valid arguments to macro functions:
+@example
+x    5.0    x    !1    "5 + 6"    !CONCAT(x,y)
+@end example
+@noindent and the following are not:
+@example
+x y    5+6
+@end example
+
+Macro functions expand to sequences of characters.  When these
+character strings are processed further as character strings, e.g.@:
+with @code{!LENGTH}, any character string is valid.  When they are
+interpreted as PSPP syntax, e.g.@: when the expansion becomes part of
+a command, they need to be valid for that purpose.  For example,
+@code{!UNQUOTE("It's")} will yield an error if the expansion
+@code{It's} becomes part of a PSPP command, because it contains
+unbalanced single quotes, but @code{!LENGTH(!UNQUOTE("It's"))} expands
+to 4.
+
+The following macro functions are available.  Each function's
+documentation includes examples in the form @code{@var{call}
+@expansion{} @var{expansion}}.
+
+@deffn {Macro Function} !BLANKS (count)
+Expands to @var{count} unquoted spaces, where @var{count} is a
+nonnegative integer.  Outside quotes, any positive number of spaces
+are equivalent; for a quoted string of spaces, use
+@code{!QUOTE(!BLANKS(@var{count}))}.
+
+In the examples below, @samp{_} stands in for a space to make the
+results visible.
+
+@c Keep these examples in sync with the test for !BLANKS in
+@c tests/language/control/define.at:
+@example
+!BLANKS(0)                  @expansion{} @r{empty}
+!BLANKS(1)                  @expansion{} _
+!BLANKS(2)                  @expansion{} __
+!QUOTE(!BLANKS(5))          @expansion{} '_____'
+@end example
+@end deffn
+
+@deffn {Macro Function} !CONCAT (arg@dots{})
+Expands to the concatenation of all of the arguments.  Before
+concatenation, each quoted string argument is unquoted, as if
+@code{!UNQUOTE} were applied.  This allows for ``token pasting'',
+combining two (or more) tokens into a single one:
+
+@c Keep these examples in sync with the test for !CONCAT in
+@c tests/language/control/define.at:
+@example
+!CONCAT(x, y)                @expansion{} xy
+!CONCAT('x', 'y')            @expansion{} xy
+!CONCAT(12, 34)              @expansion{} 1234
+!CONCAT(!NULL, 123)          @expansion{} 123
+@end example
+
+@code{!CONCAT} is often used for constructing a series of similar
+variable names from a prefix followed by a number and perhaps a
+suffix.  For example:
+
+@c Keep these examples in sync with the test for !CONCAT in
+@c tests/language/control/define.at:
+@example
+!CONCAT(x, 0)                @expansion{} x0
+!CONCAT(x, 0, y)             @expansion{} x0y
+@end example
+
+An identifier token must begin with a letter (or @samp{#} or
+@samp{@@}), which means that attempting to use a number as the first
+part of an identifier will produce a pair of distinct tokens rather
+than a single one.  For example:
+
+@c Keep these examples in sync with the test for !CONCAT in
+@c tests/language/control/define.at:
+@example
+!CONCAT(0, x)                @expansion{} 0 x
+!CONCAT(0, x, y)             @expansion{} 0 xy
+@end example
+@end deffn
+
+@deffn {Macro Function} !EVAL (arg)
+Expands macro calls in @var{arg}.  This is especially useful if
+@var{arg} is the name of a macro or a macro argument that expands to
+one, because arguments to macro functions are not expanded by default
+(@pxref{Controlling Macro Expansion}).
+
+The following examples assume that @code{!vars} is a macro that
+expands to @code{a b c}:
+
+@example
+!vars                        @expansion{} a b c
+!QUOTE(!vars)                @expansion{} '!vars'
+!EVAL(!vars)                 @expansion{} a b c
+!QUOTE(!EVAL(!vars))         @expansion{} 'a b c'
+@end example
+
+These examples additionally assume that argument @code{!1} has value
+@code{!vars}:
+
+@example
+!1                           @expansion{} a b c
+!QUOTE(!1)                   @expansion{} '!vars'
+!EVAL(!1)                    @expansion{} a b c
+!QUOTE(!EVAL(!1))            @expansion{} 'a b c'
+@end example
+@end deffn
+
+@deffn {Macro Function} !HEAD (arg)
+@deffnx {Macro Function} !TAIL (arg)
+@code{!HEAD} expands to just the first token in an unquoted version of
+@var{arg}, and @code{!TAIL} to all the tokens after the first.
+
+@example
+!HEAD('a b c')               @expansion{} a
+!HEAD('a')                   @expansion{} a
+!HEAD(!NULL)                 @expansion{} @r{empty}
+!HEAD('')                    @expansion{} @r{empty}
+
+!TAIL('a b c')               @expansion{} b c
+!TAIL('a')                   @expansion{} @r{empty}
+!TAIL(!NULL)                 @expansion{} @r{empty}
+!TAIL('')                    @expansion{} @r{empty}
+@end example
+@end deffn
+
+@deffn {Macro Function} !INDEX (haystack, needle)
+Looks for @var{needle} in @var{haystack}.  If it is present, expands
+to the 1-based index of its first occurrence; if not, expands to 0.
+
+@example
+!INDEX(banana, an)           @expansion{} 2
+!INDEX(banana, nan)          @expansion{} 3
+!INDEX(banana, apple)        @expansion{} 0
+!INDEX("banana", nan)        @expansion{} 4
+!INDEX("banana", "nan")      @expansion{} 0
+!INDEX(!UNQUOTE("banana"), !UNQUOTE("nan")) @expansion{} 3
+@end example
+@end deffn
+
+@deffn {Macro Function} !LENGTH (arg)
+Expands to a number token representing the number of characters in
+@var{arg}.
+
+@example
+!LENGTH(123)                 @expansion{} 3
+!LENGTH(123.00)              @expansion{} 6
+!LENGTH( 123 )               @expansion{} 3
+!LENGTH("123")               @expansion{} 5
+!LENGTH(xyzzy)               @expansion{} 5
+!LENGTH("xyzzy")             @expansion{} 7
+!LENGTH("xy""zzy")           @expansion{} 9
+!LENGTH(!UNQUOTE("xyzzy"))   @expansion{} 5
+!LENGTH(!UNQUOTE("xy""zzy")) @expansion{} 6
+!LENGTH(!1)                  @expansion{} 5 @r{if @t{!1} is @t{a b c}}
+!LENGTH(!1)                  @expansion{} 0 @r{if @t{!1} is empty}
+!LENGTH(!NULL)               @expansion{} 0
+@end example
+@end deffn
+
+@deffn {Macro Function} !NULL
+Expands to an empty character sequence.
+
+@example
+!NULL                        @expansion{} @r{empty}
+!QUOTE(!NULL)                @expansion{} ''
+@end example
+@end deffn
+
+@deffn {Macro Function} !QUOTE (arg)
+@deffnx {Macro Function} !UNQUOTE (arg)
+The @code{!QUOTE} function expands to its argument surrounded by
+apostrophes, doubling any apostrophes inside the argument to make sure
+that it is valid PSPP syntax for a string.  If the argument was
+already a quoted string, @code{!QUOTE} expands to it unchanged.
+
+Given a quoted string argument, the @code{!UNQUOTE} function expands
+to the string's contents, with the quotes removed and any doubled
+quote marks reduced to singletons.  If the argument was not a quoted
+string, @code{!UNQUOTE} expands to the argument unchanged.
+
+@example
+!QUOTE(123.0)                @expansion{} '123.0'
+!QUOTE( 123 )                @expansion{} '123'
+!QUOTE('a b c')              @expansion{} 'a b c'
+!QUOTE("a b c")              @expansion{} "a b c"
+!QUOTE(!1)                   @expansion{} 'a ''b'' c' @r{if @t{!1} is @t{a 'b' c}}
+
+!UNQUOTE(123.0)              @expansion{} 123.0
+!UNQUOTE( 123 )              @expansion{} 123
+!UNQUOTE('a b c')            @expansion{} a b c
+!UNQUOTE("a b c")            @expansion{} a b c
+!UNQUOTE(!1)                 @expansion{} a 'b' c @r{if @t{!1} is @t{a 'b' c}}
+
+!QUOTE(!UNQUOTE(123.0))      @expansion{} '123.0'
+!QUOTE(!UNQUOTE( 123 ))      @expansion{} '123'
+!QUOTE(!UNQUOTE('a b c'))    @expansion{} 'a b c'
+!QUOTE(!UNQUOTE("a b c"))    @expansion{} 'a b c'
+!QUOTE(!UNQUOTE(!1))         @expansion{} 'a ''b'' c' @r{if @t{!1} is @t{a 'b' c}}
+@end example
+@end deffn
+
+@deffn {Macro Function} !SUBSTR (arg, start[, count])
+Expands to a substring of @var{arg} starting from 1-based position
+@var{start}.  If @var{count} is given, it limits the number of
+characters in the expansion; if it is omitted, then the expansion
+extends to the end of @var{arg}.
+
+@example
+!SUBSTR(banana, 3)           @expansion{} nana
+!SUBSTR(banana, 3, 3)        @expansion{} nan
+!SUBSTR("banana", 3)         @expansion{} @r{error (@code{anana"} is not a valid token)}
+!SUBSTR(!UNQUOTE("banana"), 3) @expansion{} nana
+!SUBSTR("banana", 3, 3)      @expansion{} ana
+
+!SUBSTR(banana, 3, 0)        @expansion{} @r{empty}
+!SUBSTR(banana, 3, 10)       @expansion{} nana
+!SUBSTR(banana, 10, 3)       @expansion{} @r{empty}
+@end example
+@end deffn
+
+@deffn {Macro Function} !UPCASE (arg)
+Expands to an unquoted version of @var{arg} with all letters converted
+to uppercase.
+
+@example
+!UPCASE(freckle)             @expansion{} FRECKLE
+!UPCASE('freckle')           @expansion{} FRECKLE
+!UPCASE('a b c')             @expansion{} A B C
+!UPCASE('A B C')             @expansion{} A B C
+@end example
+@end deffn
+
+@node Macro Expressions
+@subsection Macro Expressions
+
+Macro expressions are used in conditional expansion and loops, which
+are described in the following sections.  A macro expression may use
+the following operators, listed in descending order of operator
+precedence:
+
+@table @code
+@item ()
+Parentheses override the default operator precedence.
+
+@item !EQ !NE !GT !LT !GE !LE = ~= <> > < >= <=
+Relational operators compare their operands and yield a Boolean
+result, either @samp{0} for false or @samp{1} for true.
+
+These operators always compare their operands as strings.  This can be
+surprising when the strings are numbers because, e.g.@: @code{1 <
+1.0} and @code{10 < 2} both evaluate to @samp{1} (true).
+
+Comparisons are case sensitive, so that @code{a = A} evaluates to
+@samp{0} (false).
+
+@item !NOT ~
+@itemx !AND &
+@itemx !OR |
+Logical operators interpret their operands as Boolean values, where
+quoted or unquoted @samp{0} is false and anything else is true, and
+yield a Boolean result, either @samp{0} for false or @samp{1} for
+true.
+@end table
+
+Macro expressions do not include any arithmetic operators.
+
+An operand in an expression may be a single token (including a macro
+argument name) or a macro function invocation.  Either way, the
+expression evaluator unquotes the operand, so that @code{1 = '1'} is
+true.
+
+@node Macro Conditional Expansion
+@subsection Macro Conditional Expansion
+
+The @code{!IF} construct may be used inside a macro body to allow for
+conditional expansion.  It takes the following forms:
+
+@example
+!IF (@var{expression}) !THEN @var{true-expansion} !IFEND
+!IF (@var{expression}) !THEN @var{true-expansion} !ELSE @var{false-expansion} !IFEND
+@end example
+
+When @var{expression} evaluates to true, the macro processor expands
+@var{true-expansion}; otherwise, it expands @var{false-expansion}, if
+it is present.  The macro processor considers quoted or unquoted
+@samp{0} to be false, and anything else to be true.
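+
+As a minimal sketch (the macro name @code{!analyze_opt} and the
+keyword @code{full} are hypothetical, chosen only for this example),
+the following macro always runs @cmd{DESCRIPTIVES} but runs
+@cmd{FREQUENCIES} only when its first argument is @code{full}:
+
+@example
+* Hypothetical macro, for illustration only.
+DEFINE !analyze_opt(!POSITIONAL !TOKENS(1) /!POSITIONAL !CMDEND)
+DESCRIPTIVES !2.
+!IF (!1 = full) !THEN
+FREQUENCIES /VARIABLES=!2.
+!IFEND
+!ENDDEFINE.
+
+!analyze_opt full v1 v2.
+!analyze_opt brief v1 v2.
+@end example
+
+@noindent Because macro expressions compare their operands as
+case-sensitive strings (@pxref{Macro Expressions}), the first call
+should expand to both commands and the second to @cmd{DESCRIPTIVES}
+alone.  A call that passes @code{FULL} would presumably also skip
+@cmd{FREQUENCIES}.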
+
+@node Macro Loops
+@subsection Macro Loops
+
+The body of a macro may include two forms of loops: loops over
+numerical ranges and loops over tokens.  Both forms expand a @dfn{loop
+body} multiple times, each time setting a named @dfn{loop variable} to
+a different value.  The loop body typically expands the loop variable
+at least once.
+
+The MITERATE setting (@pxref{SET MITERATE}) limits the number of
+iterations in a loop.  This is a safety measure to ensure that macro
+expansion terminates.  PSPP issues a warning when the MITERATE limit
+is exceeded.
+
+@subsubheading Loops Over Ranges
+
+@example
+!DO @var{!var} = @var{start} !TO @var{end} [!BY @var{step}]
+  @var{body}
+!DOEND
+@end example
+
+A loop over a numerical range has the form shown above.  @var{start},
+@var{end}, and @var{step} (if included) must be expressions with
+numeric values.  The macro processor accepts both integers and real
+numbers.  The macro processor expands @var{body} for each numeric
+value from @var{start} to @var{end}, inclusive.
+
+The default value for @var{step} is 1.  If @var{step} is positive and
+@math{@var{start} > @var{end}}, or if @var{step} is negative and
+@math{@var{start} < @var{end}}, then the macro processor doesn't
+expand the body at all.  @var{step} may not be zero.
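+
+For instance, the following sketch (the macro name @code{!xvars} is
+hypothetical) combines a loop over a range with @code{!CONCAT} to
+generate a series of similarly named variables:
+
+@example
+* Hypothetical macro, for illustration only.
+DEFINE !xvars(!POSITIONAL !TOKENS(1))
+!DO !i = 1 !TO !1
+  !CONCAT(x, !i)
+!DOEND
+!ENDDEFINE.
+
+DESCRIPTIVES !xvars 3.
+@end example
+
+@noindent Under the rules above, the call should act like
+@code{DESCRIPTIVES x1 x2 x3.}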
+
+@subsubheading Loops Over Tokens
+
+@example
+!DO @var{!var} !IN (@var{expression})
+  @var{body}
+!DOEND
+@end example
+
+A loop over tokens takes the form shown above.  The macro processor
+evaluates @var{expression} and expands @var{body} once per token in
+the result, substituting the token for @var{!var} each time it
+appears.
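+
+As a sketch (the macro name @code{!zvars} is hypothetical), the
+following macro loops over the tokens in its argument and prefixes
+each one with @samp{z}:
+
+@example
+* Hypothetical macro, for illustration only.
+DEFINE !zvars(!POSITIONAL !CMDEND)
+!DO !v !IN (!1)
+  !CONCAT(z, !v)
+!DOEND
+!ENDDEFINE.
+
+DESCRIPTIVES !zvars v1 v2 v3.
+@end example
+
+@noindent Under the rules above, the call should act like
+@code{DESCRIPTIVES zv1 zv2 zv3.}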
+
+@node Macro Variable Assignment
+@subsection Macro Variable Assignment
+
+The @code{!LET} construct evaluates an expression and assigns the
+result to a macro variable.  It may create a new macro variable or
+change the value of one created by a previous @code{!LET} or
+@code{!DO}, but it may not change the value of a macro argument.
+@code{!LET} has the following form:
+
+@example
+!LET @var{!var} = @var{expression}
+@end example
+
+If @var{expression} is more than one token, it must be enclosed in
+parentheses.
+
+@node Macro Settings
+@subsection Macro Settings
+
+Some macro behavior is controlled through the SET command
+(@pxref{SET}).  This section describes these settings.
+
+Any SET command that changes these settings within a macro body only
+takes effect following the macro.  This is because PSPP expands a
+macro's entire body at once, so that the SET command inside the body
+only executes afterwards.
+
+The MEXPAND setting (@pxref{SET MEXPAND}) controls whether macros will
+be expanded at all.  By default, macro expansion is on.  To avoid
+expansion of macros called within a macro body, use @code{!OFFEXPAND}
+and @code{!ONEXPAND} (@pxref{Controlling Macro Expansion}).
+
+When MPRINT (@pxref{SET MPRINT}) is turned on, PSPP logs an expansion
+of each macro in the input.  This feature can be useful for debugging
+macro definitions.
+
+MNEST (@pxref{SET MNEST}) limits the depth of expansion of macro
+calls, that is, the nesting level of macro expansion.  The default is
+50.  This is mainly useful to avoid infinite expansion in the case of
+a macro that calls itself.
+
+MITERATE (@pxref{SET MITERATE}) limits the number of iterations in a
+@code{!DO} construct.  The default is 1000.
+
+PRESERVE...RESTORE
+
+SET MEXPAND, etc. doesn't work inside macro bodies.
+
+@node Macro Notes
+@subsection Extra Notes
+
+Macros in comments.
+
+Macros in titles.
+
+Define ``unquote.''
+
  @node DO IF
  @section DO IF
  @vindex DO IF
diff --git a/doc/utilities.texi b/doc/utilities.texi
index da5de1ddceaf5164c80cd8788b6f65a921c40d14..bd5b10c9e2bebe0699ccfc408e875fb94d3e55d5 100644
--- a/doc/utilities.texi
+++ b/doc/utilities.texi
@@ -938,21 +938,25 @@ The following subcommands affect the interpretation of macros.
  
  @table @asis
  @item MEXPAND
+@anchor{SET MEXPAND}
  Controls whether macros are expanded.  The default is ON.
  
  @item MPRINT
+@anchor{SET MPRINT}
  Controls whether the expansion of macros is included in output.  This
  is separate from whether command syntax in general is included in
  output.  The default is OFF.
  
  @item MITERATE
+@anchor{SET MITERATE}
  Limits the number of iterations executed in @code{!DO} loops within
  macros.  This does not affect other language constructs such as
  @cmd{LOOP}.  This must be set to a positive integer.  The default is
  1000.
  
  @item MNEST
-Limits the number of levels of nested macro expansion.  This must be
+@anchor{SET MNEST}
+Limits the number of levels of nested macro expansions.  This must be
  set to a positive integer.  The default is 50.
  @end table
  
diff --git a/src/language/command.def b/src/language/command.def
index a97f9b83e70fd1c7e021188eb6a84107a6c04627..63df224bde598e249819da9ad500f81207e2fbb0 100644
--- a/src/language/command.def
+++ b/src/language/command.def
@@ -18,6 +18,7 @@
  DEF_CMD (S_ANY, F_ENHANCED, "CLOSE FILE HANDLE", cmd_close_file_handle)
  DEF_CMD (S_ANY, 0, "CACHE", cmd_cache)
  DEF_CMD (S_ANY, 0, "CD", cmd_cd)
+DEF_CMD (S_ANY, 0, "DEFINE", cmd_define)
  DEF_CMD (S_ANY, 0, "DO REPEAT", cmd_do_repeat)
  DEF_CMD (S_ANY, 0, "END REPEAT", cmd_end_repeat)
  DEF_CMD (S_ANY, 0, "ECHO", cmd_echo)
@@ -154,6 +155,7 @@ DEF_CMD (S_INPUT_PROGRAM, 0, "END INPUT PROGRAM", cmd_end_input_program)
  DEF_CMD (S_INPUT_PROGRAM, 0, "REREAD", cmd_reread)
  
  /* Commands for testing PSPP. */
+DEF_CMD (S_ANY, F_TESTING, "DEBUG EXPAND", cmd_debug_expand)
  DEF_CMD (S_ANY, F_TESTING, "DEBUG EVALUATE", cmd_debug_evaluate)
  DEF_CMD (S_ANY, F_TESTING, "DEBUG FORMAT GUESSER", cmd_debug_format_guesser)
  DEF_CMD (S_ANY, F_TESTING, "DEBUG MOMENTS", cmd_debug_moments)
@@ -188,7 +190,6 @@ UNIMPL_CMD ("CSTABULATE", "Tabulate complex samples")
  UNIMPL_CMD ("CTABLES", "Display complex samples")
  UNIMPL_CMD ("CURVEFIT", "Fit curve to line plot")
  UNIMPL_CMD ("DATE", "Create time series data")
-UNIMPL_CMD ("DEFINE", "Syntax macros")
  UNIMPL_CMD ("DETECTANOMALY", "Find unusual cases")
  UNIMPL_CMD ("DISCRIMINANT", "Linear discriminant analysis")
  UNIMPL_CMD ("EDIT", "obsolete")
diff --git a/src/language/control/automake.mk b/src/language/control/automake.mk
index 909acd13db4106bfd0872a265bbb02397e11d3bc..9d09687c81e38330552f5f23c5d6f3b01385edf4 100644
--- a/src/language/control/automake.mk
+++ b/src/language/control/automake.mk
@@ -20,6 +20,7 @@
  language_control_sources = \
         src/language/control/control-stack.c \
         src/language/control/control-stack.h \
+       src/language/control/define.c \
         src/language/control/do-if.c \
         src/language/control/loop.c \
         src/language/control/repeat.c \
diff --git a/src/language/control/define.c b/src/language/control/define.c
new file mode 100644
index 0000000..e8155e9
--- /dev/null
+++ b/src/language/control/define.c
@@ -0,0 +1,259 @@
+/* PSPP - a program for statistical analysis.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation, either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>. */
+
+#include <config.h>
+
+#include <limits.h>
+
+#include "language/command.h"
+#include "language/lexer/lexer.h"
+#include "language/lexer/macro.h"
+#include "language/lexer/scan.h"
+#include "language/lexer/token.h"
+#include "libpspp/message.h"
+
+#include "gl/xalloc.h"
+
+#include "gettext.h"
+#define _(msgid) gettext (msgid)
+
+static bool
+force_macro_id (struct lexer *lexer)
+{
+  return lex_token (lexer) == T_MACRO_ID || lex_force_id (lexer);
+}
+
+static bool
+match_macro_id (struct lexer *lexer, const char *keyword)
+{
+  if (keyword[0] != '!')
+    return lex_match_id (lexer, keyword);
+  else if (lex_token (lexer) == T_MACRO_ID
+           && lex_id_match_n (ss_cstr (keyword), lex_tokss (lexer), 4))
+    {
+      lex_get (lexer);
+      return true;
+    }
+  else
+    return false;
+}
+
+static bool
+parse_quoted_token (struct lexer *lexer, struct token *token)
+{
+  if (!lex_force_string (lexer))
+    return false;
+
+  struct substring s = lex_tokss (lexer);
+  struct string_lexer slex;
+  string_lexer_init (&slex, s.string, s.length, SEG_MODE_INTERACTIVE, true);
+  struct token another_token = { .type = T_STOP };
+  if (!string_lexer_next (&slex, token)
+      || string_lexer_next (&slex, &another_token))
+    {
+      token_uninit (token);
+      token_uninit (&another_token);
+      lex_error (lexer, _("String must contain exactly one token."));
+      return false;
+    }
+  lex_get (lexer);
+  return true;
+}
+
+int
+cmd_define (struct lexer *lexer, struct dataset *ds UNUSED)
+{
+  if (!force_macro_id (lexer))
+    return CMD_FAILURE;
+
+  /* Parse macro name. */
+  struct macro *m = xmalloc (sizeof *m);
+  *m = (struct macro) { .name = ss_xstrdup (lex_tokss (lexer)) };
+  lex_get (lexer);
+
+  if (!lex_force_match (lexer, T_LPAREN))
+    goto error;
+
+  size_t allocated_params = 0;
+  while (!lex_match (lexer, T_RPAREN))
+    {
+      if (m->n_params >= allocated_params)
+        m->params = x2nrealloc (m->params, &allocated_params,
+                                sizeof *m->params);
+
+      size_t param_index = m->n_params++;
+      struct macro_param *p = &m->params[param_index];
+      *p = (struct macro_param) { .expand_arg = true };
+
+      /* Parse parameter name. */
+      if (match_macro_id (lexer, "!POSITIONAL"))
+        {
+          if (param_index > 0 && !m->params[param_index - 1].positional)
+            {
+              lex_error (lexer, _("Positional parameters must precede "
+                                  "keyword parameters."));
+              goto error;
+            }
+
+          p->positional = true;
+          p->name = xasprintf ("!%zu", param_index + 1);
+        }
+      else
+        {
+          if (lex_token (lexer) == T_MACRO_ID)
+            {
+              lex_error (lexer, _("Keyword macro parameter must be named in "
+                                  "definition without \"!\" prefix."));
+              goto error;
+            }
+          if (!lex_force_id (lexer))
+            goto error;
+
+          if (is_macro_keyword (lex_tokss (lexer)))
+            {
+              lex_error (lexer, _("Cannot use macro keyword \"%s\" "
+                                  "as an argument name."),
+                         lex_tokcstr (lexer));
+              goto error;
+            }
+
+          p->positional = false;
+          p->name = xasprintf ("!%s", lex_tokcstr (lexer));
+          lex_get (lexer);
+
+          if (!lex_force_match (lexer, T_EQUALS))
+            goto error;
+        }
+
+      /* Parse default value. */
+      if (match_macro_id (lexer, "!DEFAULT"))
+        {
+          if (!lex_force_match (lexer, T_LPAREN))
+            goto error;
+
+          /* XXX Should this handle balanced inner parentheses? */
+          while (!lex_match (lexer, T_RPAREN))
+            {
+              if (lex_token (lexer) == T_ENDCMD)
+                {
+                  lex_error_expecting (lexer, ")");
+                  goto error;
+                }
+              char *syntax = lex_next_representation (lexer, 0, 0);
+              const struct macro_token mt = {
+                .token = *lex_next (lexer, 0),
+                .representation = ss_cstr (syntax),
+              };
+              macro_tokens_add (&p->def, &mt);
+              free (syntax);
+
+              lex_get (lexer);
+            }
+        }
+
+      if (match_macro_id (lexer, "!NOEXPAND"))
+        p->expand_arg = false;
+
+      if (match_macro_id (lexer, "!TOKENS"))
+        {
+          if (!lex_force_match (lexer, T_LPAREN)
+              || !lex_force_int_range (lexer, "!TOKENS", 1, INT_MAX))
+            goto error;
+          p->arg_type = ARG_N_TOKENS;
+          p->n_tokens = lex_integer (lexer);
+          lex_get (lexer);
+          if (!lex_force_match (lexer, T_RPAREN))
+            goto error;
+        }
+      else if (match_macro_id (lexer, "!CHAREND"))
+        {
+          p->arg_type = ARG_CHAREND;
+          p->charend = (struct token) { .type = T_STOP };
+
+          if (!lex_force_match (lexer, T_LPAREN)
+              || !parse_quoted_token (lexer, &p->charend)
+              || !lex_force_match (lexer, T_RPAREN))
+            goto error;
+        }
+      else if (match_macro_id (lexer, "!ENCLOSE"))
+        {
+          p->arg_type = ARG_ENCLOSE;
+          p->enclose[0] = p->enclose[1] = (struct token) { .type = T_STOP };
+
+          if (!lex_force_match (lexer, T_LPAREN)
+              || !parse_quoted_token (lexer, &p->enclose[0])
+              || !lex_force_match (lexer, T_COMMA)
+              || !parse_quoted_token (lexer, &p->enclose[1])
+              || !lex_force_match (lexer, T_RPAREN))
+            goto error;
+        }
+      else if (match_macro_id (lexer, "!CMDEND"))
+        p->arg_type = ARG_CMDEND;
+      else
+        {
+          lex_error_expecting (lexer, "!TOKENS", "!CHAREND",
+                               "!ENCLOSE", "!CMDEND");
+          goto error;
+        }
+
+      if (lex_token (lexer) != T_RPAREN && !lex_force_match (lexer, T_SLASH))
+        goto error;
+    }
+
+  struct string body = DS_EMPTY_INITIALIZER;
+  while (!match_macro_id (lexer, "!ENDDEFINE"))
+    {
+      if (lex_token (lexer) != T_STRING)
+        {
+          lex_error (lexer, _("Expecting macro body or !ENDDEFINE"));
+          ds_destroy (&body);
+          goto error;
+        }
+
+      ds_put_substring (&body, lex_tokss (lexer));
+      ds_put_byte (&body, '\n');
+      lex_get (lexer);
+    }
+
+  macro_tokens_from_string (&m->body, body.ss, lex_get_syntax_mode (lexer));
+  ds_destroy (&body);
+
+  lex_define_macro (lexer, m);
+
+  return CMD_SUCCESS;
+
+error:
+  macro_destroy (m);
+  return CMD_FAILURE;
+}
+
+int
+cmd_debug_expand (struct lexer *lexer, struct dataset *ds UNUSED)
+{
+  settings_set_mprint (true);
+
+  while (lex_token (lexer) != T_STOP)
+    {
+      if (!lex_next_is_from_macro (lexer, 0) && lex_token (lexer) != T_ENDCMD)
+        {
+          char *rep = lex_next_representation (lexer, 0, 0);
+          msg (MN, "unexpanded token \"%s\"", rep);
+          free (rep);
+        }
+      lex_get (lexer);
+    }
+  return CMD_SUCCESS;
+}
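Reading note on the parameter loop in cmd_define above: the rule that every !POSITIONAL parameter must come before any keyword parameter is enforced by looking only at the immediately preceding parameter. That is enough because an earlier violation would already have aborted parsing. The standalone sketch below illustrates the same check outside the lexer. `param_kinds_ok` and its boolean array are hypothetical names used for illustration and are not part of PSPP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of PSPP.  POSITIONAL[i] is true if the
   i'th parameter of a definition was declared with !POSITIONAL.  Returns
   true if all positional parameters precede all keyword parameters, using
   the same "look only at the previous parameter" test as cmd_define. */
static bool
param_kinds_ok (const bool *positional, size_t n)
{
  assert (positional != NULL || n == 0);
  for (size_t i = 1; i < n; i++)
    if (positional[i] && !positional[i - 1])
      return false;
  return true;
}
```

Under this sketch, a definition such as `(!POSITIONAL ... / !POSITIONAL ... / key = ...)` passes, while one that places a keyword parameter between two positional ones fails at the second positional parameter, which is where cmd_define reports its error.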
diff --git a/src/language/control/repeat.c b/src/language/control/repeat.c
index 118e8d3ccd4fd8c56c075c9512d735945f4e6cd9..86dd36f7f0c8a878d2eac529d152723ec86c1040 100644
--- a/src/language/control/repeat.c
+++ b/src/language/control/repeat.c
@@ -201,10 +201,7 @@ do_parse_commands (struct substring s, enum segmenter_mode mode,
                     struct hmap *dummies,
                     struct string *outputs, size_t n_outputs)
  {
-  struct segmenter segmenter;
-
-  segmenter_init (&segmenter, mode);
-
+  struct segmenter segmenter = segmenter_init (mode, false);
    while (!ss_is_empty (s))
      {
        enum segment_type type;
diff --git a/src/language/lexer/automake.mk b/src/language/lexer/automake.mk
index 4387c3dd223b77e879a57b99bfc3541100ee7475..01b3df49c6cb62745a2df2902110a9f652766777 100644
--- a/src/language/lexer/automake.mk
+++ b/src/language/lexer/automake.mk
@@ -24,6 +24,8 @@ language_lexer_sources = \
         src/language/lexer/include-path.h \
         src/language/lexer/lexer.c \
         src/language/lexer/lexer.h \
+       src/language/lexer/macro.c \
+       src/language/lexer/macro.h \
         src/language/lexer/format-parser.c \
         src/language/lexer/format-parser.h \
         src/language/lexer/scan.c \
diff --git a/src/language/lexer/lexer.c b/src/language/lexer/lexer.c
index c14bc6acb84d0fdbf95d7965263748374029f213..2823c24ab3524f19d411eaea7b9d82ce4a985665 100644
--- a/src/language/lexer/lexer.c
+++ b/src/language/lexer/lexer.c
@@ -31,6 +31,7 @@
  #include <uniwidth.h>
  
  #include "language/command.h"
+#include "language/lexer/macro.h"
  #include "language/lexer/scan.h"
  #include "language/lexer/segment.h"
  #include "language/lexer/token.h"
@@ -61,14 +62,40 @@ struct lex_token
      /* The regular token information. */
      struct token token;
  
-    /* Location of token in terms of the lex_source's buffer.
+    /* For a token obtained through the lexer in an ordinary way, this is the
+       location of the token in terms of the lex_source's buffer.
+
+       For a token produced through macro expansion, this is the entire macro
+       call.
+
         src->tail <= line_pos <= token_pos <= src->head. */
      size_t token_pos;           /* Start of token. */
      size_t token_len;           /* Length of source for token in bytes. */
      size_t line_pos;            /* Start of line containing token_pos. */
      int first_line;             /* Line number at token_pos. */
+
+    /* For a macro-expanded token, ofs and len locate it within macro_rep. */
+    char *macro_rep;        /* The whole macro expansion. */
+    size_t ofs;             /* Offset of this token in macro_rep. */
+    size_t len;             /* Length of this token in macro_rep. */
+    size_t *ref_cnt;        /* Number of lex_tokens that refer to macro_rep. */
    };
  
+static void
+lex_token_uninit (struct lex_token *t)
+{
+  token_uninit (&t->token);
+  if (t->ref_cnt)
+    {
+      assert (*t->ref_cnt > 0);
+      if (!--*t->ref_cnt)
+        {
+          free (t->macro_rep);
+          free (t->ref_cnt);
+        }
+    }
+}
+
  /* A source of tokens, corresponding to a syntax file.
  
     This is conceptually a lex_reader wrapped with everything needed to convert
@@ -77,6 +104,7 @@ struct lex_source
    {
      struct ll ll;               /* In lexer's list of sources. */
      struct lex_reader *reader;
+    struct lexer *lexer;
      struct segmenter segmenter;
      bool eof;                   /* True if T_STOP was read from 'reader'. */
  
@@ -99,23 +127,25 @@ struct lex_source
      struct lex_token *tokens;   /* Lookahead tokens for parser. */
    };
  
-static struct lex_source *lex_source_create (struct lex_reader *);
+static struct lex_source *lex_source_create (struct lexer *,
+                                             struct lex_reader *);
  static void lex_source_destroy (struct lex_source *);
  
  /* Lexer. */
  struct lexer
    {
      struct ll_list sources;     /* Contains "struct lex_source"s. */
+    struct macro_set *macros;
    };
  
  static struct lex_source *lex_source__ (const struct lexer *);
-static struct substring lex_source_get_syntax__ (const struct lex_source *,
-                                                 int n0, int n1);
+static char *lex_source_get_syntax__ (const struct lex_source *,
+                                      int n0, int n1);
  static const struct lex_token *lex_next__ (const struct lexer *, int n);
  static void lex_source_push_endcmd__ (struct lex_source *);
  
  static void lex_source_pop__ (struct lex_source *);
-static bool lex_source_get__ (const struct lex_source *);
+static bool lex_source_get (const struct lex_source *);
  static void lex_source_error_valist (struct lex_source *, int n0, int n1,
                                       const char *format, va_list)
     PRINTF_FORMAT (4, 0);
@@ -150,8 +180,11 @@ lex_reader_set_file_name (struct lex_reader *reader, const char *file_name)
  struct lexer *
  lex_create (void)
  {
-  struct lexer *lexer = xzalloc (sizeof *lexer);
-  ll_init (&lexer->sources);
+  struct lexer *lexer = xmalloc (sizeof *lexer);
+  *lexer = (struct lexer) {
+    .sources = LL_INITIALIZER (lexer->sources),
+    .macros = macro_set_create (),
+  };
    return lexer;
  }
  
@@ -165,10 +198,19 @@ lex_destroy (struct lexer *lexer)
  
        ll_for_each_safe (source, next, struct lex_source, ll, &lexer->sources)
          lex_source_destroy (source);
+      macro_set_destroy (lexer->macros);
        free (lexer);
      }
  }
  
+/* Adds M to LEXER's set of macros.  M replaces any existing macro with the
+   same name.  Takes ownership of M. */
+void
+lex_define_macro (struct lexer *lexer, struct macro *m)
+{
+  macro_set_add (lexer->macros, m);
+}
+
  /* Inserts READER into LEXER so that the next token read by LEXER comes from
     READER.  Before the caller, LEXER must either be empty or at a T_ENDCMD
     token. */
@@ -176,7 +218,7 @@ void
  lex_include (struct lexer *lexer, struct lex_reader *reader)
  {
    assert (ll_is_empty (&lexer->sources) || lex_token (lexer) == T_ENDCMD);
-  ll_push_head (&lexer->sources, &lex_source_create (reader)->ll);
+  ll_push_head (&lexer->sources, &lex_source_create (lexer, reader)->ll);
  }
  
  /* Appends READER to LEXER, so that it will be read after all other current
@@ -184,7 +226,7 @@ lex_include (struct lexer *lexer, struct lex_reader *reader)
  void
  lex_append (struct lexer *lexer, struct lex_reader *reader)
  {
-  ll_push_tail (&lexer->sources, &lex_source_create (reader)->ll);
+  ll_push_tail (&lexer->sources, &lex_source_create (lexer, reader)->ll);
  }
  \f
  /* Advancing. */
@@ -198,20 +240,22 @@ lex_push_token__ (struct lex_source *src)
      src->tokens = deque_expand (&src->deque, src->tokens, sizeof *src->tokens);
  
    token = &src->tokens[deque_push_front (&src->deque)];
-  token_init (&token->token);
+  token->token = (struct token) { .type = T_STOP };
+  token->macro_rep = NULL;
+  token->ref_cnt = NULL;
    return token;
  }
  
  static void
  lex_source_pop__ (struct lex_source *src)
  {
-  token_uninit (&src->tokens[deque_pop_back (&src->deque)].token);
+  lex_token_uninit (&src->tokens[deque_pop_back (&src->deque)]);
  }
  
  static void
  lex_source_pop_front (struct lex_source *src)
  {
-  token_uninit (&src->tokens[deque_pop_front (&src->deque)].token);
+  lex_token_uninit (&src->tokens[deque_pop_front (&src->deque)]);
  }
  
  /* Advances LEXER to the next token, consuming the current token. */
@@ -228,7 +272,7 @@ lex_get (struct lexer *lexer)
      lex_source_pop__ (src);
  
    while (deque_is_empty (&src->deque))
-    if (!lex_source_get__ (src))
+    if (!lex_source_get (src))
        {
          lex_source_destroy (src);
          src = lex_source__ (lexer);
@@ -852,13 +896,17 @@ lex_next__ (const struct lexer *lexer_, int n)
      return lex_source_next__ (src, n);
    else
      {
-      static const struct lex_token stop_token =
-        { TOKEN_INITIALIZER (T_STOP, 0.0, ""), 0, 0, 0, 0 };
-
+      static const struct lex_token stop_token = { .token = { .type = T_STOP } };
        return &stop_token;
      }
  }
  
+static const struct lex_token *
+lex_source_front (const struct lex_source *src)
+{
+  return &src->tokens[deque_front (&src->deque, 0)];
+}
+
  static const struct lex_token *
  lex_source_next__ (const struct lex_source *src, int n)
  {
@@ -866,14 +914,12 @@ lex_source_next__ (const struct lex_source *src, int n)
      {
        if (!deque_is_empty (&src->deque))
          {
-          struct lex_token *front;
-
-          front = &src->tokens[deque_front (&src->deque, 0)];
+          const struct lex_token *front = lex_source_front (src);
            if (front->token.type == T_STOP || front->token.type == T_ENDCMD)
              return front;
          }
  
-      lex_source_get__ (src);
+      lex_source_get (src);
      }
  
    return &src->tokens[deque_back (&src->deque, n)];
@@ -940,15 +986,22 @@ lex_next_tokss (const struct lexer *lexer, int n)
  /* Returns the text of the syntax in tokens N0 ahead of the current one,
     through N1 ahead of the current one, inclusive.  (For example, if N0 and N1
     are both zero, this requests the syntax for the current token.)  The caller
-   must not modify or free the returned string.  The syntax is encoded in UTF-8
-   and in the original form supplied to the lexer so that, for example, it may
-   include comments, spaces, and new-lines if it spans multiple tokens. */
-struct substring
+   must eventually free the returned string (with free()).  The syntax is
+   encoded in UTF-8 and in the original form supplied to the lexer so that, for
+   example, it may include comments, spaces, and new-lines if it spans multiple
+   tokens.  Macro expansion, however, has already been performed. */
+char *
  lex_next_representation (const struct lexer *lexer, int n0, int n1)
  {
    return lex_source_get_syntax__ (lex_source__ (lexer), n0, n1);
  }
  
+bool
+lex_next_is_from_macro (const struct lexer *lexer, int n)
+{
+  return lex_next__ (lexer, n)->macro_rep != NULL;
+}
+
  static bool
  lex_tokens_match (const struct token *actual, const struct token *expected)
  {
@@ -988,7 +1041,7 @@ lex_match_phrase (struct lexer *lexer, const char *s)
    int i;
  
    i = 0;
-  string_lexer_init (&slex, s, strlen (s), SEG_MODE_INTERACTIVE);
+  string_lexer_init (&slex, s, strlen (s), SEG_MODE_INTERACTIVE, true);
    while (string_lexer_next (&slex, &token))
      if (token.type != SCAN_SKIP)
        {
@@ -1164,7 +1217,6 @@ lex_get_encoding (const struct lexer *lexer)
    return src == NULL ? NULL : src->reader->encoding;
  }
  
-
  /* Returns the syntax mode for the syntax file from which the current drawn is
     drawn.  Returns SEG_MODE_AUTO for a T_STOP token or if the command's source
     does not have line numbers.
@@ -1210,7 +1262,8 @@ lex_interactive_reset (struct lexer *lexer)
        src->journal_pos = src->seg_pos = src->line_pos = 0;
        src->n_newlines = 0;
        src->suppress_next_newline = false;
-      segmenter_init (&src->segmenter, segmenter_get_mode (&src->segmenter));
+      src->segmenter = segmenter_init (segmenter_get_mode (&src->segmenter),
+                                       false);
        while (!deque_is_empty (&src->deque))
          lex_source_pop__ (src);
        lex_source_push_endcmd__ (src);
@@ -1323,23 +1376,46 @@ lex_source__ (const struct lexer *lexer)
            : ll_data (ll_head (&lexer->sources), struct lex_source, ll));
  }
  
-static struct substring
-lex_tokens_get_syntax__ (const struct lex_source *src,
-                         const struct lex_token *token0,
-                         const struct lex_token *token1)
+static char *
+lex_source_get_syntax__ (const struct lex_source *src, int n0, int n1)
  {
-  size_t start = token0->token_pos;
-  size_t end = token1->token_pos + token1->token_len;
+  struct string s = DS_EMPTY_INITIALIZER;
+  for (size_t i = n0; i <= n1; )
+    {
+      /* Find [I,J) as the longest sequence of tokens not produced by macro
+         expansion, or otherwise the longest sequence expanded from a single
+         macro call. */
+      const struct lex_token *first = lex_source_next__ (src, i);
+      size_t j;
+      for (j = i + 1; j <= n1; j++)
+        {
+          const struct lex_token *cur = lex_source_next__ (src, j);
+          if ((first->macro_rep != NULL) != (cur->macro_rep != NULL)
+              || first->macro_rep != cur->macro_rep)
+            break;
+        }
+      const struct lex_token *last = lex_source_next__ (src, j - 1);
  
-  return ss_buffer (&src->buffer[start - src->tail], end - start);
-}
+      if (!ds_is_empty (&s))
+        ds_put_byte (&s, ' ');
+      if (!first->macro_rep)
+        {
+          size_t start = first->token_pos;
+          size_t end = last->token_pos + last->token_len;
+          ds_put_substring (&s, ss_buffer (&src->buffer[start - src->tail],
+                                           end - start));
+        }
+      else
+        {
+          size_t start = first->ofs;
+          size_t end = last->ofs + last->len;
+          ds_put_substring (&s, ss_buffer (first->macro_rep + start,
+                                           end - start));
+        }
  
-static struct substring
-lex_source_get_syntax__ (const struct lex_source *src, int n0, int n1)
-{
-  return lex_tokens_get_syntax__ (src,
-                                  lex_source_next__ (src, n0),
-                                  lex_source_next__ (src, MAX (n0, n1)));
+      i = j;
+    }
+  return ds_steal_cstr (&s);
  }
  
  static void
@@ -1377,6 +1453,29 @@ lex_ellipsize__ (struct substring in, char *out, size_t out_size)
    strcpy (&out[out_len], out_len < in.length ? "..." : "");
  }
  
+static bool
+lex_source_contains_macro_call (struct lex_source *src, int n0, int n1)
+{
+  for (size_t i = n0; i <= n1; i++)
+    if (lex_source_next__ (src, i)->macro_rep)
+      return true;
+  return false;
+}
+
+static struct substring
+lex_source_get_macro_call (struct lex_source *src, int n0, int n1)
+{
+  if (!lex_source_contains_macro_call (src, n0, n1))
+    return ss_empty ();
+
+  const struct lex_token *token0 = lex_source_next__ (src, n0);
+  const struct lex_token *token1 = lex_source_next__ (src, MAX (n0, n1));
+  size_t start = token0->token_pos;
+  size_t end = token1->token_pos + token1->token_len;
+
+  return ss_buffer (&src->buffer[start - src->tail], end - start);
+}
+
  static void
  lex_source_error_valist (struct lex_source *src, int n0, int n1,
                           const char *format, va_list args)
@@ -1391,14 +1490,30 @@ lex_source_error_valist (struct lex_source *src, int n0, int n1,
      ds_put_cstr (&s, _("Syntax error at end of command"));
    else
      {
-      struct substring syntax = lex_source_get_syntax__ (src, n0, n1);
-      if (!ss_is_empty (syntax))
+      /* Get the syntax that caused the error. */
+      char *syntax = lex_source_get_syntax__ (src, n0, n1);
+      char syntax_cstr[64];
+      lex_ellipsize__ (ss_cstr (syntax), syntax_cstr, sizeof syntax_cstr);
+      free (syntax);
+
+      /* Get the macro call(s) that expanded to the syntax that caused the
+         error. */
+      char call_cstr[64];
+      struct substring call = lex_source_get_macro_call (src, n0, n1);
+      lex_ellipsize__ (call, call_cstr, sizeof call_cstr);
+
+      if (syntax_cstr[0])
          {
-          char syntax_cstr[64];
-
-          lex_ellipsize__ (syntax, syntax_cstr, sizeof syntax_cstr);
-          ds_put_format (&s, _("Syntax error at `%s'"), syntax_cstr);
+          if (call_cstr[0])
+            ds_put_format (&s, _("Syntax error at `%s' "
+                                 "(in expansion of `%s')"),
+                           syntax_cstr, call_cstr);
+          else
+            ds_put_format (&s, _("Syntax error at `%s'"), syntax_cstr);
          }
+      else if (call_cstr[0])
+        ds_put_format (&s, _("Syntax error in syntax expanded from `%s'"),
+                       call_cstr);
        else
          ds_put_cstr (&s, _("Syntax error"));
      }
@@ -1440,16 +1555,11 @@ lex_get_error (struct lex_source *src, const char *format, ...)
  }
  
  /* Attempts to append an additional token into SRC's deque, reading more from
-   the underlying lex_reader if necessary.  Returns true if successful, false
-   if the deque already represents (a suffix of) the whole lex_reader's
-   contents, */
+   the underlying lex_reader if necessary.  Returns true if a new token was
+   added to SRC's deque, false otherwise. */
  static bool
-lex_source_get__ (const struct lex_source *src_)
+lex_source_try_get (struct lex_source *src)
  {
-  struct lex_source *src = CONST_CAST (struct lex_source *, src_);
-  if (src->eof)
-    return false;
-
    /* State maintained while scanning tokens.  Usually we only need a single
       state, but scanner_push() can return SCAN_SAVE to indicate that the state
       needs to be saved and possibly restored later with SCAN_BACK. */
@@ -1580,80 +1690,182 @@ lex_source_get__ (const struct lex_source *src_)
    switch (token->token.type)
      {
      default:
-      break;
+      return true;
  
      case T_STOP:
        token->token.type = T_ENDCMD;
        src->eof = true;
-      break;
+      return true;
  
      case SCAN_BAD_HEX_LENGTH:
        lex_get_error (src, _("String of hex digits has %d characters, which "
                              "is not a multiple of 2"),
                       (int) token->token.number);
-      break;
+      return false;
  
      case SCAN_BAD_HEX_DIGIT:
      case SCAN_BAD_UNICODE_DIGIT:
        lex_get_error (src, _("`%c' is not a valid hex digit"),
                       (int) token->token.number);
-      break;
+      return false;
  
      case SCAN_BAD_UNICODE_LENGTH:
        lex_get_error (src, _("Unicode string contains %d bytes, which is "
                              "not in the valid range of 1 to 8 bytes"),
                       (int) token->token.number);
-      break;
+      return false;
  
      case SCAN_BAD_UNICODE_CODE_POINT:
        lex_get_error (src, _("U+%04X is not a valid Unicode code point"),
                       (int) token->token.number);
-      break;
+      return false;
  
      case SCAN_EXPECTED_QUOTE:
        lex_get_error (src, _("Unterminated string constant"));
-      break;
+      return false;
  
      case SCAN_EXPECTED_EXPONENT:
        lex_get_error (src, _("Missing exponent following `%s'"),
                       token->token.string.string);
-      break;
+      return false;
  
      case SCAN_UNEXPECTED_CHAR:
        {
          char c_name[16];
          lex_get_error (src, _("Bad character %s in input"),
                         uc_name (token->token.number, c_name));
+        return false;
        }
-      break;
  
      case SCAN_SKIP:
        lex_source_pop_front (src);
-      break;
+      return false;
      }
  
+  NOT_REACHED ();
+}
+
+static bool
+lex_source_get__ (struct lex_source *src)
+{
+  for (;;)
+    {
+      if (src->eof)
+        return false;
+      else if (lex_source_try_get (src))
+        return true;
+    }
+}
+
+static bool
+lex_source_get (const struct lex_source *src_)
+{
+  struct lex_source *src = CONST_CAST (struct lex_source *, src_);
+
+  size_t old_count = deque_count (&src->deque);
+  if (!lex_source_get__ (src))
+    return false;
+
+  if (!settings_get_mexpand ())
+    return true;
+
+  struct macro_expander *me;
+  int retval = macro_expander_create (src->lexer->macros,
+                                      &lex_source_front (src)->token,
+                                      &me);
+  while (!retval)
+    {
+      if (!lex_source_get__ (src))
+        {
+          /* This should not be reachable because we always get a T_ENDCMD at
+             the end of an input file (transformed from T_STOP by
+             lex_source_try_get()) and the macro_expander should always
+             terminate expansion on T_ENDCMD. */
+          NOT_REACHED ();
+        }
+
+      const struct lex_token *front = lex_source_front (src);
+      size_t start = front->token_pos;
+      size_t end = front->token_pos + front->token_len;
+      const struct macro_token mt = {
+        .token = front->token,
+        .representation = ss_buffer (&src->buffer[start - src->tail],
+                                     end - start),
+      };
+      retval = macro_expander_add (me, &mt);
+    }
+  if (retval < 0)
+    {
+      /* XXX handle case where there's a macro invocation starting from some
+         later token we've already obtained */
+      macro_expander_destroy (me);
+      return true;
+    }
+
+  /* XXX handle case where the macro invocation doesn't use all the tokens */
+  const struct lex_token *call_first = lex_source_next__ (src, old_count);
+  const struct lex_token *call_last = lex_source_front (src);
+  size_t call_pos = call_first->token_pos;
+  size_t call_len = (call_last->token_pos + call_last->token_len) - call_pos;
+  size_t line_pos = call_first->line_pos;
+  int first_line = call_first->first_line;
+  while (deque_count (&src->deque) > old_count)
+    lex_source_pop_front (src);
+
+  struct macro_tokens expansion = { .n = 0 };
+  macro_expander_get_expansion (me, &expansion);
+  macro_expander_destroy (me);
+
+  size_t *ofs = xnmalloc (expansion.n, sizeof *ofs);
+  size_t *len = xnmalloc (expansion.n, sizeof *len);
+  struct string s = DS_EMPTY_INITIALIZER;
+  macro_tokens_to_representation (&expansion, &s, ofs, len);
+
+  if (settings_get_mprint ())
+    output_item_submit (text_item_create (TEXT_ITEM_LOG, ds_cstr (&s),
+                                          _("Macro Expansion")));
+
+  char *macro_rep = ds_steal_cstr (&s);
+  size_t *ref_cnt = xmalloc (sizeof *ref_cnt);
+  *ref_cnt = expansion.n;
+  for (size_t i = 0; i < expansion.n; i++)
+    {
+      *lex_push_token__ (src) = (struct lex_token) {
+        .token = expansion.mts[i].token,
+        .token_pos = call_pos,
+        .token_len = call_len,
+        .line_pos = line_pos,
+        .first_line = first_line,
+        .macro_rep = macro_rep,
+        .ofs = ofs[i],
+        .len = len[i],
+        .ref_cnt = ref_cnt,
+      };
+
+      ss_dealloc (&expansion.mts[i].representation);
+    }
+  free (expansion.mts);
+  free (ofs);
+  free (len);
+
    return true;
  }
  \f
  static void
  lex_source_push_endcmd__ (struct lex_source *src)
  {
-  struct lex_token *token = lex_push_token__ (src);
-  token->token.type = T_ENDCMD;
-  token->token_pos = 0;
-  token->token_len = 0;
-  token->line_pos = 0;
-  token->first_line = 0;
+  *lex_push_token__ (src) = (struct lex_token) { .token = { .type = T_ENDCMD } };
  }
  
  static struct lex_source *
-lex_source_create (struct lex_reader *reader)
+lex_source_create (struct lexer *lexer, struct lex_reader *reader)
  {
    struct lex_source *src;
  
    src = xzalloc (sizeof *src);
    src->reader = reader;
-  segmenter_init (&src->segmenter, reader->syntax);
+  src->segmenter = segmenter_init (reader->syntax, false);
+  src->lexer = lexer;
    src->tokens = deque_init (&src->deque, 4, sizeof *src->tokens);
  
    lex_source_push_endcmd__ (src);
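Reading note on the lexer.c changes above: every token pushed for one macro call points at the same macro_rep buffer and the same heap-allocated ref_cnt, and lex_token_uninit frees both when the count reaches zero. The standalone sketch below models only that ownership scheme. `struct rep_ref`, `rep_refs_share` and `rep_ref_release` are hypothetical names, and plain malloc stands in for the xmalloc family that the patch uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the macro-related members that this commit adds
   to struct lex_token. */
struct rep_ref
  {
    char *macro_rep;            /* Text shared by every token of an expansion. */
    size_t *ref_cnt;            /* Number of rep_refs that share macro_rep. */
  };

/* Makes the N elements of REFS share one heap copy of TEXT.  Returns false,
   leaving nothing allocated, if N is 0 or memory is exhausted, so that a
   buffer is never created without an owner. */
static bool
rep_refs_share (struct rep_ref *refs, size_t n, const char *text)
{
  if (n == 0)
    return false;

  size_t size = strlen (text) + 1;
  char *rep = malloc (size);
  size_t *cnt = malloc (sizeof *cnt);
  if (rep == NULL || cnt == NULL)
    {
      free (rep);
      free (cnt);
      return false;
    }
  memcpy (rep, text, size);
  *cnt = n;
  for (size_t i = 0; i < n; i++)
    refs[i] = (struct rep_ref) { .macro_rep = rep, .ref_cnt = cnt };
  return true;
}

/* Drops REF's share and clears REF.  Returns true if this was the last
   share, in which case the text and the counter have both been freed. */
static bool
rep_ref_release (struct rep_ref *ref)
{
  bool last = false;
  if (ref->ref_cnt != NULL)
    {
      assert (*ref->ref_cnt > 0);
      if (--*ref->ref_cnt == 0)
        {
          free (ref->macro_rep);
          free (ref->ref_cnt);
          last = true;
        }
    }
  *ref = (struct rep_ref) { .macro_rep = NULL, .ref_cnt = NULL };
  return last;
}
```

The sketch deliberately refuses a count of zero. In the hunk as shown, `*ref_cnt = expansion.n` with an expansion of zero tokens would leave macro_rep and ref_cnt with no token to release them. Whether an empty expansion can actually reach that point depends on macro-expander code outside this section, so this is only something worth checking, not a confirmed leak.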
diff --git a/src/language/lexer/lexer.h b/src/language/lexer/lexer.h
index f57ee822ed0abb9040f19f337f5714c05016e441..9d7973c2ad1e4afc9e1386f21d2d3cc4d174e153 100644
--- a/src/language/lexer/lexer.h
+++ b/src/language/lexer/lexer.h
@@ -29,6 +29,7 @@
  #include "libpspp/prompt.h"
  
  struct lexer;
+struct macro;
  
  /* Handling of errors. */
  enum lex_error_mode
@@ -90,6 +91,9 @@ struct lex_reader *lex_reader_for_substring_nocopy (struct substring, const char
  struct lexer *lex_create (void);
  void lex_destroy (struct lexer *);
  
+/* Macros. */
+void lex_define_macro (struct lexer *, struct macro *);
+
  /* Files. */
  void lex_include (struct lexer *, struct lex_reader *);
  void lex_append (struct lexer *, struct lex_reader *);
@@ -143,8 +147,8 @@ double lex_next_tokval (const struct lexer *, int n);
  struct substring lex_next_tokss (const struct lexer *, int n);
  
  /* Token representation. */
-struct substring lex_next_representation (const struct lexer *,
-                                          int n0, int n1);
+char *lex_next_representation (const struct lexer *, int n0, int n1);
+bool lex_next_is_from_macro (const struct lexer *, int n);
  
  /* Current position. */
  int lex_get_first_line_number (const struct lexer *, int n);
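Reading note on the lexer.h change above: lex_next_representation now returns a malloc'd string rather than a substring into the source buffer, because the result may be stitched together from several pieces. The implementation in lexer.c walks the requested tokens and groups them into maximal runs that either all come straight from the source or all come from one macro expansion, which it detects by comparing macro_rep pointers. The standalone sketch below shows only that grouping step. `run_end`, `count_runs` and the `owners` array are hypothetical simplifications and are not PSPP functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of PSPP.  OWNERS[i] stands for the macro_rep
   pointer of the i'th token: NULL for a token read directly from the source,
   otherwise the buffer of the expansion that produced it.  Returns the
   exclusive end J of the longest run [I,J) whose tokens all have the same
   owner as token I.  Requires I < N. */
static size_t
run_end (const char *const *owners, size_t n, size_t i)
{
  assert (i < n);
  size_t j = i + 1;
  while (j < n && owners[j] == owners[i])
    j++;
  return j;
}

/* Returns the number of runs in OWNERS[0] through OWNERS[N - 1], that is,
   the number of pieces a caller would have to join to rebuild the syntax. */
static size_t
count_runs (const char *const *owners, size_t n)
{
  size_t n_runs = 0;
  for (size_t i = 0; i < n; i = run_end (owners, n, i))
    n_runs++;
  return n_runs;
}
```

Because a single pointer comparison already distinguishes "no macro" (NULL) from "a different macro call" (a different buffer), the sketch needs only one equality test per token. Callers of the real function must now free the returned string, as the updated comment in lexer.c states.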
diff --git a/src/language/lexer/macro.c b/src/language/lexer/macro.c
new file mode 100644
index 0000000..bee23c9
--- /dev/null
+++ b/src/language/lexer/macro.c
@@ -0,0 +1,1916 @@
+/* PSPP - a program for statistical analysis.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation, either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>. */
+
+#include <config.h>
+
+#include "language/lexer/macro.h"
+
+#include <errno.h>
+#include <limits.h>
+#include <stdlib.h>
+
+#include "data/settings.h"
+#include "language/lexer/segment.h"
+#include "language/lexer/scan.h"
+#include "libpspp/assertion.h"
+#include "libpspp/cast.h"
+#include "libpspp/i18n.h"
+#include "libpspp/message.h"
+#include "libpspp/str.h"
+#include "libpspp/string-array.h"
+#include "libpspp/string-map.h"
+#include "libpspp/stringi-set.h"
+
+#include "gl/c-ctype.h"
+#include "gl/ftoastr.h"
+
+#include "gettext.h"
+#define _(msgid) gettext (msgid)
+
+void
+macro_token_copy (struct macro_token *dst, const struct macro_token *src)
+{
+  token_copy (&dst->token, &src->token);
+  ss_alloc_substring (&dst->representation, src->representation);
+}
+
+void
+macro_token_uninit (struct macro_token *mt)
+{
+  token_uninit (&mt->token);
+  ss_dealloc (&mt->representation);
+}
+
+void
+macro_token_to_representation (struct macro_token *mt, struct string *s)
+{
+  ds_put_substring (s, mt->representation);
+}
+
+bool
+is_macro_keyword (struct substring s)
+{
+  static struct stringi_set keywords = STRINGI_SET_INITIALIZER (keywords);
+  if (stringi_set_is_empty (&keywords))
+    {
+      static const char *kws[] = {
+        "BREAK",
+        "CHAREND",
+        "CMDEND",
+        "DEFAULT",
+        "DO",
+        "DOEND",
+        "ELSE",
+        "ENCLOSE",
+        "ENDDEFINE",
+        "IF",
+        "IFEND",
+        "IN",
+        "LET",
+        "NOEXPAND",
+        "OFFEXPAND",
+        "ONEXPAND",
+        "POSITIONAL",
+        "THEN",
+        "TOKENS",
+      };
+      for (size_t i = 0; i < sizeof kws / sizeof *kws; i++)
+        stringi_set_insert (&keywords, kws[i]);
+    }
+
+  ss_ltrim (&s, ss_cstr ("!"));
+  return stringi_set_contains_len (&keywords, s.string, s.length);
+}
+
+void
+macro_tokens_copy (struct macro_tokens *dst, const struct macro_tokens *src)
+{
+  *dst = (struct macro_tokens) {
+    .mts = xmalloc (src->n * sizeof *dst->mts),
+    .n = src->n,
+    .allocated = src->n,
+  };
+  for (size_t i = 0; i < src->n; i++)
+    macro_token_copy (&dst->mts[i], &src->mts[i]);
+}
+
+void
+macro_tokens_uninit (struct macro_tokens *mts)
+{
+  for (size_t i = 0; i < mts->n; i++)
+    macro_token_uninit (&mts->mts[i]);
+  free (mts->mts);
+}
+
+struct macro_token *
+macro_tokens_add_uninit (struct macro_tokens *mts)
+{
+  if (mts->n >= mts->allocated)
+    mts->mts = x2nrealloc (mts->mts, &mts->allocated, sizeof *mts->mts);
+  return &mts->mts[mts->n++];
+}
+
+void
+macro_tokens_add (struct macro_tokens *mts, const struct macro_token *mt)
+{
+  macro_token_copy (macro_tokens_add_uninit (mts), mt);
+}
+
+void
+macro_tokens_from_string (struct macro_tokens *mts, const struct substring src,
+                          enum segmenter_mode mode)
+{
+  struct state
+    {
+      struct segmenter segmenter;
+      struct substring body;
+    };
+
+  struct state state = {
+    .segmenter = segmenter_init (mode, true),
+    .body = src,
+  };
+  struct state saved = state;
+
+  while (state.body.length > 0)
+    {
+      struct macro_token mt = {
+        .token = { .type = T_STOP },
+        .representation = { .string = state.body.string },
+      };
+      struct token *token = &mt.token;
+
+      struct scanner scanner;
+      scanner_init (&scanner, token);
+
+      for (;;)
+        {
+          enum segment_type type;
+          int seg_len = segmenter_push (&state.segmenter, state.body.string,
+                                        state.body.length, true, &type);
+          assert (seg_len >= 0);
+
+          struct substring segment = ss_head (state.body, seg_len);
+          ss_advance (&state.body, seg_len);
+
+          enum scan_result result = scanner_push (&scanner, type, segment, token);
+          if (result == SCAN_SAVE)
+            saved = state;
+          else if (result == SCAN_BACK)
+            {
+              state = saved;
+              break;
+            }
+          else if (result == SCAN_DONE)
+            break;
+        }
+
+      /* We have a token in 'token'. */
+      if (is_scan_type (token->type))
+        {
+          if (token->type != SCAN_SKIP)
+            {
+              printf ("error\n");
+              /* XXX report error */
+            }
+        }
+      else
+        {
+          mt.representation.length = state.body.string - mt.representation.string;
+          macro_tokens_add (mts, &mt);
+        }
+      token_uninit (token);
+    }
+}
+
+void
+macro_tokens_print (const struct macro_tokens *mts, FILE *stream)
+{
+  for (size_t i = 0; i < mts->n; i++)
+    token_print (&mts->mts[i].token, stream);
+}
+
+enum token_class
+  {
+    TC_ENDCMD,                  /* No space before or after (new-line after). */
+    TC_BINOP,                   /* Space on both sides. */
+    TC_COMMA,                   /* Space afterward. */
+    TC_ID,                      /* Don't need spaces except sequentially. */
+    TC_PUNCT,                   /* Don't need spaces except sequentially. */
+  };
+
+static bool
+needs_space (enum token_class prev, enum token_class next)
+{
+  /* Don't need a space before or after the end of a command.
+     (A new-line is needed afterward as a special case.) */
+  if (prev == TC_ENDCMD || next == TC_ENDCMD)
+    return false;
+
+  /* Binary operators always have a space on both sides. */
+  if (prev == TC_BINOP || next == TC_BINOP)
+    return true;
+
+  /* A comma always has a space afterward. */
+  if (prev == TC_COMMA)
+    return true;
+
+  /* Otherwise, PREV is TC_ID or TC_PUNCT, which only need a space if there are
+     two of them in a row. */
+  return prev == next;
+}
+
+static enum token_class
+classify_token (enum token_type type)
+{
+  switch (type)
+    {
+    case T_ID:
+    case T_MACRO_ID:
+    case T_POS_NUM:
+    case T_NEG_NUM:
+    case T_STRING:
+      return TC_ID;
+
+    case T_STOP:
+      return TC_PUNCT;
+
+    case T_ENDCMD:
+      return TC_ENDCMD;
+
+    case T_LPAREN:
+    case T_RPAREN:
+    case T_LBRACK:
+    case T_RBRACK:
+      return TC_PUNCT;
+
+    case T_PLUS:
+    case T_DASH:
+    case T_ASTERISK:
+    case T_SLASH:
+    case T_EQUALS:
+    case T_AND:
+    case T_OR:
+    case T_NOT:
+    case T_EQ:
+    case T_GE:
+    case T_GT:
+    case T_LE:
+    case T_LT:
+    case T_NE:
+    case T_ALL:
+    case T_BY:
+    case T_TO:
+    case T_WITH:
+    case T_EXP:
+    case T_MACRO_PUNCT:
+      return TC_BINOP;
+
+    case T_COMMA:
+      return TC_COMMA;
+    }
+
+  NOT_REACHED ();
+}
+
+void
+macro_tokens_to_representation (struct macro_tokens *mts, struct string *s,
+                                size_t *ofs, size_t *len)
+{
+  assert ((ofs != NULL) == (len != NULL));
+
+  if (!mts->n)
+    return;
+
+  for (size_t i = 0; i < mts->n; i++)
+    {
+      if (i > 0)
+        {
+          enum token_type prev = mts->mts[i - 1].token.type;
+          enum token_type next = mts->mts[i].token.type;
+
+          if (prev == T_ENDCMD)
+            ds_put_byte (s, '\n');
+          else
+            {
+              enum token_class pc = classify_token (prev);
+              enum token_class nc = classify_token (next);
+              if (needs_space (pc, nc))
+                ds_put_byte (s, ' ');
+            }
+        }
+
+      if (ofs)
+        ofs[i] = s->ss.length;
+      macro_token_to_representation (&mts->mts[i], s);
+      if (len)
+        len[i] = s->ss.length - ofs[i];
+    }
+}
+
+void
+macro_destroy (struct macro *m)
+{
+  if (!m)
+    return;
+
+  free (m->name);
+  for (size_t i = 0; i < m->n_params; i++)
+    {
+      struct macro_param *p = &m->params[i];
+      free (p->name);
+
+      macro_tokens_uninit (&p->def);
+
+      switch (p->arg_type)
+        {
+        case ARG_N_TOKENS:
+          break;
+
+        case ARG_CHAREND:
+          token_uninit (&p->charend);
+          break;
+
+        case ARG_ENCLOSE:
+          token_uninit (&p->enclose[0]);
+          token_uninit (&p->enclose[1]);
+          break;
+
+        case ARG_CMDEND:
+          break;
+        }
+    }
+  free (m->params);
+  macro_tokens_uninit (&m->body);
+  free (m);
+}
+\f
+struct macro_set *
+macro_set_create (void)
+{
+  struct macro_set *set = xmalloc (sizeof *set);
+  *set = (struct macro_set) {
+    .macros = HMAP_INITIALIZER (set->macros),
+  };
+  return set;
+}
+
+void
+macro_set_destroy (struct macro_set *set)
+{
+  if (!set)
+    return;
+
+  struct macro *macro, *next;
+  HMAP_FOR_EACH_SAFE (macro, next, struct macro, hmap_node, &set->macros)
+    {
+      hmap_delete (&set->macros, &macro->hmap_node);
+      macro_destroy (macro);
+    }
+  hmap_destroy (&set->macros);
+  free (set);
+}
+
+static unsigned int
+hash_macro_name (const char *name)
+{
+  return utf8_hash_case_string (name, 0);
+}
+
+static struct macro *
+macro_set_find__ (struct macro_set *set, const char *name)
+{
+  struct macro *macro;
+  HMAP_FOR_EACH_WITH_HASH (macro, struct macro, hmap_node,
+                           hash_macro_name (name), &set->macros)
+    if (!utf8_strcasecmp (macro->name, name))
+      return macro;
+
+  return NULL;
+}
+
+const struct macro *
+macro_set_find (const struct macro_set *set, const char *name)
+{
+  return macro_set_find__ (CONST_CAST (struct macro_set *, set), name);
+}
+
+/* Adds M to SET.  M replaces any existing macro with the same name.  Takes
+   ownership of M. */
+void
+macro_set_add (struct macro_set *set, struct macro *m)
+{
+  struct macro *victim = macro_set_find__ (set, m->name);
+  if (victim)
+    {
+      hmap_delete (&set->macros, &victim->hmap_node);
+      macro_destroy (victim);
+    }
+
+  hmap_insert (&set->macros, &m->hmap_node, hash_macro_name (m->name));
+}
+\f
+enum me_state
+  {
+    /* Error state. */
+    ME_ERROR,
+
+    /* Accumulating tokens in me->args toward the end of any type of
+       argument. */
+    ME_ARG,
+
+    /* Expecting the opening delimiter of an ARG_ENCLOSE argument. */
+    ME_ENCLOSE,
+
+    /* Expecting a keyword for a keyword argument. */
+    ME_KEYWORD,
+
+    /* Expecting an equal sign for a keyword argument. */
+    ME_EQUALS,
+  };
+
+
+struct macro_expander
+  {
+    const struct macro_set *macros;
+
+    enum me_state state;
+    size_t n_tokens;
+
+    const struct macro *macro;
+    struct macro_tokens **args;
+    const struct macro_param *param;
+  };
+
+static int
+me_finished (struct macro_expander *me)
+{
+  for (size_t i = 0; i < me->macro->n_params; i++)
+    if (!me->args[i])
+      {
+        me->args[i] = xmalloc (sizeof *me->args[i]);
+        macro_tokens_copy (me->args[i], &me->macro->params[i].def);
+      }
+  return me->n_tokens;
+}
+
+static int
+me_next_arg (struct macro_expander *me)
+{
+  if (!me->param)
+    {
+      assert (!me->macro->n_params);
+      return me_finished (me);
+    }
+  else if (me->param->positional)
+    {
+      me->param++;
+      if (me->param >= &me->macro->params[me->macro->n_params])
+        return me_finished (me);
+      else
+        {
+          me->state = (!me->param->positional ? ME_KEYWORD
+                       : me->param->arg_type == ARG_ENCLOSE ? ME_ENCLOSE
+                       : ME_ARG);
+          return 0;
+        }
+    }
+  else
+    {
+      for (size_t i = 0; i < me->macro->n_params; i++)
+        if (!me->args[i])
+          {
+            me->state = ME_KEYWORD;
+            return 0;
+          }
+      return me_finished (me);
+    }
+}
+
+static int
+me_error (struct macro_expander *me)
+{
+  me->state = ME_ERROR;
+  return -1;
+}
+
+static int
+me_add_arg (struct macro_expander *me, const struct macro_token *mt)
+{
+  const struct macro_param *p = me->param;
+
+  const struct token *token = &mt->token;
+  if ((token->type == T_ENDCMD || token->type == T_STOP)
+      && p->arg_type != ARG_CMDEND)
+    {
+      msg (SE, _("Unexpected end of command reading argument %s "
+                 "to macro %s."), me->param->name, me->macro->name);
+
+      return me_error (me);
+    }
+
+  me->n_tokens++;
+
+  struct macro_tokens **argp = &me->args[p - me->macro->params];
+  if (!*argp)
+    *argp = xzalloc (sizeof **argp);
+  struct macro_tokens *arg = *argp;
+  if (p->arg_type == ARG_N_TOKENS)
+    {
+      macro_tokens_add (arg, mt);
+      if (arg->n >= p->n_tokens)
+        return me_next_arg (me);
+      return 0;
+    }
+  else if (p->arg_type == ARG_CMDEND)
+    {
+      if (token->type == T_ENDCMD || token->type == T_STOP)
+        return me_next_arg (me);
+      macro_tokens_add (arg, mt);
+      return 0;
+    }
+  else
+    {
+      const struct token *end
+        = p->arg_type == ARG_CHAREND ? &p->charend : &p->enclose[1];
+      if (token_equal (token, end))
+        return me_next_arg (me);
+      macro_tokens_add (arg, mt);
+      return 0;
+    }
+}
+
+static int
+me_expected (struct macro_expander *me, const struct macro_token *actual,
+             const struct token *expected)
+{
+  const struct substring actual_s
+    = (actual->representation.length ? actual->representation
+       : ss_cstr (_("<end of input>")));
+  char *expected_s = token_to_string (expected);
+  msg (SE, _("Found `%.*s' while expecting `%s' reading argument %s "
+             "to macro %s."),
+       (int) actual_s.length, actual_s.string, expected_s,
+       me->param->name, me->macro->name);
+  free (expected_s);
+
+  return me_error (me);
+}
+
+static int
+me_enclose (struct macro_expander *me, const struct macro_token *mt)
+{
+  const struct token *token = &mt->token;
+  me->n_tokens++;
+
+  if (token_equal (&me->param->enclose[0], token))
+    {
+      me->state = ME_ARG;
+      return 0;
+    }
+
+  return me_expected (me, mt, &me->param->enclose[0]);
+}
+
+static const struct macro_param *
+macro_find_parameter_by_name (const struct macro *m, struct substring name)
+{
+  ss_ltrim (&name, ss_cstr ("!"));
+
+  for (size_t i = 0; i < m->n_params; i++)
+    {
+      const struct macro_param *p = &m->params[i];
+      struct substring p_name = ss_cstr (p->name + 1);
+      if (!utf8_strncasecmp (p_name.string, p_name.length,
+                             name.string, name.length))
+        return p;
+    }
+  return NULL;
+}
+
+static int
+me_keyword (struct macro_expander *me, const struct macro_token *mt)
+{
+  const struct token *token = &mt->token;
+  if (token->type != T_ID)
+    return me_finished (me);
+
+  const struct macro_param *p = macro_find_parameter_by_name (me->macro,
+                                                              token->string);
+  if (p)
+    {
+      size_t arg_index = p - me->macro->params;
+      me->param = p;
+      if (me->args[arg_index])
+        {
+          msg (SE,
+               _("Argument %s multiply specified in call to macro %s."),
+               p->name, me->macro->name);
+          return me_error (me);
+        }
+
+      me->n_tokens++;
+      me->state = ME_EQUALS;
+      return 0;
+    }
+
+  return me_finished (me);
+}
+
+static int
+me_equals (struct macro_expander *me, const struct macro_token *mt)
+{
+  const struct token *token = &mt->token;
+  me->n_tokens++;
+
+  if (token->type == T_EQUALS)
+    {
+      me->state = ME_ARG;
+      return 0;
+    }
+
+  return me_expected (me, mt, &(struct token) { .type = T_EQUALS });
+}
+
+int
+macro_expander_create (const struct macro_set *macros,
+                       const struct token *token,
+                       struct macro_expander **mep)
+{
+  *mep = NULL;
+  if (macro_set_is_empty (macros))
+    return -1;
+  if (token->type != T_ID && token->type != T_MACRO_ID)
+    return -1;
+
+  const struct macro *macro = macro_set_find (macros, token->string.string);
+  if (!macro)
+    return -1;
+
+  struct macro_expander *me = xmalloc (sizeof *me);
+  *me = (struct macro_expander) {
+    .macros = macros,
+    .n_tokens = 1,
+    .macro = macro,
+  };
+  *mep = me;
+
+  if (!macro->n_params)
+    return 1;
+  else
+    {
+      me->state = (!macro->params[0].positional ? ME_KEYWORD
+                   : macro->params[0].arg_type == ARG_ENCLOSE ? ME_ENCLOSE
+                   : ME_ARG);
+      me->args = xcalloc (macro->n_params, sizeof *me->args);
+      me->param = macro->params;
+      return 0;
+    }
+}
+
+void
+macro_expander_destroy (struct macro_expander *me)
+{
+  if (!me)
+    return;
+
+  for (size_t i = 0; i < me->macro->n_params; i++)
+    if (me->args[i])
+      {
+        macro_tokens_uninit (me->args[i]);
+        free (me->args[i]);
+      }
+  free (me->args);
+  free (me);
+}
+
+/* Adds MT to the collection of tokens in ME that potentially need to be
+   macro expanded.
+
+   Returns -1 if the tokens added do not actually invoke a macro.  The caller
+   should consume the first token without expanding it.
+
+   Returns 0 if the macro expander needs more tokens, for macro arguments or to
+   decide whether this is actually a macro invocation.  The caller should call
+   macro_expander_add() again with the next token.
+
+   Returns a positive number to indicate that the returned number of tokens
+   invoke a macro.  The number returned might be less than the number of tokens
+   added because it can take a few tokens of lookahead to determine whether the
+   macro invocation is finished.  The caller should call
+   macro_expander_get_expansion() to obtain the expansion. */
+int
+macro_expander_add (struct macro_expander *me, const struct macro_token *mt)
+{
+  switch (me->state)
+    {
+    case ME_ERROR:
+      return -1;
+
+    case ME_ARG:
+      return me_add_arg (me, mt);
+
+    case ME_ENCLOSE:
+      return me_enclose (me, mt);
+
+    case ME_KEYWORD:
+      return me_keyword (me, mt);
+
+    case ME_EQUALS:
+      return me_equals (me, mt);
+
+    default:
+      NOT_REACHED ();
+    }
+}
+
+/* Each argument to a macro function is one of:
+
+       - A quoted string or other single literal token.
+
+       - An argument to the macro being expanded, e.g. !1 or a named argument.
+
+       - !*.
+
+       - A function invocation.
+
+   Each function invocation yields a character sequence to be turned into a
+   sequence of tokens.  The case where that character sequence is a single
+   quoted string is an important special case.
+*/
+struct parse_macro_function_ctx
+  {
+    const struct macro_token *input;
+    size_t n_input;
+    int nesting_countdown;
+    const struct macro_set *macros;
+    const struct macro_expander *me;
+    struct string_map *vars;
+    bool *expand;
+  };
+
+static void
+macro_expand (const struct macro_tokens *,
+              int nesting_countdown, const struct macro_set *,
+              const struct macro_expander *, struct string_map *vars,
+              bool *expand, bool *break_, struct macro_tokens *exp);
+
+static bool
+expand_macro_function (struct parse_macro_function_ctx *ctx,
+                       struct string *output, size_t *input_consumed);
+
+/* Returns true if the pair of tokens starting at offset OFS within MTS are !*,
+   false otherwise. */
+static bool
+is_bang_star (const struct macro_token *mts, size_t n, size_t ofs)
+{
+  return (ofs + 1 < n
+          && mts[ofs].token.type == T_MACRO_ID
+          && ss_equals (mts[ofs].token.string, ss_cstr ("!"))
+          && mts[ofs + 1].token.type == T_ASTERISK);
+}
+
+static size_t
+parse_function_arg (struct parse_macro_function_ctx *ctx,
+                    size_t i, struct string *farg)
+{
+  const struct macro_token *tokens = ctx->input;
+  const struct token *token = &tokens[i].token;
+  if (token->type == T_MACRO_ID)
+    {
+      const struct macro_param *param = macro_find_parameter_by_name (
+        ctx->me->macro, token->string);
+      if (param)
+        {
+          size_t param_idx = param - ctx->me->macro->params;
+          const struct macro_tokens *marg = ctx->me->args[param_idx];
+          for (size_t i = 0; i < marg->n; i++)
+            {
+              if (i)
+                ds_put_byte (farg, ' ');
+              ds_put_substring (farg, marg->mts[i].representation);
+            }
+          return 1;
+        }
+
+      if (is_bang_star (ctx->input, ctx->n_input, i))
+        {
+          for (size_t i = 0; i < ctx->me->macro->n_params; i++)
+            {
+              if (!ctx->me->macro->params[i].positional)
+                break;
+
+              const struct macro_tokens *marg = ctx->me->args[i];
+              for (size_t j = 0; j < marg->n; j++)
+                {
+                  if (i || j)
+                    ds_put_byte (farg, ' ');
+                  ds_put_substring (farg, marg->mts[j].representation);
+                }
+            }
+          return 2;
+        }
+
+      if (ctx->vars)
+        {
+          const char *value = string_map_find__ (ctx->vars,
+                                                 token->string.string,
+                                                 token->string.length);
+          if (value)
+            {
+              ds_put_cstr (farg, value);
+              return 1;
+            }
+        }
+
+      struct parse_macro_function_ctx subctx = {
+        .input = &ctx->input[i],
+        .n_input = ctx->n_input - i,
+        .nesting_countdown = ctx->nesting_countdown,
+        .macros = ctx->macros,
+        .me = ctx->me,
+        .vars = ctx->vars,
+        .expand = ctx->expand,
+      };
+      size_t subinput_consumed;
+      if (expand_macro_function (&subctx, farg, &subinput_consumed))
+        return subinput_consumed;
+    }
+
+  ds_put_substring (farg, tokens[i].representation);
+  return 1;
+}
+
+static bool
+parse_macro_function (struct parse_macro_function_ctx *ctx,
+                      struct string_array *args,
+                      struct substring function,
+                      int min_args, int max_args,
+                      size_t *input_consumed)
+{
+  const struct macro_token *tokens = ctx->input;
+  size_t n_tokens = ctx->n_input;
+
+  if (!n_tokens
+      || tokens[0].token.type != T_MACRO_ID
+      || !ss_equals_case (tokens[0].token.string, function)) /* XXX abbrevs allowed */
+    return false;
+
+  if (n_tokens < 2 || tokens[1].token.type != T_LPAREN)
+    {
+      printf ("`(' expected following %s\n", function.string);
+      return false;
+    }
+
+  string_array_init (args);
+
+  for (size_t i = 2;; )
+    {
+      if (i >= n_tokens)
+        goto unexpected_end;
+      if (tokens[i].token.type == T_RPAREN)
+        {
+          *input_consumed = i + 1;
+          if (args->n < min_args || args->n > max_args)
+            {
+              printf ("Wrong number of arguments to %s.\n", function.string);
+              goto error;
+            }
+          return true;
+        }
+
+      struct string s = DS_EMPTY_INITIALIZER;
+      i += parse_function_arg (ctx, i, &s);
+      if (i >= n_tokens)
+        {
+          ds_destroy (&s);
+          goto unexpected_end;
+        }
+      string_array_append_nocopy (args, ds_steal_cstr (&s));
+
+      if (tokens[i].token.type == T_COMMA)
+        i++;
+      else if (tokens[i].token.type != T_RPAREN)
+        {
+          printf ("Expecting `,' or `)' in %s invocation.\n", function.string);
+          goto error;
+        }
+    }
+
+unexpected_end:
+  printf ("Missing closing parenthesis in arguments to %s.\n",
+          function.string);
+  /* Fall through. */
+error:
+  string_array_destroy (args);
+  return false;
+}
+
+static bool
+unquote_string (const char *s, struct string *content)
+{
+  struct string_lexer slex;
+  string_lexer_init (&slex, s, strlen (s), SEG_MODE_INTERACTIVE /* XXX */,
+                     true);
+
+  struct token token1;
+  if (!string_lexer_next (&slex, &token1))
+    return false;
+
+  if (token1.type != T_STRING)
+    {
+      token_uninit (&token1);
+      return false;
+    }
+
+  struct token token2;
+  if (string_lexer_next (&slex, &token2))
+    {
+      token_uninit (&token1);
+      token_uninit (&token2);
+      return false;
+    }
+
+  if (content) ds_put_substring (content, token1.string);
+  token_uninit (&token1);
+  return true;
+}
+
+static const char *
+unquote_string_in_place (const char *s, struct string *tmp)
+{
+  ds_init_empty (tmp);
+  return unquote_string (s, tmp) ? ds_cstr (tmp) : s;
+}
+
+static bool
+parse_integer (const char *s, int *np)
+{
+  errno = 0;
+
+  char *tail;
+  long int n = strtol (s, &tail, 10);
+  *np = n < INT_MIN ? INT_MIN : n > INT_MAX ? INT_MAX : n;
+  tail += strspn (tail, CC_SPACES);
+  return *tail == '\0' && errno != ERANGE && n == *np;
+}
+
+static bool
+expand_macro_function (struct parse_macro_function_ctx *ctx,
+                       struct string *output,
+                       size_t *input_consumed)
+{
+  struct string_array args;
+
+  if (parse_macro_function (ctx, &args, ss_cstr ("!length"), 1, 1,
+                            input_consumed))
+    ds_put_format (output, "%zu", strlen (args.strings[0]));
+  else if (parse_macro_function (ctx, &args, ss_cstr ("!blanks"), 1, 1,
+                                 input_consumed))
+    {
+      int n;
+      if (!parse_integer (args.strings[0], &n) || n < 0)
+        {
+          printf ("argument to !BLANKS must be non-negative integer (not \"%s\")\n", args.strings[0]);
+          string_array_destroy (&args);
+          return false;
+        }
+
+      ds_put_byte_multiple (output, ' ', n);
+    }
+  else if (parse_macro_function (ctx, &args, ss_cstr ("!concat"), 1, INT_MAX,
+                                 input_consumed))
+    {
+      for (size_t i = 0; i < args.n; i++)
+        if (!unquote_string (args.strings[i], output))
+          ds_put_cstr (output, args.strings[i]);
+    }
+  else if (parse_macro_function (ctx, &args, ss_cstr ("!head"), 1, 1,
+                                 input_consumed))
+    {
+      struct string tmp;
+      const char *s = unquote_string_in_place (args.strings[0], &tmp);
+
+      struct macro_tokens mts = { .n = 0 };
+      macro_tokens_from_string (&mts, ss_cstr (s), SEG_MODE_INTERACTIVE /* XXX */);
+      if (mts.n > 0)
+        ds_put_substring (output, mts.mts[0].representation);
+      macro_tokens_uninit (&mts);
+      ds_destroy (&tmp);
+    }
+  else if (parse_macro_function (ctx, &args, ss_cstr ("!index"), 2, 2,
+                                 input_consumed))
+    {
+      const char *haystack = args.strings[0];
+      const char *needle = strstr (haystack, args.strings[1]);
+      ds_put_format (output, "%td", needle ? needle - haystack + 1 : 0);
+    }
+  else if (parse_macro_function (ctx, &args, ss_cstr ("!quote"), 1, 1,
+                                 input_consumed))
+    {
+      if (unquote_string (args.strings[0], NULL))
+        ds_put_cstr (output, args.strings[0]);
+      else
+        {
+          ds_extend (output, strlen (args.strings[0]) + 2);
+          ds_put_byte (output, '\'');
+          for (const char *p = args.strings[0]; *p; p++)
+            {
+              if (*p == '\'')
+                ds_put_byte (output, '\'');
+              ds_put_byte (output, *p);
+            }
+          ds_put_byte (output, '\'');
+        }
+    }
+  else if (parse_macro_function (ctx, &args, ss_cstr ("!substr"), 2, 3,
+                                 input_consumed))
+    {
+      int start;
+      if (!parse_integer (args.strings[1], &start) || start < 1)
+        {
+          printf ("second argument to !SUBSTR must be positive integer (not \"%s\")\n", args.strings[1]);
+          string_array_destroy (&args);
+          return false;
+        }
+
+      int count = INT_MAX;
+      if (args.n > 2 && (!parse_integer (args.strings[2], &count) || count < 0))
+        {
+          printf ("third argument to !SUBSTR must be non-negative integer (not \"%s\")\n", args.strings[2]);
+          string_array_destroy (&args);
+          return false;
+        }
+
+      struct substring s = ss_cstr (args.strings[0]);
+      ds_put_substring (output, ss_substr (s, start - 1, count));
+    }
+  else if (parse_macro_function (ctx, &args, ss_cstr ("!tail"), 1, 1,
+                                 input_consumed))
+    {
+      struct string tmp;
+      const char *s = unquote_string_in_place (args.strings[0], &tmp);
+
+      struct macro_tokens mts = { .n = 0 };
+      macro_tokens_from_string (&mts, ss_cstr (s), SEG_MODE_INTERACTIVE /* XXX */);
+      if (mts.n > 1)
+        {
+          struct macro_tokens tail = { .mts = mts.mts + 1, .n = mts.n - 1 };
+          macro_tokens_to_representation (&tail, output, NULL, NULL);
+        }
+      macro_tokens_uninit (&mts);
+      ds_destroy (&tmp);
+    }
+  else if (parse_macro_function (ctx, &args, ss_cstr ("!unquote"), 1, 1,
+                                 input_consumed))
+    {
+      if (!unquote_string (args.strings[0], output))
+        ds_put_cstr (output, args.strings[0]);
+    }
+  else if (parse_macro_function (ctx, &args, ss_cstr ("!upcase"), 1, 1,
+                                 input_consumed))
+    {
+      struct string tmp;
+      const char *s = unquote_string_in_place (args.strings[0], &tmp);
+      char *upper = utf8_to_upper (s);
+      ds_put_cstr (output, upper);
+      free (upper);
+      ds_destroy (&tmp);
+    }
+  else if (parse_macro_function (ctx, &args, ss_cstr ("!eval"), 1, 1,
+                                 input_consumed))
+    {
+      struct macro_tokens mts = { .n = 0 };
+      macro_tokens_from_string (&mts, ss_cstr (args.strings[0]),
+                                SEG_MODE_INTERACTIVE /* XXX */);
+      struct macro_tokens exp = { .n = 0 };
+      macro_expand (&mts, ctx->nesting_countdown - 1, ctx->macros, ctx->me,
+                    ctx->vars, ctx->expand, NULL, &exp);
+      macro_tokens_to_representation (&exp, output, NULL, NULL);
+      macro_tokens_uninit (&exp);
+      macro_tokens_uninit (&mts);
+    }
+  else if (ctx->n_input > 0
+           && ctx->input[0].token.type == T_MACRO_ID
+           && ss_equals_case (ctx->input[0].token.string, ss_cstr ("!null")))
+    {
+      *input_consumed = 1;
+      return true;
+    }
+  else
+    return false;
+
+  string_array_destroy (&args);
+  return true;
+}
+
+struct expr_context
+  {
+    int nesting_countdown;
+    const struct macro_set *macros;
+    const struct macro_expander *me;
+    struct string_map *vars;
+    bool *expand;
+  };
+
+static char *macro_evaluate_or (const struct expr_context *ctx,
+                                const struct macro_token **tokens,
+                                const struct macro_token *end);
+
+static char *
+macro_evaluate_literal (const struct expr_context *ctx,
+                        const struct macro_token **tokens,
+                        const struct macro_token *end)
+{
+  const struct macro_token *p = *tokens;
+  if (p >= end)
+    return NULL;
+  if (p->token.type == T_LPAREN)
+    {
+      p++;
+      char *value = macro_evaluate_or (ctx, &p, end);
+      if (!value)
+        return NULL;
+      if (p >= end || p->token.type != T_RPAREN)
+        {
+          free (value);
+          printf ("expecting ')' in macro expression\n");
+          return NULL;
+        }
+      p++;
+      *tokens = p;
+      return value;
+    }
+
+  struct parse_macro_function_ctx fctx = {
+    .input = p,
+    .n_input = end - p,
+    .nesting_countdown = ctx->nesting_countdown,
+    .macros = ctx->macros,
+    .me = ctx->me,
+    .vars = ctx->vars,
+    .expand = ctx->expand,
+  };
+  struct string function_output = DS_EMPTY_INITIALIZER;
+  size_t function_consumed = parse_function_arg (&fctx, 0, &function_output);
+  struct string unquoted = DS_EMPTY_INITIALIZER;
+  if (unquote_string (ds_cstr (&function_output), &unquoted))
+    {
+      ds_swap (&function_output, &unquoted);
+      ds_destroy (&unquoted);
+    }
+  *tokens = p + function_consumed;
+  return ds_steal_cstr (&function_output);
+}
+
+/* Returns true if MT is valid as a macro operator.  Only operators written as
+   symbols (e.g. <>) are usable in macro expressions, not operators written as
+   letters (e.g. EQ). */
+static bool
+is_macro_operator (const struct macro_token *mt)
+{
+  return (mt->representation.length > 0
+          && !c_isalpha (mt->representation.string[0]));
+}
+
+static enum token_type
+parse_relational_op (const struct macro_token *mt)
+{
+  switch (mt->token.type)
+    {
+    case T_EQUALS:
+      return T_EQ;
+
+    case T_NE:
+    case T_LT:
+    case T_GT:
+    case T_LE:
+    case T_GE:
+      return is_macro_operator (mt) ? mt->token.type : T_STOP;
+
+    case T_MACRO_ID:
+      return (ss_equals_case (mt->token.string, ss_cstr ("!EQ")) ? T_EQ
+              : ss_equals_case (mt->token.string, ss_cstr ("!NE")) ? T_NE
+              : ss_equals_case (mt->token.string, ss_cstr ("!LT")) ? T_LT
+              : ss_equals_case (mt->token.string, ss_cstr ("!GT")) ? T_GT
+              : ss_equals_case (mt->token.string, ss_cstr ("!LE")) ? T_LE
+              : ss_equals_case (mt->token.string, ss_cstr ("!GE")) ? T_GE
+              : T_STOP);
+
+    default:
+      return T_STOP;
+    }
+}
+
+static char *
+macro_evaluate_relational (const struct expr_context *ctx,
+                           const struct macro_token **tokens,
+                           const struct macro_token *end)
+{
+  const struct macro_token *p = *tokens;
+  char *lhs = macro_evaluate_literal (ctx, &p, end);
+  if (!lhs)
+    return NULL;
+
+  enum token_type op = p >= end ? T_STOP : parse_relational_op (p);
+  if (op == T_STOP)
+    {
+      *tokens = p;
+      return lhs;
+    }
+  p++;
+
+  char *rhs = macro_evaluate_literal (ctx, &p, end);
+  if (!rhs)
+    {
+      free (lhs);
+      return NULL;
+    }
+
+  struct string lhs_tmp, rhs_tmp;
+  int cmp = strcmp/*XXX*/ (unquote_string_in_place (lhs, &lhs_tmp),
+                           unquote_string_in_place (rhs, &rhs_tmp));
+  ds_destroy (&lhs_tmp);
+  ds_destroy (&rhs_tmp);
+
+  free (lhs);
+  free (rhs);
+
+  bool b = (op == T_EQUALS || op == T_EQ ? !cmp
+            : op == T_NE ? cmp
+            : op == T_LT ? cmp < 0
+            : op == T_GT ? cmp > 0
+            : op == T_LE ? cmp <= 0
+            :/*op == T_GE*/cmp >= 0);
+
+  *tokens = p;
+  return xstrdup (b ? "1" : "0");
+}
+
+static char *
+macro_evaluate_not (const struct expr_context *ctx,
+                    const struct macro_token **tokens,
+                    const struct macro_token *end)
+{
+  const struct macro_token *p = *tokens;
+
+  unsigned int negations = 0;
+  while (p < end
+         && (ss_equals_case (p->representation, ss_cstr ("!NOT"))
+             || ss_equals (p->representation, ss_cstr ("~"))))
+    {
+      p++;
+      negations++;
+    }
+
+  char *operand = macro_evaluate_relational (ctx, &p, end);
+  if (!operand || !negations)
+    {
+      *tokens = p;
+      return operand;
+    }
+
+  bool b = (strcmp (operand, "0") != 0) ^ (negations & 1);
+  free (operand);
+  *tokens = p;
+  return xstrdup (b ? "1" : "0");
+}
+
+static char *
+macro_evaluate_and (const struct expr_context *ctx,
+                    const struct macro_token **tokens,
+                    const struct macro_token *end)
+{
+  const struct macro_token *p = *tokens;
+  char *lhs = macro_evaluate_not (ctx, &p, end);
+  if (!lhs)
+    return NULL;
+
+  while (p < end
+         && (ss_equals_case (p->representation, ss_cstr ("!AND"))
+             || ss_equals (p->representation, ss_cstr ("&"))))
+    {
+      p++;
+      char *rhs = macro_evaluate_not (ctx, &p, end);
+      if (!rhs)
+        {
+          free (lhs);
+          return NULL;
+        }
+
+      bool b = strcmp (lhs, "0") && strcmp (rhs, "0");
+      free (lhs);
+      free (rhs);
+      lhs = xstrdup (b ? "1" : "0");
+    }
+  *tokens = p;
+  return lhs;
+}
+
+static char *
+macro_evaluate_or (const struct expr_context *ctx,
+                   const struct macro_token **tokens,
+                   const struct macro_token *end)
+{
+  const struct macro_token *p = *tokens;
+  char *lhs = macro_evaluate_and (ctx, &p, end);
+  if (!lhs)
+    return NULL;
+
+  while (p < end
+         && (ss_equals_case (p->representation, ss_cstr ("!OR"))
+             || ss_equals (p->representation, ss_cstr ("|"))))
+    {
+      p++;
+      char *rhs = macro_evaluate_and (ctx, &p, end);
+      if (!rhs)
+        {
+          free (lhs);
+          return NULL;
+        }
+
+      bool b = strcmp (lhs, "0") || strcmp (rhs, "0");
+      free (lhs);
+      free (rhs);
+      lhs = xstrdup (b ? "1" : "0");
+    }
+  *tokens = p;
+  return lhs;
+}
+
+static char *
+macro_evaluate_expression (const struct macro_token **tokens, size_t n_tokens,
+                           int nesting_countdown, const struct macro_set *macros,
+                           const struct macro_expander *me, struct string_map *vars,
+                           bool *expand)
+{
+  const struct expr_context ctx = {
+    .nesting_countdown = nesting_countdown,
+    .macros = macros,
+    .me = me,
+    .vars = vars,
+    .expand = expand,
+  };
+  return macro_evaluate_or (&ctx, tokens, *tokens + n_tokens);
+}
+
+static bool
+macro_evaluate_number (const struct macro_token **tokens, size_t n_tokens,
+                       int nesting_countdown, const struct macro_set *macros,
+                       const struct macro_expander *me, struct string_map *vars,
+                       bool *expand, double *number)
+{
+  char *s = macro_evaluate_expression (tokens, n_tokens, nesting_countdown,
+                                       macros, me, vars, expand);
+  if (!s)
+    return false;
+
+  struct macro_tokens mts = { .n = 0 };
+  macro_tokens_from_string (&mts, ss_cstr (s), SEG_MODE_INTERACTIVE /* XXX */);
+  if (mts.n != 1 || !token_is_number (&mts.mts[0].token))
+    {
+      macro_tokens_print (&mts, stdout);
+      printf ("expression must evaluate to a number (not %s)\n", s);
+      free (s);
+      macro_tokens_uninit (&mts);
+      return false;
+    }
+
+  *number = token_number (&mts.mts[0].token);
+  free (s);
+  macro_tokens_uninit (&mts);
+  return true;
+}
+
+static const struct macro_token *
+find_ifend_clause (const struct macro_token *p, const struct macro_token *end)
+{
+  size_t nesting = 0;
+  for (; p < end; p++)
+    {
+      if (p->token.type != T_MACRO_ID)
+        continue;
+
+      if (ss_equals_case (p->token.string, ss_cstr ("!IF")))
+        nesting++;
+      else if (ss_equals_case (p->token.string, ss_cstr ("!IFEND")))
+        {
+          if (!nesting)
+            return p;
+          nesting--;
+        }
+      else if (ss_equals_case (p->token.string, ss_cstr ("!ELSE")) && !nesting)
+        return p;
+    }
+  return NULL;
+}
+
+static size_t
+macro_expand_if (const struct macro_token *tokens, size_t n_tokens,
+                 int nesting_countdown, const struct macro_set *macros,
+                 const struct macro_expander *me, struct string_map *vars,
+                 bool *expand, bool *break_, struct macro_tokens *exp)
+{
+  const struct macro_token *p = tokens;
+  const struct macro_token *end = tokens + n_tokens;
+
+  if (p >= end || !ss_equals_case (p->token.string, ss_cstr ("!IF")))
+    return 0;
+
+  p++;
+  char *result = macro_evaluate_expression (&p, end - p,
+                                            nesting_countdown, macros, me, vars,
+                                            expand);
+  if (!result)
+    return 0;
+  bool b = strcmp (result, "0");
+  free (result);
+
+  if (p >= end
+      || p->token.type != T_MACRO_ID
+      || !ss_equals_case (p->token.string, ss_cstr ("!THEN")))
+    {
+      printf ("!THEN expected\n");
+      return 0;
+    }
+
+  const struct macro_token *start_then = p + 1;
+  const struct macro_token *end_then = find_ifend_clause (start_then, end);
+  if (!end_then)
+    {
+      printf ("!ELSE or !IFEND expected\n");
+      return 0;
+    }
+
+  const struct macro_token *start_else, *end_if;
+  if (ss_equals_case (end_then->token.string, ss_cstr ("!ELSE")))
+    {
+      start_else = end_then + 1;
+      end_if = find_ifend_clause (start_else, end);
+      if (!end_if
+          || !ss_equals_case (end_if->token.string, ss_cstr ("!IFEND")))
+        {
+          printf ("!IFEND expected\n");
+          return 0;
+        }
+    }
+  else
+    {
+      start_else = NULL;
+      end_if = end_then;
+    }
+
+  const struct macro_token *start;
+  size_t n;
+  if (b)
+    {
+      start = start_then;
+      n = end_then - start_then;
+    }
+  else if (start_else)
+    {
+      start = start_else;
+      n = end_if - start_else;
+    }
+  else
+    {
+      start = NULL;
+      n = 0;
+    }
+
+  if (n)
+    {
+      struct macro_tokens mts = {
+        .mts = CONST_CAST (struct macro_token *, start),
+        .n = n,
+      };
+      macro_expand (&mts, nesting_countdown, macros, me, vars, expand,
+                    break_, exp);
+    }
+  return (end_if + 1) - tokens;
+}
+
+static size_t
+macro_parse_let (const struct macro_token *tokens, size_t n_tokens,
+                 int nesting_countdown, const struct macro_set *macros,
+                 const struct macro_expander *me, struct string_map *vars,
+                 bool *expand)
+{
+  const struct macro_token *p = tokens;
+  const struct macro_token *end = tokens + n_tokens;
+
+  if (p >= end || !ss_equals_case (p->token.string, ss_cstr ("!LET")))
+    return 0;
+  p++;
+
+  if (p >= end || p->token.type != T_MACRO_ID)
+    {
+      printf ("expected macro variable name following !LET\n");
+      return 0;
+    }
+  const struct substring var_name = p->token.string;
+  if (is_macro_keyword (var_name)
+      || macro_find_parameter_by_name (me->macro, var_name))
+    {
+      printf ("cannot use argument name or macro keyword as !LET variable\n");
+      return 0;
+    }
+  p++;
+
+  if (p >= end || p->token.type != T_EQUALS)
+    {
+      printf ("expected = following !LET\n");
+      return 0;
+    }
+  p++;
+
+  char *value = macro_evaluate_expression (&p, end - p,
+                                           nesting_countdown, macros, me, vars,
+                                           expand);
+  if (!value)
+    return 0;
+
+  string_map_replace_nocopy (vars, ss_xstrdup (var_name), value);
+  return p - tokens;
+}
+
+static const struct macro_token *
+find_doend (const struct macro_token *p, const struct macro_token *end)
+{
+  size_t nesting = 0;
+  for (; p < end; p++)
+    {
+      if (p->token.type != T_MACRO_ID)
+        continue;
+
+      if (ss_equals_case (p->token.string, ss_cstr ("!DO")))
+        nesting++;
+      else if (ss_equals_case (p->token.string, ss_cstr ("!DOEND")))
+        {
+          if (!nesting)
+            return p;
+          nesting--;
+        }
+    }
+  printf ("missing !DOEND\n");
+  return NULL;
+}
+
+static size_t
+macro_expand_do (const struct macro_token *tokens, size_t n_tokens,
+                 int nesting_countdown, const struct macro_set *macros,
+                 const struct macro_expander *me, struct string_map *vars,
+                 bool *expand, struct macro_tokens *exp)
+{
+  const struct macro_token *p = tokens;
+  const struct macro_token *end = tokens + n_tokens;
+
+  if (p >= end || !ss_equals_case (p->token.string, ss_cstr ("!DO")))
+    return 0;
+  p++;
+
+  if (p >= end || p->token.type != T_MACRO_ID)
+    {
+      printf ("expected macro variable name following !DO\n");
+      return 0;
+    }
+  const struct substring var_name = p->token.string;
+  if (is_macro_keyword (var_name)
+      || macro_find_parameter_by_name (me->macro, var_name))
+    {
+      printf ("cannot use argument name or macro keyword as !DO variable\n");
+      return 0;
+    }
+  p++;
+
+  int miterate = settings_get_miterate ();
+  if (p < end && p->token.type == T_MACRO_ID
+      && ss_equals_case (p->token.string, ss_cstr ("!IN")))
+    {
+      p++;
+      char *list = macro_evaluate_expression (&p, end - p,
+                                              nesting_countdown, macros, me, vars,
+                                              expand);
+      if (!list)
+        return 0;
+
+      struct macro_tokens items = { .n = 0 };
+      macro_tokens_from_string (&items, ss_cstr (list),
+                                SEG_MODE_INTERACTIVE /* XXX */);
+      free (list);
+
+      const struct macro_token *do_end = find_doend (p, end);
+      if (!do_end)
+        {
+          macro_tokens_uninit (&items);
+          return 0;
+        }
+
+      const struct macro_tokens inner = {
+        .mts = CONST_CAST (struct macro_token *, p),
+        .n = do_end - p
+      };
+      for (size_t i = 0; i < items.n; i++)
+        {
+          if (i >= miterate)
+            {
+              printf ("exceeded maximum number of iterations %d\n", miterate);
+              break;
+            }
+          string_map_replace_nocopy (vars, ss_xstrdup (var_name),
+                                     ss_xstrdup (items.mts[i].representation));
+
+          bool break_ = false;
+          macro_expand (&inner, nesting_countdown, macros,
+                        me, vars, expand, &break_, exp);
+          if (break_)
+            break;
+        }
+      return do_end - tokens + 1;
+    }
+  else if (p < end && p->token.type == T_EQUALS)
+    {
+      p++;
+      double first;
+      if (!macro_evaluate_number (&p, end - p, nesting_countdown, macros, me,
+                                  vars, expand, &first))
+        return 0;
+
+      if (p >= end || p->token.type != T_MACRO_ID
+          || !ss_equals_case (p->token.string, ss_cstr ("!TO")))
+        {
+          printf ("expecting !TO\n");
+          return 0;
+        }
+      p++;
+
+      double last;
+      if (!macro_evaluate_number (&p, end - p, nesting_countdown, macros, me,
+                                  vars, expand, &last))
+        return 0;
+
+      double by = 1.0;
+      if (p < end && p->token.type == T_MACRO_ID
+          && ss_equals_case (p->token.string, ss_cstr ("!BY")))
+        {
+          p++;
+          if (!macro_evaluate_number (&p, end - p, nesting_countdown, macros, me,
+                                      vars, expand, &by))
+            return 0;
+
+          if (by == 0.0)
+            {
+              printf ("!BY value cannot be zero\n");
+              return 0;
+            }
+        }
+
+      const struct macro_token *do_end = find_doend (p, end);
+      if (!do_end)
+        return 0;
+      const struct macro_tokens inner = {
+        .mts = CONST_CAST (struct macro_token *, p),
+        .n = do_end - p
+      };
+
+      if ((by > 0 && first <= last) || (by < 0 && first >= last))
+        {
+          int i = 0;
+          for (double index = first;
+               by > 0 ? (index <= last) : (index >= last);
+               index += by)
+            {
+              if (i++ >= miterate)
+                {
+                  printf ("exceeded maximum number of iterations %d\n",
+                          miterate);
+                  break;
+                }
+
+              char index_s[DBL_BUFSIZE_BOUND];
+              c_dtoastr (index_s, sizeof index_s, 0, 0, index);
+              string_map_replace_nocopy (vars, ss_xstrdup (var_name),
+                                         xstrdup (index_s));
+
+              bool break_ = false;
+              macro_expand (&inner, nesting_countdown, macros,
+                            me, vars, expand, &break_, exp);
+              if (break_)
+                break;
+            }
+        }
+
+      return do_end - tokens + 1;
+    }
+  else
+    {
+      printf ("expecting = or !IN in !DO loop\n");
+      return 0;
+    }
+}
+
+static void
+macro_expand (const struct macro_tokens *mts,
+              int nesting_countdown, const struct macro_set *macros,
+              const struct macro_expander *me, struct string_map *vars,
+              bool *expand, bool *break_, struct macro_tokens *exp)
+{
+  if (nesting_countdown <= 0)
+    {
+      printf ("maximum nesting level exceeded\n");
+      for (size_t i = 0; i < mts->n; i++)
+        macro_tokens_add (exp, &mts->mts[i]);
+      return;
+    }
+
+  struct string_map own_vars = STRING_MAP_INITIALIZER (own_vars);
+  if (!vars)
+    vars = &own_vars;
+
+  for (size_t i = 0; i < mts->n && (!break_ || !*break_); i++)
+    {
+      const struct macro_token *mt = &mts->mts[i];
+      const struct token *token = &mt->token;
+      if (token->type == T_MACRO_ID && me)
+        {
+          const struct macro_param *param = macro_find_parameter_by_name (
+            me->macro, token->string);
+          if (param)
+            {
+              const struct macro_tokens *arg = me->args[param - me->macro->params];
+              //macro_tokens_print (arg, stdout);
+              if (*expand && param->expand_arg)
+                macro_expand (arg, nesting_countdown, macros, NULL, NULL,
+                              expand, break_, exp);
+              else
+                for (size_t k = 0; k < arg->n; k++)
+                  macro_tokens_add (exp, &arg->mts[k]);
+              continue;
+            }
+
+          if (is_bang_star (mts->mts, mts->n, i))
+            {
+              for (size_t j = 0; j < me->macro->n_params; j++)
+                {
+                  const struct macro_param *param = &me->macro->params[j];
+                  if (!param->positional)
+                    break;
+
+                  const struct macro_tokens *arg = me->args[j];
+                  if (*expand && param->expand_arg)
+                    macro_expand (arg, nesting_countdown, macros, NULL, NULL,
+                                  expand, break_, exp);
+                  else
+                    for (size_t k = 0; k < arg->n; k++)
+                      macro_tokens_add (exp, &arg->mts[k]);
+                }
+              i++;
+              continue;
+            }
+
+          size_t n = macro_expand_if (&mts->mts[i], mts->n - i,
+                                      nesting_countdown, macros, me, vars,
+                                      expand, break_, exp);
+          if (n > 0)
+            {
+              i += n - 1;
+              continue;
+            }
+        }
+
+      if (token->type == T_MACRO_ID && vars)
+        {
+          const char *value = string_map_find__ (vars, token->string.string,
+                                                 token->string.length);
+          if (value)
+            {
+              macro_tokens_from_string (exp, ss_cstr (value),
+                                        SEG_MODE_INTERACTIVE /* XXX */);
+              continue;
+            }
+        }
+
+      if (*expand)
+        {
+          struct macro_expander *subme;
+          int retval = macro_expander_create (macros, token, &subme);
+          for (size_t j = 1; !retval; j++)
+            {
+              const struct macro_token endcmd = { .token = { .type = T_ENDCMD } };
+              retval = macro_expander_add (
+                subme, i + j < mts->n ? &mts->mts[i + j] : &endcmd);
+            }
+          if (retval > 0)
+            {
+              i += retval - 1;
+              macro_expand (&subme->macro->body, nesting_countdown - 1, macros,
+                            subme, NULL, expand, break_, exp);
+              macro_expander_destroy (subme);
+              continue;
+            }
+
+          macro_expander_destroy (subme);
+        }
+
+      if (token->type != T_MACRO_ID)
+        {
+          macro_tokens_add (exp, mt);
+          continue;
+        }
+
+      if (ss_equals_case (token->string, ss_cstr ("!break")))
+        {
+          if (!break_)
+            printf ("!BREAK outside !DO\n");
+          else
+            {
+              *break_ = true;
+              break;
+            }
+        }
+
+      struct parse_macro_function_ctx ctx = {
+        .input = &mts->mts[i],
+        .n_input = mts->n - i,
+        .nesting_countdown = nesting_countdown,
+        .macros = macros,
+        .me = me,
+        .vars = vars,
+        .expand = expand,
+      };
+      struct string function_output = DS_EMPTY_INITIALIZER;
+      size_t function_consumed;
+      if (expand_macro_function (&ctx, &function_output, &function_consumed))
+        {
+          i += function_consumed - 1;
+
+          macro_tokens_from_string (exp, function_output.ss,
+                                    SEG_MODE_INTERACTIVE /* XXX */);
+          ds_destroy (&function_output);
+
+          continue;
+        }
+
+      size_t n = macro_parse_let (&mts->mts[i], mts->n - i,
+                                  nesting_countdown, macros, me, vars,
+                                  expand);
+      if (n > 0)
+        {
+          i += n - 1;
+          continue;
+        }
+
+      n = macro_expand_do (&mts->mts[i], mts->n - i,
+                           nesting_countdown, macros, me, vars,
+                           expand, exp);
+      if (n > 0)
+        {
+          i += n - 1;
+          continue;
+        }
+
+      if (ss_equals_case (token->string, ss_cstr ("!onexpand")))
+        *expand = true;
+      else if (ss_equals_case (token->string, ss_cstr ("!offexpand")))
+        *expand = false;
+      else
+        macro_tokens_add (exp, mt);
+    }
+  if (vars == &own_vars)
+    string_map_destroy (&own_vars);
+}
+
+void
+macro_expander_get_expansion (struct macro_expander *me, struct macro_tokens *exp)
+{
+#if 0
+  for (size_t i = 0; i < me->macro->n_params; i++)
+    {
+      printf ("%s:\n", me->macro->params[i].name);
+      macro_tokens_print (me->args[i], stdout);
+    }
+#endif
+
+  bool expand = true;
+  macro_expand (&me->macro->body, settings_get_mnest (),
+                me->macros, me, NULL, &expand, NULL, exp);
+
+#if 0
+  printf ("expansion:\n");
+  macro_tokens_print (exp, stdout);
+#endif
+}
+
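Note on the expression evaluator in macro.c: macro_evaluate_or, macro_evaluate_and, macro_evaluate_not, and macro_evaluate_relational appear to form a recursive-descent parser in which each level calls the next tighter-binding one. The following is a minimal standalone sketch of that layering, not PSPP code. It assumes tokens are plain C strings, matches keywords case-sensitively, and omits macro function expansion and string unquoting; struct cursor, eval_expression, and the other names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cursor over an array of token strings (not PSPP's
   struct macro_token). */
struct cursor
  {
    const char *const *tokens;
    size_t n;
    size_t pos;
  };

static const char *eval_or (struct cursor *);

static const char *
truth (bool b)
{
  return b ? "1" : "0";
}

/* Returns the current token, or NULL at end of input. */
static const char *
peek (const struct cursor *c)
{
  assert (c->pos <= c->n);
  return c->pos < c->n ? c->tokens[c->pos] : NULL;
}

/* Consumes the current token if it equals A or B. */
static bool
accept (struct cursor *c, const char *a, const char *b)
{
  const char *t = peek (c);
  if (t && (!strcmp (t, a) || !strcmp (t, b)))
    {
      c->pos++;
      return true;
    }
  return false;
}

/* literal := '(' or ')' | token.  Returns NULL on a syntax error. */
static const char *
eval_literal (struct cursor *c)
{
  const char *t = peek (c);
  if (!t)
    return NULL;
  c->pos++;
  if (strcmp (t, "("))
    return t;

  const char *value = eval_or (c);
  const char *close = peek (c);
  if (!value || !close || strcmp (close, ")"))
    return NULL;
  c->pos++;
  return value;
}

/* relational := literal [relop literal], compared as strings. */
static const char *
eval_relational (struct cursor *c)
{
  static const char *const ops[] = { "=", "<>", "<", ">", "<=", ">=" };
  const size_t n_ops = sizeof ops / sizeof *ops;

  const char *lhs = eval_literal (c);
  const char *op = peek (c);
  if (!lhs || !op)
    return lhs;

  size_t i;
  for (i = 0; i < n_ops; i++)
    if (!strcmp (op, ops[i]))
      break;
  if (i >= n_ops)
    return lhs;                 /* Not a relational operator. */
  c->pos++;

  const char *rhs = eval_literal (c);
  if (!rhs)
    return NULL;
  int cmp = strcmp (lhs, rhs);
  return truth (i == 0 ? cmp == 0
                : i == 1 ? cmp != 0
                : i == 2 ? cmp < 0
                : i == 3 ? cmp > 0
                : i == 4 ? cmp <= 0
                : cmp >= 0);
}

/* not := ('!NOT' | '~')* relational */
static const char *
eval_not (struct cursor *c)
{
  unsigned int negations = 0;
  while (accept (c, "!NOT", "~"))
    negations++;

  const char *operand = eval_relational (c);
  if (!operand || !negations)
    return operand;
  bool b = strcmp (operand, "0") != 0;
  return truth (negations & 1 ? !b : b);
}

/* and := not (('!AND' | '&') not)* */
static const char *
eval_and (struct cursor *c)
{
  const char *lhs = eval_not (c);
  while (lhs && accept (c, "!AND", "&"))
    {
      const char *rhs = eval_not (c);
      if (!rhs)
        return NULL;
      lhs = truth (strcmp (lhs, "0") && strcmp (rhs, "0"));
    }
  return lhs;
}

/* or := and (('!OR' | '|') and)* */
static const char *
eval_or (struct cursor *c)
{
  const char *lhs = eval_and (c);
  while (lhs && accept (c, "!OR", "|"))
    {
      const char *rhs = eval_and (c);
      if (!rhs)
        return NULL;
      lhs = truth (strcmp (lhs, "0") || strcmp (rhs, "0"));
    }
  return lhs;
}

/* Evaluates the N tokens in TOKENS.  Returns 1 for true, 0 for false,
   or -1 on a syntax error or trailing tokens. */
int
eval_expression (const char *const *tokens, size_t n)
{
  struct cursor c = { .tokens = tokens, .n = n, .pos = 0 };
  const char *result = eval_or (&c);
  if (!result || c.pos != n)
    return -1;
  return strcmp (result, "0") != 0;
}
```

With this layering, the tokens 1 & 0 | 1 group as (1 & 0) | 1, because eval_or only ever sees operands that eval_and has already reduced.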
diff --git a/src/language/lexer/macro.h b/src/language/lexer/macro.h
new file mode 100644
index 0000000..c10ce8e
--- /dev/null
+++ b/src/language/lexer/macro.h
@@ -0,0 +1,127 @@
+/* PSPP - a program for statistical analysis.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation, either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>. */
+
+#ifndef MACRO_H
+#define MACRO_H 1
+
+#include <stdbool.h>
+#include <stddef.h>
+
+#include "libpspp/hmap.h"
+#include "libpspp/str.h"
+#include "language/lexer/segment.h"
+#include "language/lexer/token.h"
+
+struct macro_expander;
+
+struct macro_token
+  {
+    struct token token;
+    struct substring representation;
+  };
+
+void macro_token_copy (struct macro_token *, const struct macro_token *);
+void macro_token_uninit (struct macro_token *);
+
+void macro_token_to_representation (struct macro_token *, struct string *);
+
+bool is_macro_keyword (struct substring);
+
+struct macro_tokens
+  {
+    struct macro_token *mts;
+    size_t n;
+    size_t allocated;
+  };
+
+void macro_tokens_copy (struct macro_tokens *, const struct macro_tokens *);
+void macro_tokens_uninit (struct macro_tokens *);
+struct macro_token *macro_tokens_add_uninit (struct macro_tokens *);
+void macro_tokens_add (struct macro_tokens *, const struct macro_token *);
+
+void macro_tokens_from_string (struct macro_tokens *, const struct substring,
+                               enum segmenter_mode);
+
+void macro_tokens_to_representation (struct macro_tokens *, struct string *,
+                                     size_t *ofs, size_t *len);
+
+void macro_tokens_print (const struct macro_tokens *, FILE *);
+
+struct macro_param
+  {
+    bool positional;            /* Is this a positional parameter? */
+    char *name;                 /* "!1" or "!name". */
+    struct macro_tokens def;    /* Default expansion. */
+    bool expand_arg;            /* Macro-expand the argument? */
+
+    enum
+      {
+        ARG_N_TOKENS,
+        ARG_CHAREND,
+        ARG_ENCLOSE,
+        ARG_CMDEND
+      }
+    arg_type;
+    union
+      {
+        int n_tokens;
+        struct token charend;
+        struct token enclose[2];
+      };
+  };
+
+struct macro
+  {
+    struct hmap_node hmap_node; /* Indexed by 'name'. */
+    char *name;
+
+    struct macro_param *params;
+    size_t n_params;
+
+    struct macro_tokens body;
+  };
+
+void macro_destroy (struct macro *);
+
+struct macro_set
+  {
+    struct hmap macros;
+  };
+
+struct macro_set *macro_set_create (void);
+void macro_set_destroy (struct macro_set *);
+const struct macro *macro_set_find (const struct macro_set *,
+                                    const char *);
+void macro_set_add (struct macro_set *, struct macro *);
+
+static inline bool
+macro_set_is_empty (const struct macro_set *set)
+{
+  return hmap_is_empty (&set->macros);
+}
+\f
+/* Macro expansion. */
+
+int macro_expander_create (const struct macro_set *,
+                           const struct token *,
+                           struct macro_expander **);
+void macro_expander_destroy (struct macro_expander *);
+
+int macro_expander_add (struct macro_expander *, const struct macro_token *);
+
+void macro_expander_get_expansion (struct macro_expander *, struct macro_tokens *);
+
+#endif /* macro.h */
diff --git a/src/language/lexer/scan.c b/src/language/lexer/scan.c
index 86ebb7d00675cd6c89d223d49b1924cb278473e9..7aa01593f68be02dc424e1950800607e70ea0948 100644
--- a/src/language/lexer/scan.c
+++ b/src/language/lexer/scan.c
@@ -548,7 +548,7 @@ void
  scanner_init (struct scanner *scanner, struct token *token)
  {
    scanner->state = S_START;
-  token_init (token);
+  *token = (struct token) { .type = T_STOP };
  }
  
  /* Adds the segment with type TYPE and UTF-8 text S to SCANNER.  TOKEN must be
@@ -605,12 +605,14 @@ scanner_push (struct scanner *scanner, enum segment_type type,
     INPUT must not be modified or freed while SLEX is still in use. */
  void
  string_lexer_init (struct string_lexer *slex, const char *input, size_t length,
-                   enum segmenter_mode mode)
+                   enum segmenter_mode mode, bool is_snippet)
  {
-  slex->input = input;
-  slex->length = length;
-  slex->offset = 0;
-  segmenter_init (&slex->segmenter, mode);
+  *slex = (struct string_lexer) {
+    .input = input,
+    .length = length,
+    .offset = 0,
+    .segmenter = segmenter_init (mode, is_snippet),
+  };
  }
  
  /*  */
diff --git a/src/language/lexer/scan.h b/src/language/lexer/scan.h
index 866321b0c84c5c61e8dec03b29fc3901554db3b6..1c0ff7a1e5477286ff1d1256a4e6c5823877fba0 100644
--- a/src/language/lexer/scan.h
+++ b/src/language/lexer/scan.h
@@ -100,7 +100,7 @@ struct string_lexer
    };
  
  void string_lexer_init (struct string_lexer *, const char *input,
-                        size_t length, enum segmenter_mode);
+                        size_t length, enum segmenter_mode, bool is_snippet);
  bool string_lexer_next (struct string_lexer *, struct token *);
  
  #endif /* scan.h */
diff --git a/src/language/lexer/segment.c b/src/language/lexer/segment.c
index 35240b4c64c18dbf22a7b5d0d8bc8b52d382eb2f..6bb602bbb99d31351dcc3733c8745315f4a7f3df 100644
--- a/src/language/lexer/segment.c
+++ b/src/language/lexer/segment.c
@@ -28,6 +28,7 @@
  
  #include "gl/c-ctype.h"
  #include "gl/c-strcase.h"
+#include "gl/verify.h"
  
  enum segmenter_state
    {
@@ -1786,17 +1787,28 @@ segment_type_to_string (enum segment_type type)
      }
  }
  
-/* Initializes S as a segmenter with the given syntax MODE.
+/* Returns a segmenter with the given syntax MODE.
+
+   If IS_SNIPPET is false, then the segmenter will parse as if it's being given
+   a whole file.  This means, for example, that it will interpret - or + at the
+   beginning of the syntax as a separator between commands (since - or + at the
+   beginning of a line has this meaning).
+
+   If IS_SNIPPET is true, then the segmenter will parse as if it's being given
+   an isolated piece of syntax.  This means, for example, that it will
+   interpret - or + at the beginning of the syntax as an operator token or (if
+   followed by a digit) as part of a number.
  
     A segmenter does not contain any external references, so nothing needs to be
     done to destroy one.  For the same reason, segmenters may be copied with
     plain struct assignment (or memcpy). */
-void
-segmenter_init (struct segmenter *s, enum segmenter_mode mode)
+struct segmenter
+segmenter_init (enum segmenter_mode mode, bool is_snippet)
  {
-  s->state = S_SHBANG;
-  s->substate = 0;
-  s->mode = mode;
+  return (struct segmenter) {
+    .state = is_snippet ? S_GENERAL : S_SHBANG,
+    .mode = mode,
+  };
  }
  
  /* Returns the mode passed to segmenter_init() for S. */
diff --git a/src/language/lexer/segment.h b/src/language/lexer/segment.h
index 02a269bdd2779b53a0f0bddd00e2641ddaf184b9..5d550f531fc27c15fe71527fb01807cc41e3a6f4 100644
--- a/src/language/lexer/segment.h
+++ b/src/language/lexer/segment.h
@@ -117,7 +117,7 @@ struct segmenter
      unsigned char mode;
    };
  
-void segmenter_init (struct segmenter *, enum segmenter_mode);
+struct segmenter segmenter_init (enum segmenter_mode, bool is_snippet);
  
  enum segmenter_mode segmenter_get_mode (const struct segmenter *);
  
diff --git a/src/language/lexer/token.c b/src/language/lexer/token.c
index ec64bbfb4a431cfcf0044cf0dd9a05b31755569f..61f576ed985b934b43789e2a16912e28f4b595fa 100644
--- a/src/language/lexer/token.c
+++ b/src/language/lexer/token.c
@@ -27,17 +27,17 @@
  #include "libpspp/cast.h"
  #include "libpspp/misc.h"
  
-
  #include "gl/ftoastr.h"
  #include "gl/xalloc.h"
  
-/* Initializes TOKEN with an arbitrary type, number 0, and a null string. */
  void
-token_init (struct token *token)
+token_copy (struct token *dst, const struct token *src)
  {
-  token->type = 0;
-  token->number = 0.0;
-  token->string = ss_empty ();
+  *dst = (struct token) {
+    .type = src->type,
+    .number = src->number,
+  };
+  ss_alloc_substring (&dst->string, src->string);
  }
  
  /* Frees the string that TOKEN contains. */
@@ -45,7 +45,33 @@ void
  token_uninit (struct token *token)
  {
    if (token != NULL)
-    ss_dealloc (&token->string);
+    {
+      ss_dealloc (&token->string);
+      *token = (struct token) { .type = T_STOP };
+    }
+}
+
+bool
+token_equal (const struct token *a, const struct token *b)
+{
+  if (a->type != b->type)
+    return false;
+
+  switch (a->type)
+    {
+    case T_POS_NUM:
+    case T_NEG_NUM:
+      return a->number == b->number;
+
+    case T_ID:
+    case T_MACRO_ID:
+    case T_MACRO_PUNCT:
+    case T_STRING:
+      return ss_equals (a->string, b->string);
+
+    default:
+      return true;
+    }
  }
  
  static char *
@@ -150,7 +176,7 @@ token_to_string (const struct token *token)
        return string_representation (token->string);
  
      default:
-      return xstrdup_if_nonnull (token_type_to_name (token->type));
+      return xstrdup_if_nonnull (token_type_to_string (token->type));
      }
  }
  
@@ -188,3 +214,41 @@ token_integer (const struct token *t)
    assert (token_is_integer (t));
    return t->number;
  }
+\f
+void
+tokens_copy (struct tokens *dst, const struct tokens *src)
+{
+  *dst = (struct tokens) {
+    .tokens = xnmalloc (src->n, sizeof *dst->tokens),
+    .n = src->n,
+    .allocated = src->n,
+  };
+
+  for (size_t i = 0; i < src->n; i++)
+    token_copy (&dst->tokens[i], &src->tokens[i]);
+}
+
+void
+tokens_uninit (struct tokens *tokens)
+{
+  for (size_t i = 0; i < tokens->n; i++)
+    token_uninit (&tokens->tokens[i]);
+  free (tokens->tokens);
+}
+
+void
+tokens_add (struct tokens *tokens, const struct token *t)
+{
+  if (tokens->n >= tokens->allocated)
+    tokens->tokens = x2nrealloc (tokens->tokens, &tokens->allocated,
+                                 sizeof *tokens->tokens);
+
+  token_copy (&tokens->tokens[tokens->n++], t);
+}
+
+void
+tokens_print (const struct tokens *tokens, FILE *stream)
+{
+  for (size_t i = 0; i < tokens->n; i++)
+    token_print (&tokens->tokens[i], stream);
+}
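Review note on the `struct tokens` helpers above: `tokens_add` follows the usual growable-array pattern, where the buffer is enlarged only once it is full, that is, when `n` has reached `allocated`. The sketch below shows that pattern in isolation. It is not PSPP code: `struct int_vec`, `int_vec_add`, and `int_vec_uninit` are hypothetical names, the elements are plain ints rather than tokens, and plain `realloc` with an explicit doubling step stands in for gnulib's `x2nrealloc`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for `struct tokens`, holding ints instead of
   tokens. */
struct int_vec
  {
    int *items;
    size_t n;          /* Number of items in use. */
    size_t allocated;  /* Number of items that fit in ITEMS. */
  };

/* Appends VALUE to VEC, doubling the capacity only when the array is
   full. */
static void
int_vec_add (struct int_vec *vec, int value)
{
  assert (vec != NULL);
  if (vec->n >= vec->allocated)
    {
      size_t new_allocated = vec->allocated ? 2 * vec->allocated : 4;
      int *new_items = realloc (vec->items,
                                new_allocated * sizeof *new_items);
      if (new_items == NULL)
        abort ();               /* Abort on out-of-memory, x*alloc style. */
      vec->items = new_items;
      vec->allocated = new_allocated;
    }
  vec->items[vec->n++] = value;
}

/* Frees VEC's storage and resets it to empty. */
static void
int_vec_uninit (struct int_vec *vec)
{
  free (vec->items);
  *vec = (struct int_vec) { .items = NULL };
}
```

With this shape, appending is amortized constant time, because each reallocation doubles the room available for later additions.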
diff --git a/src/language/lexer/token.h b/src/language/lexer/token.h

index 67bc7ddc3b74753d587e4e1a29e4b356caf033d0..d07f76b68b1f139200a9d32426cff0605196843b 100644 (file)
--- a/src/language/lexer/token.h
+++ b/src/language/lexer/token.h
@@ -17,6 +17,7 @@
  #ifndef TOKEN_H
  #define TOKEN_H 1
  
+#include <stdbool.h>
  #include <stdio.h>
  #include "libpspp/assertion.h"
  #include "libpspp/str.h"
@@ -33,15 +34,27 @@ struct token
      struct substring string;
    };
  
-#define TOKEN_INITIALIZER(TYPE, NUMBER, STRING) \
-        { TYPE, NUMBER, SS_LITERAL_INITIALIZER (STRING) }
-
-void token_init (struct token *);
+void token_copy (struct token *, const struct token *);
  void token_uninit (struct token *);
  
+bool token_equal (const struct token *, const struct token *);
+
  char *token_to_string (const struct token *);
  
  void token_print (const struct token *, FILE *);
+\f
+struct tokens
+  {
+    struct token *tokens;
+    size_t n;
+    size_t allocated;
+  };
+
+void tokens_copy (struct tokens *, const struct tokens *);
+void tokens_uninit (struct tokens *);
+void tokens_add (struct tokens *, const struct token *);
+
+void tokens_print (const struct tokens *, FILE *);
  
  static inline bool token_is_number (const struct token *);
  static inline double token_number (const struct token *);
diff --git a/src/language/utilities/title.c b/src/language/utilities/title.c

index a0a71e9cee7d4481d0fd143d1e56bf754c4b0edb..323bdf3fa51e79d35c375e8fc6a4dde22431a979 100644 (file)
--- a/src/language/utilities/title.c
+++ b/src/language/utilities/title.c
@@ -59,7 +59,7 @@ parse_title (struct lexer *lexer, void (*set_title) (const char *))
  
        /* Get the raw representation of all the tokens, including any space
           between them, and use it as the title. */
-      char *title = ss_xstrdup (lex_next_representation (lexer, 0, n - 1));
+      char *title = lex_next_representation (lexer, 0, n - 1);
        set_title (title);
        free (title);
  
diff --git a/src/libpspp/stringi-set.c b/src/libpspp/stringi-set.c

index b442a41567805a9ca92d431fc413468f5f9abc22..a609150d9930b572bd367d311eaa4abb100a1338 100644 (file)
--- a/src/libpspp/stringi-set.c
+++ b/src/libpspp/stringi-set.c
@@ -32,6 +32,8 @@
  
  static struct stringi_set_node *stringi_set_find_node__ (
    const struct stringi_set *, const char *, unsigned int hash);
+static struct stringi_set_node *stringi_set_find_node_len__ (
+  const struct stringi_set *, const char *, size_t length, unsigned int hash);
  static void stringi_set_insert__ (struct stringi_set *, char *,
                                   unsigned int hash);
  static bool stringi_set_delete__ (struct stringi_set *, const char *,
@@ -81,7 +83,16 @@ stringi_set_destroy (struct stringi_set *set)
  bool
  stringi_set_contains (const struct stringi_set *set, const char *s)
  {
-  return stringi_set_find_node (set, s) != NULL;
+  return stringi_set_contains_len (set, s, strlen (s));
+}
+
+/* Returns true if SET contains S with the given LENGTH (or a similar string
+   with different case), false otherwise. */
+bool
+stringi_set_contains_len (const struct stringi_set *set, const char *s,
+                          size_t length)
+{
+  return stringi_set_find_node_len (set, s, length) != NULL;
  }
  
  /* Returns the node in SET that contains S, or a null pointer if SET does not
@@ -89,7 +100,17 @@ stringi_set_contains (const struct stringi_set *set, const char *s)
  struct stringi_set_node *
  stringi_set_find_node (const struct stringi_set *set, const char *s)
  {
-  return stringi_set_find_node__ (set, s, utf8_hash_case_string (s, 0));
+  return stringi_set_find_node_len (set, s, strlen (s));
+}
+
+/* Returns the node in SET that contains S with the given LENGTH, or a null
+   pointer if SET does not contain S. */
+struct stringi_set_node *
+stringi_set_find_node_len (const struct stringi_set *set, const char *s,
+                           size_t length)
+{
+  return stringi_set_find_node_len__ (set, s, length,
+                                      utf8_hash_case_bytes (s, length, 0));
  }
  
  /* Inserts a copy of S into SET.  Returns true if successful, false if SET
@@ -281,13 +302,20 @@ stringi_set_get_sorted_array (const struct stringi_set *set)
  
  static struct stringi_set_node *
  stringi_set_find_node__ (const struct stringi_set *set, const char *s,
-                        unsigned int hash)
+                         unsigned int hash)
+{
+  return stringi_set_find_node_len__ (set, s, strlen (s), hash);
+}
+
+static struct stringi_set_node *
+stringi_set_find_node_len__ (const struct stringi_set *set, const char *s,
+                             size_t length, unsigned int hash)
  {
    struct stringi_set_node *node;
  
    HMAP_FOR_EACH_WITH_HASH (node, struct stringi_set_node, hmap_node,
                             hash, &set->hmap)
-    if (!utf8_strcasecmp (s, node->string))
+    if (!utf8_strncasecmp (s, length, node->string, strlen (node->string)))
        return node;
  
    return NULL;
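Review note on the `_len` variants above: they allow a lookup key that is a slice of a larger buffer (for example a token's substring) rather than a null-terminated string. The sketch below illustrates the idea under simplifying assumptions: comparison is ASCII-only via `tolower`, whereas the patch uses PSPP's `utf8_strncasecmp`, and a linear scan over an array stands in for the hash-table walk. The names `equals_ignore_case_len` and `find_name_len` are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if the LENGTH bytes at S equal the null-terminated string
   CANDIDATE, ignoring ASCII case.  S need not be null-terminated.
   (ASCII-only assumption; the real code compares UTF-8 text.) */
static bool
equals_ignore_case_len (const char *s, size_t length, const char *candidate)
{
  if (strlen (candidate) != length)
    return false;

  for (size_t i = 0; i < length; i++)
    if (tolower ((unsigned char) s[i])
        != tolower ((unsigned char) candidate[i]))
      return false;
  return true;
}

/* Returns the index of the first of the N entries in NAMES that matches
   the LENGTH bytes at S, or -1 if there is none.  A linear scan stands in
   for the hash-table lookup. */
static int
find_name_len (const char *const names[], size_t n,
               const char *s, size_t length)
{
  assert (names != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (equals_ignore_case_len (s, length, names[i]))
      return (int) i;
  return -1;
}
```

A caller can then test the first four bytes of an input such as `ARG2=y/` against a list of names without first copying them into a temporary string.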
diff --git a/src/libpspp/stringi-set.h b/src/libpspp/stringi-set.h

index 2a000889ec1fa9a686a519dbb859a3355131f01b..ea4fad6bcf47e9cc34ae54aebe6867652f65b005 100644 (file)
--- a/src/libpspp/stringi-set.h
+++ b/src/libpspp/stringi-set.h
@@ -60,8 +60,13 @@ static inline size_t stringi_set_count (const struct stringi_set *);
  static inline bool stringi_set_is_empty (const struct stringi_set *);
  
  bool stringi_set_contains (const struct stringi_set *, const char *);
+bool stringi_set_contains_len (const struct stringi_set *, const char *,
+                               size_t length);
  struct stringi_set_node *stringi_set_find_node (const struct stringi_set *,
-                                              const char *);
+                                                const char *);
+struct stringi_set_node *stringi_set_find_node_len (const struct stringi_set *,
+                                                    const char *,
+                                                    size_t length);
  
  bool stringi_set_insert (struct stringi_set *, const char *);
  bool stringi_set_insert_nocopy (struct stringi_set *, char *);
diff --git a/tests/automake.mk b/tests/automake.mk

index ec81e5288140196c5dbd368faf8c00d80e8480d7..4de61417b2ef7cf95b8898bd50abded0f386abfd 100644 (file)
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -339,6 +339,7 @@ TESTSUITE_AT = \
         tests/data/sys-file.at \
         tests/data/encrypted-file.at \
         tests/language/command.at \
+       tests/language/control/define.at \
         tests/language/control/do-if.at \
         tests/language/control/do-repeat.at \
         tests/language/control/loop.at \
diff --git a/tests/language/control/define.at b/tests/language/control/define.at

new file mode 100644 (file)

index 0000000..adfd18c
--- /dev/null
+++ b/tests/language/control/define.at
@@ -0,0 +1,1126 @@
+dnl PSPP - a program for statistical analysis.
+dnl Copyright (C) 2017 Free Software Foundation, Inc.
+dnl
+dnl This program is free software: you can redistribute it and/or modify
+dnl it under the terms of the GNU General Public License as published by
+dnl the Free Software Foundation, either version 3 of the License, or
+dnl (at your option) any later version.
+dnl
+dnl This program is distributed in the hope that it will be useful,
+dnl but WITHOUT ANY WARRANTY; without even the implied warranty of
+dnl MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+dnl GNU General Public License for more details.
+dnl
+dnl You should have received a copy of the GNU General Public License
+dnl along with this program.  If not, see <http://www.gnu.org/licenses/>.
+dnl
+AT_BANNER([DEFINE])
+
+m4_define([PSPP_CHECK_MACRO_EXPANSION],
+  [AT_SETUP([macro expansion - $1])
+   AT_KEYWORDS([m4_bpatsubst([$1], [!], [])])
+   AT_DATA([define.sps], [$2
+DEBUG EXPAND.
+$3
+])
+   AT_CAPTURE_FILE([define.sps])
+   AT_DATA([expout], [$4
+])
+   AT_CHECK([pspp --testing-mode define.sps | sed '/^$/d'], [$6], [expout])
+   AT_CLEANUP])
+
+AT_SETUP([simple macro expansion])
+AT_DATA([define.sps], [dnl
+DEFINE !macro()
+a b c d
+e f g h.
+i j k l
+1,2,3,4.
+5+6+7.
+m(n,o).
+"a" "b" "c" 'a' 'b' 'c'.
+"x "" y".
+!ENDDEFINE.
+DEBUG EXPAND.
+!macro
+])
+AT_CHECK([pspp --testing-mode define.sps], [0], [dnl
+a b c d e f g h.
+i j k l 1, 2, 3, 4.
+5 + 6 + 7.
+m(n, o).
+"a" "b" "c" 'a' 'b' 'c'.
+"x "" y".
+])
+AT_CLEANUP
+
+PSPP_CHECK_MACRO_EXPANSION([one !TOKENS(1) positional argument],
+  [DEFINE !t1(!positional !tokens(1)) t1 (!1) !ENDDEFINE.],
+  [!t1 a.
+!t1 b.
+!t1 a b.],
+  [t1(a)
+t1(b)
+t1(a)
+note: unexpanded token "b"])
+
+AT_SETUP([macro expansion with positional arguments])
+AT_DATA([define.sps], [dnl
+DEFINE !title(!positional !tokens(1)) !1 !ENDDEFINE.
+DEFINE !t1(!positional !tokens(1)) t1 (!1) !ENDDEFINE.
+DEFINE !t2(!positional !tokens(2)) t2 (!1) !ENDDEFINE.
+
+DEFINE !ce(!positional !charend('/')) ce (!1) !ENDDEFINE.
+DEFINE !ce2(!positional !charend('(')
+           /!positional !charend(')'))
+ce2 (!1, !2)
+!ENDDEFINE.
+
+DEFINE !e(!positional !enclose('{','}')) e (!1) !ENDDEFINE.
+
+DEFINE !cmd(!positional !cmdend) cmd(!1) !ENDDEFINE.
+DEFINE !cmd2(!positional !cmdend
+            /!positional !tokens(1))
+cmd2(!1, !2)
+!ENDDEFINE.
+
+DEFINE !p(!positional !tokens(1)
+         /!positional !tokens(1)
+        /!positional !tokens(1))
+p(!1, !2, !3)(!*)
+!ENDDEFINE.
+
+DEBUG EXPAND.
+!title "!TOKENS(1) argument."
+!t1 a.
+!t1 b.
+!t1 a b.
+
+!title "!TOKENS(2) argument."
+!t2 a b.
+!t2 b c d.
+
+!title "!CHAREND argument."
+!ce/.
+!ce x/.
+!ce x y/.
+!ce x y z/.
+
+!title "Two !CHAREND arguments."
+!ce2 x(y).
+!ce2 1 2 3 4().
+
+!title "!ENCLOSE argument."
+!e {}.
+!e {a}.
+!e {a b}.
+
+!title "!CMDEND argument."
+!cmd 1 2 3 4.
+!cmd2 5 6.
+7.
+
+!title "Three !TOKENS(1) arguments."
+!p a b c.
+!p 1 -2 -3.
+])
+AT_CHECK([pspp --testing-mode define.sps], [0], [dnl
+"!TOKENS(1) argument."
+
+t1(a)
+
+t1(b)
+
+t1(a)
+
+note: unexpanded token "b"
+
+"!TOKENS(2) argument."
+
+t2(a b)
+
+t2(b c)
+
+note: unexpanded token "d"
+
+"!CHAREND argument."
+
+ce( )
+
+ce(x)
+
+ce(x y)
+
+ce(x y z)
+
+"Two !CHAREND arguments."
+
+ce2(x, y)
+
+ce2(1 2 3 4, )
+
+"!ENCLOSE argument."
+
+e( )
+
+e(a)
+
+e(a b)
+
+"!CMDEND argument."
+
+cmd(1 2 3 4)
+
+cmd2(5 6, 7)
+
+"Three !TOKENS(1) arguments."
+
+p(a, b, c) (a b c)
+
+p(1, -2, -3) (1 -2 -3)
+])
+AT_CLEANUP
+
+AT_SETUP([macro expansion with positional arguments - negative])
+AT_DATA([define.sps], [dnl
+DEFINE !title(!positional !tokens(1)) !1 !ENDDEFINE.
+DEFINE !p(!positional !tokens(1)
+         /!positional !tokens(1)
+        /!positional !tokens(1))
+(!1, !2, !3)
+!ENDDEFINE.
+
+DEFINE !ce(!positional !charend('/')) ce(!1) !ENDDEFINE.
+
+DEFINE !enc1(!positional !enclose('{', '}')) enc1(!1) !ENDDEFINE.
+DEBUG EXPAND.
+!title "Too few tokens for !TOKENS."
+!p a b.
+!p a.
+!p.
+
+!title "Missing charend delimiter."
+!ce a b c.
+
+!title "Missing start delimiter."
+!enc1 a b c.
+
+!title "Missing end delimiter."
+!enc1{a b c.
+])
+AT_CHECK([pspp --testing-mode define.sps], [1], [dnl
+"Too few tokens for !TOKENS."
+
+define.sps:13: error: DEBUG EXPAND: Unexpected end of command reading
+argument !3 to macro !p.
+
+note: unexpanded token "!p"
+
+note: unexpanded token "a"
+
+note: unexpanded token "b"
+
+define.sps:14: error: DEBUG EXPAND: Unexpected end of command reading
+argument !2 to macro !p.
+
+note: unexpanded token "!p"
+
+note: unexpanded token "a"
+
+define.sps:15: error: DEBUG EXPAND: Unexpected end of command reading
+argument !1 to macro !p.
+
+note: unexpanded token "!p"
+
+"Missing charend delimiter."
+
+define.sps:18: error: DEBUG EXPAND: Unexpected end of command reading
+argument !1 to macro !ce.
+
+note: unexpanded token "!ce"
+
+note: unexpanded token "a"
+
+note: unexpanded token "b"
+
+note: unexpanded token "c"
+
+"Missing start delimiter."
+
+define.sps:21: error: DEBUG EXPAND: Found `a' while expecting `{' reading
+argument !1 to macro !enc1.
+
+note: unexpanded token "!enc1"
+
+note: unexpanded token "a"
+
+note: unexpanded token "b"
+
+note: unexpanded token "c"
+
+"Missing end delimiter."
+
+define.sps:24: error: DEBUG EXPAND: Unexpected end of command reading
+argument !1 to macro !enc1.
+
+note: unexpanded token "!enc1"
+
+note: unexpanded token "{"
+
+note: unexpanded token "a"
+
+note: unexpanded token "b"
+
+note: unexpanded token "c"
+])
+AT_CLEANUP
+
+AT_SETUP([keyword macro argument name with ! prefix])
+AT_DATA([define.sps], [dnl
+DEFINE !macro(!x=!TOKENS(1).
+])
+AT_CHECK([pspp -O format=csv define.sps], [1], [dnl
+"define.sps:1.15-1.16: error: DEFINE: Syntax error at `!x': Keyword macro parameter must be named in definition without ""!"" prefix."
+])
+AT_CLEANUP
+
+AT_SETUP([reserved macro keyword argument name])
+AT_DATA([define.sps], [dnl
+DEFINE !macro(if=!TOKENS(1).
+])
+AT_CHECK([pspp -O format=csv define.sps], [1], [dnl
+"define.sps:1.15-1.16: error: DEFINE: Syntax error at `if': Cannot use macro keyword ""if"" as an argument name."
+])
+AT_CLEANUP
+
+PSPP_CHECK_MACRO_EXPANSION([one !TOKENS(1) keyword argument],
+  [DEFINE !k(arg1 = !TOKENS(1)) k(!arg1) !ENDDEFINE.],
+  [!k arg1=x.
+!k arg1=x y.
+!k.],
+  [k(x)
+k(x)
+note: unexpanded token "y"
+k( )])
+
+PSPP_CHECK_MACRO_EXPANSION([one !TOKENS(1) keyword argument - negative],
+  [DEFINE !k(arg1 = !TOKENS(1)) k(!arg1) !ENDDEFINE.],
+  [!k arg1.
+!k arg1=.], [dnl
+define.sps:3: error: DEBUG EXPAND: Found `.' while expecting `=' reading
+argument !arg1 to macro !k.
+note: unexpanded token "!k"
+note: unexpanded token "arg1"
+define.sps:4: error: DEBUG EXPAND: Unexpected end of command reading argument !
+arg1 to macro !k.
+note: unexpanded token "!k"
+note: unexpanded token "arg1"
+note: unexpanded token "="], [1])
+
+PSPP_CHECK_MACRO_EXPANSION([!CHAREND('/') keyword arguments], [dnl
+DEFINE !k(arg1 = !CHAREND('/')
+         /arg2 = !CHAREND('/'))
+k(!arg1, !arg2)
+!ENDDEFINE.],
+  [!k arg1=x/ arg2=y/.
+!k arg1=x/.
+!k arg2=y/.
+!k.],
+  [k(x, y)
+k(x, )
+k(, y)
+k(, )])
+
+PSPP_CHECK_MACRO_EXPANSION([!CHAREND('/') keyword arguments - negative], [dnl
+DEFINE !k(arg1 = !CHAREND('/')
+         /arg2 = !CHAREND('/'))
+k(!arg1, !arg2)
+!ENDDEFINE.],
+  [!k arg1.
+!k arg1=.
+!k arg1=x.
+!k arg1=x/ arg2=y.],
+  [define.sps:6: error: DEBUG EXPAND: Found `.' while expecting `=' reading
+argument !arg1 to macro !k.
+note: unexpanded token "!k"
+note: unexpanded token "arg1"
+define.sps:7: error: DEBUG EXPAND: Unexpected end of command reading argument !
+arg1 to macro !k.
+note: unexpanded token "!k"
+note: unexpanded token "arg1"
+note: unexpanded token "="
+define.sps:8: error: DEBUG EXPAND: Unexpected end of command reading argument !
+arg1 to macro !k.
+note: unexpanded token "!k"
+note: unexpanded token "arg1"
+note: unexpanded token "="
+note: unexpanded token "x"
+define.sps:9: error: DEBUG EXPAND: Unexpected end of command reading argument !
+arg2 to macro !k.
+note: unexpanded token "!k"
+note: unexpanded token "arg1"
+note: unexpanded token "="
+note: unexpanded token "x"
+note: unexpanded token "/"
+note: unexpanded token "arg2"
+note: unexpanded token "="
+note: unexpanded token "y"])
+
+PSPP_CHECK_MACRO_EXPANSION([default keyword arguments],
+  [DEFINE !k(arg1 = !DEFAULT(a b c) !CMDEND) k(!arg1) !ENDDEFINE],
+  [!k arg1=x.
+!k],
+  [k(x)
+k(a b c)])
+
+dnl Keep this test in sync with the examples for !BLANKS in the manual.
+PSPP_CHECK_MACRO_EXPANSION([!BLANKS],
+  [DEFINE !b()
+!BLANKS(0).
+!QUOTE(!BLANKS(0)).
+!BLANKS(1).
+!QUOTE(!BLANKS(1)).
+!BLANKS(2).
+!QUOTE(!BLANKS(2)).
+!BLANKS(5).
+!QUOTE(!BLANKS(5)).
+!ENDDEFINE],
+  [!b.],
+  [.
+''.
+.
+' '.
+.
+'  '.
+.
+'     '.])
+
+dnl Keep this test in sync with the examples for !CONCAT in the manual.
+PSPP_CHECK_MACRO_EXPANSION([!CONCAT],
+  [DEFINE !c()
+!CONCAT(x, y).
+!CONCAT('x', 'y').
+!CONCAT(12, 34).
+!CONCAT(!NULL, 123).
+!CONCAT(x, 0).
+!CONCAT(x, 0, y).
+!CONCAT(0, x).
+!CONCAT(0, x, y).
+!ENDDEFINE],
+  [!c.],
+  [xy.
+xy.
+1234.
+123.
+x0.
+x0y.
+0 x.
+0 xy.])
+
+dnl Keep this test in sync with the examples for !EVAL in the manual.
+PSPP_CHECK_MACRO_EXPANSION([!EVAL],
+  [DEFINE !vars() a b c !ENDDEFINE.
+DEFINE !e()
+!vars.
+!QUOTE(!vars).
+!EVAL(!vars).
+!QUOTE(!EVAL(!vars)).
+!ENDDEFINE
+DEFINE !e2(!positional !enclose('(',')'))
+!1.
+!QUOTE(!1).
+!EVAL(!1).
+!QUOTE(!EVAL(!1)).
+!ENDDEFINE],
+  [!e.
+!e2(!vars)],
+  [a b c.
+'!vars'.
+a b c.
+'a b c'.
+a b c.
+'!vars'.
+a b c.
+'a b c'.])
+
+dnl Keep this test in sync with the examples for !HEAD in the manual.
+PSPP_CHECK_MACRO_EXPANSION([!HEAD],
+  [DEFINE !h()
+!HEAD('a b c').
+!HEAD('a').
+!HEAD(!NULL).
+!HEAD('').
+!ENDDEFINE],
+  [!h.],
+  [a.
+a.
+.
+.])
+
+dnl Keep this test in sync with the examples for !TAIL in the manual.
+PSPP_CHECK_MACRO_EXPANSION([!TAIL],
+  [DEFINE !t()
+!TAIL('a b c').
+!TAIL('a').
+!TAIL(!NULL).
+!TAIL('').
+!ENDDEFINE],
+  [!t.],
+  [b c.
+.
+.
+.])
+
+dnl Keep this test in sync with the examples for !INDEX in the manual.
+PSPP_CHECK_MACRO_EXPANSION([!INDEX],
+  [DEFINE !i()
+!INDEX(banana, an).
+!INDEX(banana, nan).
+!INDEX(banana, apple).
+!INDEX("banana", nan).
+!INDEX("banana", "nan").
+!INDEX(!UNQUOTE("banana"), !UNQUOTE("nan")).
+!ENDDEFINE],
+  [!i.],
+  [2.
+3.
+0.
+4.
+0.
+3.])
+
+dnl Keep this test in sync with the examples for !LENGTH in the manual.
+PSPP_CHECK_MACRO_EXPANSION([!LENGTH],
+  [DEFINE !l()
+!LENGTH(123).
+!LENGTH(123.00).
+!LENGTH( 123 ).
+!LENGTH("123").
+!LENGTH(xyzzy).
+!LENGTH("xyzzy").
+!LENGTH("xy""zzy").
+!LENGTH(!UNQUOTE("xyzzy")).
+!LENGTH(!UNQUOTE("xy""zzy")).
+!LENGTH(!NULL).
+!ENDDEFINE.
+DEFINE !la(!positional !enclose('(',')'))
+!LENGTH(!1).
+!ENDDEFINE.],
+  [!l.
+!la(a b c).
+!la().],
+  [3.
+6.
+3.
+5.
+5.
+7.
+9.
+5.
+6.
+0.
+5.
+0.])
+
+dnl Keep this test in sync with the examples for !SUBSTR in the manual.
+PSPP_CHECK_MACRO_EXPANSION([!SUBSTR],
+  [DEFINE !s()
+!SUBSTR(banana, 3).
+!SUBSTR(banana, 3, 3).
+!SUBSTR("banana", 3).
+!SUBSTR(!UNQUOTE("banana"), 3).
+!SUBSTR("banana", 3, 3).
+!SUBSTR(banana, 3, 0).
+!SUBSTR(banana, 3, 10).
+!SUBSTR(banana, 10, 3).
+!ENDDEFINE.],
+  [!s.],
+  [error
+nana.
+nan.
+anana.
+nana.
+ana.
+.
+nana.
+.])
+
+dnl Keep this test in sync with the examples for !UPCASE in the manual.
+PSPP_CHECK_MACRO_EXPANSION([!UPCASE],
+  [DEFINE !u()
+!UPCASE(freckle).
+!UPCASE('freckle').
+!UPCASE('a b c').
+!UPCASE('A B C').
+!ENDDEFINE.],
+  [!u.],
+  [FRECKLE.
+FRECKLE.
+A B C.
+A B C.])
+
+
+dnl !* is implemented separately inside and outside function arguments
+dnl so this test makes sure to include both.
+PSPP_CHECK_MACRO_EXPANSION([!*], [dnl
+DEFINE !m(!POSITIONAL !TOKENS(1)
+         /!POSITIONAL !TOKENS(1))
+!*/
+!LENGTH(!*)/
+!SUBSTR(!*, 3)/
+!QUOTE(!*).
+!ENDDEFINE.],
+  [!m 123 b
+!m 2 3
+!m '' 'b'.
+], [123 b / 5 / 3 b / '123 b'.
+2 3 / 3 / 3 / '2 3'.
+'' 'b' / 6 / 'b' / ''''' ''b'''.])
+
+AT_SETUP([macro maximum nesting level (MNEST)])
+AT_KEYWORDS([MNEST])
+AT_DATA([define.sps], [dnl
+DEFINE !macro()
+!macro
+!ENDDEFINE.
+!macro.
+])
+AT_CHECK([pspp -O format=csv define.sps], [1], [dnl
+maximum nesting level exceeded
+define.sps:4.1-4.6: error: Syntax error at `!macro' (in expansion of `!macro'): expecting command name.
+])
+AT_CLEANUP
+
+AT_SETUP([macro !IF condition])
+AT_KEYWORDS([if])
+for operators in \
+    '!eq !ne !lt !gt !le !ge' \
+    '  =  <>   <   >  <=  >='
+do
+    set $operators
+    AS_BOX([$operators])
+    cat > define.sps <<EOF
+DEFINE !test(!positional !tokens(1))
+!if (!1 $1 1) !then true !else false !ifend
+!if (!1 $2 1) !then true !else false !ifend
+!if (!1 $3 1) !then true !else false !ifend
+!if (!1 $4 1) !then true !else false !ifend
+!if (!1 $5 1) !then true !else false !ifend
+!if (!1 $6 1) !then true !else false !ifend.
+!ENDDEFINE.
+DEBUG EXPAND.
+!test 0
+!test 1
+!test 2
+!test '1'
+!test 1.0
+EOF
+    AT_CAPTURE_FILE([define.sps])
+    AT_CHECK([pspp --testing-mode define.sps], [0], [dnl
+false true true false true false.
+
+true false false false true true.
+
+false true false true false true.
+
+true false false false true true.
+
+false true false true false true.
+])
+done
+AT_CLEANUP
+
+AT_SETUP([macro !IF condition -- case sensitivity])
+AT_KEYWORDS([if])
+for operators in \
+    '!eq !ne !lt !gt !le !ge' \
+    '  =  <>   <   >  <=  >='
+do
+    set $operators
+    AS_BOX([$operators])
+    cat > define.sps <<EOF
+DEFINE !test(!positional !tokens(1))
+!if (!1 $1 a) !then true !else false !ifend
+!if (!1 $1 A) !then true !else false !ifend
+!if (!1 $2 a) !then true !else false !ifend
+!if (!1 $2 A) !then true !else false !ifend
+!if (!1 $3 a) !then true !else false !ifend
+!if (!1 $3 A) !then true !else false !ifend
+!if (!1 $4 a) !then true !else false !ifend
+!if (!1 $4 A) !then true !else false !ifend
+!if (!1 $5 a) !then true !else false !ifend
+!if (!1 $5 A) !then true !else false !ifend
+!if (!1 $6 a) !then true !else false !ifend
+!if (!1 $6 A) !then true !else false !ifend
+!if (!1 $1 !null) !then true !else false !ifend
+!if (!1 $2 !null) !then true !else false !ifend.
+!ENDDEFINE.
+DEBUG EXPAND.
+!test a
+!test A
+!test b
+!test B
+EOF
+    AT_CAPTURE_FILE([define.sps])
+    AT_CHECK([pspp --testing-mode define.sps], [0], [dnl
+true false false true false false false true true false true true false true.
+
+false true true false true false false false true true false true false true.
+
+false false true true false false true true false false true true false true.
+
+false false true true true false false true true false false true false true.
+])
+done
+AT_CLEANUP
+
+AT_SETUP([macro !IF condition -- logical operators])
+AT_KEYWORDS([if])
+for operators in \
+    '!and !or !not' \
+    '   &   |    ~'
+do
+    set $operators
+    AS_BOX([$operators])
+    cat > define.sps <<EOF
+DEFINE !test_binary(!positional !tokens(1)/!positional !tokens(1))
+!if !1 $1 !2 !then true !else false !ifend
+!if !1 $2 !2 !then true !else false !ifend.
+!ENDDEFINE.
+
+DEFINE !test_unary(!positional !tokens(1))
+!if $3 !1 !then true !else false !ifend.
+!ENDDEFINE.
+
+* These are:
+  ((not A) and B) or C
+  not (A and B) or C
+  not A and (B or C)
+DEFINE !test_prec(!pos !tokens(1)/!pos !tokens(1)/!pos !tokens(1))
+!if $3 !1 $1 !2 $2 !3 !then true !else false !ifend
+!if $3 (!1 $1 !2) $2 !3 !then true !else false !ifend
+!if $3 !1 $1 (!2 $2 !3) !then true !else false !ifend
+!ENDDEFINE.
+
+DEBUG EXPAND.
+!test_binary 0 0
+!test_binary 0 1
+!test_binary 1 0
+!test_binary 1 1
+!test_unary 0
+!test_unary 1
+!test_prec 0 0 0 !test_prec 0 0 1 !test_prec 0 1 0 !test_prec 0 1 1.
+!test_prec 1 0 0 !test_prec 1 0 1 !test_prec 1 1 0 !test_prec 1 1 1.
+EOF
+    AT_CAPTURE_FILE([define.sps])
+    AT_CHECK([pspp --testing-mode define.sps], [0], [dnl
+false false.
+
+false true.
+
+false true.
+
+true true.
+
+true.
+
+false.
+
+false true false
+true true true
+true true true
+true true true
+
+false true false
+true true false
+false false false
+true true false
+])
+done
+AT_CLEANUP
+
+AT_SETUP([macro !LET])
+AT_KEYWORDS([let])
+AT_DATA([define.sps], [dnl
+DEFINE !macro(!POS !CMDEND)
+!LET !v1 = !CONCAT('x',!1,'y')
+!LET !v2 = !QUOTE(!v1)
+!LET !v3 = (!LENGTH(!1) = 1)
+!LET !v4 = (!SUBSTR(!1, 3) = !NULL)
+v1=!v1.
+v2=!v2.
+v3=!v3.
+v4=!v4.
+!ENDDEFINE.
+DEBUG EXPAND.
+!macro 0.
+!macro.
+!macro xyzzy.
+])
+AT_CHECK([pspp --testing-mode define.sps], [0], [dnl
+v1 = x0y.
+v2 = x0y.
+v3 = 1.
+v4 = 1.
+
+v1 = xy.
+v2 = xy.
+v3 = 0.
+v4 = 1.
+
+v1 = xxyzzyy.
+v2 = xxyzzyy.
+v3 = 0.
+v4 = 0.
+])
+AT_CLEANUP
+
+AT_SETUP([macro indexed !DO])
+AT_KEYWORDS([index do])
+AT_DATA([define.sps], [dnl
+DEFINE !title(!POS !TOKENS(1)) !1. !ENDDEFINE.
+
+DEFINE !for(!POS !TOKENS(1) / !POS !TOKENS(1))
+!DO !var = !1 !TO !2 !var !DOEND.
+!ENDDEFINE.
+
+DEFINE !forby(!POS !TOKENS(1) / !POS !TOKENS(1) / !POS !TOKENS(1))
+!DO !var = !1 !TO !2 !BY !3 !var !DOEND.
+!ENDDEFINE.
+
+DEBUG EXPAND.
+!title "increasing".
+!for 1 5.
+!forby 1 5 1.
+!forby 1 5 2.
+!forby 1 5 2.5.
+!forby 1 5 -1.
+
+!title "decreasing".
+!for 5 1.
+!forby 5 1 1.
+!forby 5 1 -1.
+!forby 5 1 -2.
+!forby 5 1 -3.
+
+!title "non-integer".
+!for 1.5 3.5.
+])
+AT_CHECK([pspp --testing-mode define.sps], [0], [dnl
+"increasing".
+
+1 2 3 4 5.
+
+1 2 3 4 5.
+
+1 3 5.
+
+1 3.5.
+
+.
+
+"decreasing".
+
+.
+
+.
+
+5 4 3 2 1.
+
+5 3 1.
+
+5 2.
+
+"non-integer".
+
+1.5 2.5 3.5.
+])
+AT_CLEANUP
+
+AT_SETUP([macro !DO invalid variable names])
+AT_KEYWORDS([index do])
+AT_DATA([define.sps], [dnl
+DEFINE !for(x=!TOKENS(1) / y=!TOKENS(1))
+!DO !x = !x !TO !y !var !DOEND.
+!ENDDEFINE.
+
+DEFINE !for2(x=!TOKENS(1) / y=!TOKENS(1))
+!DO !noexpand = !x !TO !y !var !DOEND.
+!ENDDEFINE.
+
+DEBUG EXPAND.
+!for x=1 y=5.
+!for2 x=1 y=5.
+])
+AT_CHECK([pspp --testing-mode define.sps], [0], [dnl
+cannot use argument name or macro keyword as !DO variable
+cannot use argument name or macro keyword as !DO variable
+!DO 1 = 1 !TO 5 !var !DOEND.
+
+!DO !noexpand = 1 !TO 5 !var !DOEND.
+])
+AT_CLEANUP
+
+AT_SETUP([macro indexed !DO reaches MITERATE])
+AT_KEYWORDS([index do])
+AT_DATA([define.sps], [dnl
+DEFINE !title(!POS !TOKENS(1)) !1. !ENDDEFINE.
+
+DEFINE !for(!POS !TOKENS(1) / !POS !TOKENS(1))
+!DO !var = !1 !TO !2 !var !DOEND.
+!ENDDEFINE.
+
+DEFINE !forby(!POS !TOKENS(1) / !POS !TOKENS(1) / !POS !TOKENS(1))
+!DO !var = !1 !TO !2 !BY !3 !var !DOEND.
+!ENDDEFINE.
+
+SET MITERATE=3.
+DEBUG EXPAND.
+!title "increasing".
+!for 1 5.
+!forby 1 5 1.
+!forby 1 5 2.
+!forby 1 5 2.5.
+!forby 1 5 -1.
+
+!title "decreasing".
+!for 5 1.
+!forby 5 1 1.
+!forby 5 1 -1.
+!forby 5 1 -2.
+!forby 5 1 -3.
+
+!title "non-integer".
+!for 1.5 3.5.
+])
+AT_CHECK([pspp --testing-mode define.sps], [0], [dnl
+exceeded maximum number of iterations 3
+exceeded maximum number of iterations 3
+exceeded maximum number of iterations 3
+"increasing".
+
+1 2 3 4.
+
+1 2 3 4.
+
+1 3 5.
+
+1 3.5.
+
+.
+
+"decreasing".
+
+.
+
+.
+
+5 4 3 2.
+
+5 3 1.
+
+5 2.
+
+"non-integer".
+
+1.5 2.5 3.5.
+])
+AT_CLEANUP
+
+AT_SETUP([!BREAK with macro indexed !DO])
+AT_KEYWORDS([index do break])
+AT_DATA([define.sps], [dnl
+DEFINE !title(!POS !TOKENS(1)) !1. !ENDDEFINE.
+
+DEFINE !for(!POS !TOKENS(1) / !POS !TOKENS(1) / !POS !TOKENS(1))
+!DO !var = !1 !TO !2
+  !var
+  !IF 1 !THEN
+    !IF !var = !3 !THEN
+      x
+      !BREAK
+      y
+    !IFEND
+    ,
+  !IFEND
+!DOEND.
+!ENDDEFINE.
+
+DEBUG EXPAND.
+!for 1 5 4.
+])
+AT_CHECK([pspp --testing-mode define.sps], [0], [dnl
+1, 2, 3, 4 x.
+])
+AT_CLEANUP
+
+AT_SETUP([macro list !DO])
+AT_KEYWORDS([index do])
+AT_DATA([define.sps], [dnl
+DEFINE !for(!POS !CMDEND)
+(!DO !i !IN (!1) (!i) !DOEND).
+!ENDDEFINE.
+
+DEBUG EXPAND.
+!for a b c.
+!for 'foo bar baz quux'.
+!for.
+])
+AT_CHECK([pspp --testing-mode define.sps], [0], [dnl
+( (a) (b) (c) ).
+
+( (foo) (bar) (baz) (quux) ).
+
+( ).
+])
+AT_CLEANUP
+
+AT_SETUP([macro list !DO reaches MITERATE])
+AT_KEYWORDS([index do])
+AT_DATA([define.sps], [dnl
+DEFINE !for(!POS !CMDEND)
+(!DO !i !IN (!1) (!i) !DOEND).
+!ENDDEFINE.
+
+SET MITERATE=2.
+DEBUG EXPAND.
+!for a b c.
+!for 'foo bar baz quux'.
+!for.
+])
+AT_CHECK([pspp --testing-mode define.sps], [0], [dnl
+exceeded maximum number of iterations 2
+exceeded maximum number of iterations 2
+( (a) (b) ).
+
+( (foo) (bar) ).
+
+( ).
+])
+AT_CLEANUP
+
+AT_SETUP([!BREAK with macro list !DO])
+AT_KEYWORDS([index break do])
+AT_DATA([define.sps], [dnl
+DEFINE !for(!POS !TOKENS(1) / !POS !CMDEND)
+(!DO !i !IN (!2)
+  (!i)
+  !IF 1 !THEN
+    !IF !i = !1 !THEN
+      x
+      !BREAK
+      y
+    !IFEND
+    ,
+  !IFEND
+!DOEND).
+!ENDDEFINE.
+
+DEBUG EXPAND.
+!for d a b c.
+!for baz 'foo bar baz quux'.
+!for e.
+])
+AT_CHECK([pspp --testing-mode define.sps], [0], [dnl
+( (a), (b), (c), ).
+
+( (foo), (bar), (baz)x).
+
+( ).
+])
+AT_CLEANUP
+
+AT_SETUP([macro !LET])
+AT_DATA([define.sps], [dnl
+DEFINE !macro(!pos !enclose('(',')'))
+!LET !x=!1
+!LET !y=!QUOTE(!1)
+!LET !z=(!y="abc")
+!y !z
+!ENDDEFINE.
+
+DEBUG EXPAND.
+!macro(1+2).
+!macro(abc).
+])
+AT_CHECK([pspp --testing-mode define.sps -O format=csv], [0], [dnl
+1 + 2 0
+
+abc 1
+])
+AT_CLEANUP
+
+AT_SETUP([macro !LET invalid variable names])
+AT_DATA([define.sps], [dnl
+DEFINE !macro(x=!tokens(1))
+!LET !x=!x
+!ENDDEFINE.
+
+DEFINE !macro2()
+!LET !do=x
+!ENDDEFINE.
+
+DEBUG EXPAND.
+!macro 1.
+!macro2.
+])
+AT_CHECK([pspp --testing-mode define.sps -O format=csv], [0], [dnl
+cannot use argument name or macro keyword as !LET variable
+cannot use argument name or macro keyword as !LET variable
+expected macro variable name following !DO
+!LET =
+
+!LET !do = x
+])
+AT_CLEANUP
+
+AT_SETUP([BEGIN DATA inside a macro])
+AT_DATA([define.sps], [dnl
+DEFINE !macro()
+DATA LIST NOTABLE /x 1.
+BEGIN DATA
+1
+2
+3
+END DATA.
+LIST.
+!ENDDEFINE.
+
+!macro
+])
+AT_CHECK([pspp define.sps -O format=csv], [0], [dnl
+Table: Data List
+x
+1
+2
+3
+])
+AT_CLEANUP
+
+AT_SETUP([TITLE and SUBTITLE with macros])
+AT_KEYWORDS([macro])
+for command in TITLE SUBTITLE; do
+    cat >title.sps <<EOF
+DEFINE !paste(!POS !TOKENS(1) / !POS !TOKENS(1))
+!CONCAT(!1,!2)
+!ENDDEFINE.
+$command prefix !paste foo bar suffix.
+SHOW $command.
+EOF
+    cat >expout <<EOF
+title.sps:5: note: SHOW: $command is prefix foobar suffix.
+EOF
+    AT_CHECK([pspp -O format=csv title.sps], [0], [expout])
+done
+AT_CLEANUP
+
+AT_SETUP([error message within macro expansion])
+AT_DATA([define.sps], [dnl
+DEFINE !vars(!POS !TOKENS(1)) a b C !ENDDEFINE.
+DATA LIST NOTABLE /a b 1-2.
+COMPUTE x = !vars x.
+])
+AT_CHECK([pspp -O format=csv define.sps], [1], [dnl
+define.sps:3.13-3.19: error: COMPUTE: Syntax error at `b' (in expansion of `!vars x'): expecting end of command.
+])
+AT_CLEANUP
+\ No newline at end of file
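Review note on the "macro indexed !DO" test in define.at above: its expected output is consistent with a loop that counts up while the index is at most the end value when the step is positive, counts down while the index is at least the end value when the step is negative, and produces nothing when the step points away from the end value. The sketch below models only that rule. `indexed_do` is a hypothetical helper, not the macro expander's code, and its `max` parameter is just the output buffer's capacity; it makes no attempt to model MITERATE.

```c
#include <assert.h>
#include <stddef.h>

/* Stores in OUT (capacity MAX) the values that a loop from FIRST to LAST
   in steps of BY would visit, and returns how many there are.  A
   hypothetical helper, not PSPP code. */
static size_t
indexed_do (double first, double last, double by, double out[], size_t max)
{
  assert (out != NULL || max == 0);

  size_t n = 0;
  if (by > 0)
    for (double v = first; v <= last && n < max; v += by)
      out[n++] = v;
  else if (by < 0)
    for (double v = first; v >= last && n < max; v += by)
      out[n++] = v;
  return n;
}
```

For the inputs used in the test, this yields `1 3.5` for `!forby 1 5 2.5`, `5 2` for `!forby 5 1 -3`, and no values for `!forby 1 5 -1`.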
diff --git a/tests/language/lexer/lexer.at b/tests/language/lexer/lexer.at

index c572e5fd864ad8ac042aaf0c983f3b3bd425e066..212aadeb0897037656e0b0e4ca1c7016947a8c35 100644 (file)
--- a/tests/language/lexer/lexer.at
+++ b/tests/language/lexer/lexer.at
@@ -87,7 +87,7 @@ lexer.sps:1: error: Unknown command `datA dist'.
  
  lexer.sps:2: error: LIST: LIST is allowed only after the active dataset has been defined.
  
-lexer.sps:2.6: error: LIST: Syntax error at `...': Bad character U+0000 in input.
+lexer.sps:2.6: error: LIST: Syntax error: Bad character U+0000 in input.
  ])
  AT_CLEANUP
  
diff --git a/tests/language/lexer/scan-test.c b/tests/language/lexer/scan-test.c

index 1eb04338e3b39b183b3c957c68288839ce68d4ab..2a77e127ace3405ce6770b31aaf0cd7544d3db9d 100644 (file)
--- a/tests/language/lexer/scan-test.c
+++ b/tests/language/lexer/scan-test.c
@@ -73,7 +73,7 @@ main (int argc, char *argv[])
          length--;
      }
  
-  string_lexer_init (&slex, input, length, mode);
+  string_lexer_init (&slex, input, length, mode, false);
    do
      {
        struct token token;
diff --git a/tests/language/lexer/segment-test.c b/tests/language/lexer/segment-test.c

index a3b67b89b24b2cd4eb59eb73617d1bd3682b7f4a..acb444f20018c3dceb87da7a106592be21acaed9 100644 (file)
--- a/tests/language/lexer/segment-test.c
+++ b/tests/language/lexer/segment-test.c
@@ -108,8 +108,7 @@ main (int argc, char *argv[])
  static void
  check_segmentation (const char *input, size_t length, bool print_segments)
  {
-  struct segmenter s;
-  segmenter_init (&s, mode);
+  struct segmenter s = segmenter_init (mode, false);
  
    size_t line_number = 1;
    size_t line_offset = 0;