   1 @node q2c Input Format, GNU Free Documentation License, Data File Format, Top
   2 @appendix @code{q2c} Input Format
   3
   4 PSPP statistical procedures have a bizarre and somewhat irregular
   5 syntax.  Despite this, a parser generator has been written that
   6 adequately addresses many of the possibilities and tries to provide
   7 hooks for the exceptional cases.  This parser generator is named
   8 @code{q2c}.
   9
  10 @menu
  11 * Invoking q2c::                q2c command-line syntax.
  12 * q2c Input Structure::         High-level layout of the input file.
  13 * Grammar Rules::               Syntax of the grammar rules.
  14 @end menu
  15
  16 @node Invoking q2c, q2c Input Structure, q2c Input Format, q2c Input Format
  17 @section Invoking q2c
  18
  19 @example
  20 q2c @var{input.q} @var{output.c}
  21 @end example
  22
  23 @code{q2c} translates a @samp{.q} file into a @samp{.c} file.  It takes
  24 exactly two command-line arguments, which are the input file name and
  25 output file name, respectively.  @code{q2c} does not accept any
  26 command-line options.
  27
  28 @node q2c Input Structure, Grammar Rules, Invoking q2c, q2c Input Format
  29 @section @code{q2c} Input Structure
  30
  31 @code{q2c} input files are divided into two sections: the grammar rules
  32 and the supporting code.  The @dfn{grammar rules}, which make up the
  33 first part of the input, are used to define the syntax of the
  34 statistical procedure to be parsed.  The @dfn{supporting code},
  35 following the grammar rules, is copied largely unchanged to the output
  36 file, except for certain escapes.
  37
  38 The most important lines in the grammar rules are used for defining
  39 procedure syntax.  These lines can be prefixed with a dollar sign
  40 (@samp{$}), which prevents Emacs' CC-mode from munging them.  Besides
  41 this, a bang (@samp{!}) at the beginning of a line causes the line,
  42 minus the bang, to be written verbatim to the output file (useful for
  43 comments).  As a third special case, any line that begins with the exact
  44 characters @code{/* *INDENT} is ignored and not written to the output.
  45 This allows @code{.q} files to be processed through @code{indent}
  46 without being munged.
  47
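As a rough illustration of the three special cases above, the following
C sketch classifies one line from the grammar rules.  The names
@code{classify_line} and @code{enum line_kind} are hypothetical and are
not taken from the @code{q2c} source; the sketch only mirrors the
behavior described in the text.

```c
#include <stddef.h>
#include <string.h>

/* How a line in the grammar-rules part of the input is treated. */
enum line_kind
  {
    LINE_GRAMMAR,               /* Part of the procedure syntax. */
    LINE_VERBATIM,              /* Copied to the output, minus the bang. */
    LINE_IGNORED                /* Dropped entirely. */
  };

/* Classifies LINE and points *TEXT at the part of the line that is
   used, or sets it to NULL if the line is ignored.  (Hypothetical
   helper, not part of q2c.) */
enum line_kind
classify_line (const char *line, const char **text)
{
  static const char indent_marker[] = "/* *INDENT";

  if (strncmp (line, indent_marker, strlen (indent_marker)) == 0)
    {
      *text = NULL;
      return LINE_IGNORED;
    }
  if (line[0] == '!')
    {
      *text = line + 1;
      return LINE_VERBATIM;
    }

  /* A leading dollar sign only shields the line from CC-mode. */
  *text = line[0] == '$' ? line + 1 : line;
  return LINE_GRAMMAR;
}
```

A caller would feed each line of the grammar rules to
@code{classify_line} and act on the returned kind.
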
  48 The syntax of the grammar rules themselves is given in the following
  49 sections.
  50
  51 The supporting code is passed into the output file largely unchanged.
  52 However, the following escapes are supported.  Each escape must appear
  53 on a line by itself.
  54
  55 @table @code
  56 @item /* (header) */
  57
  58 Expands to a series of C @code{#include} directives which include the
  59 headers that are required for the parser generated by @code{q2c}.
  60
  61 @item /* (decls @var{scope}) */
  62
  63 Expands to C variable and data type declarations for the variables and
  64 @code{enum}s input and output by the @code{q2c} parser.  @var{scope}
  65 must be either @code{local} or @code{global}.  @code{local} causes the
  66 declarations to be output as function locals.  @code{global} causes them
  67 to be declared as @code{static} module variables; thus, @code{global} is
  68 a bit of a misnomer.
  69
  70 @item /* (parser) */
  71
  72 Expands to the entire parser.  Must be enclosed within a C function.
  73
  74 @item /* (free) */
  75
  76 Expands to a set of calls to the @code{free} function for variables
  77 declared by the parser.  Only needs to be invoked if subcommands of type
  78 @code{string} are used in the grammar rules.
  79 @end table
  80
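The placement of these escapes might look like the following skeleton
of the supporting code.  This is only a sketch: the function name
@code{cmd_example} and its return value are assumptions made for
illustration.  Before @code{q2c} expands them, the escapes are ordinary
C comments, so the skeleton compiles as it stands.

```c
/* Hypothetical supporting-code skeleton for a q2c input file.
   Each escape appears on a line by itself. */

/* (header) */

/* (decls global) */

/* Hypothetical command function; the name and the return value are
   assumptions made for this sketch. */
int
cmd_example (void)
{
  /* (parser) */

  /* (free) */

  return 1;
}
```
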
  81 @node Grammar Rules,  , q2c Input Structure, q2c Input Format
  82 @section Grammar Rules
  83
  84 The grammar rules describe the format of the syntax that the parser
  85 generated by @code{q2c} will understand.  The way that the grammar rules
  86 are included in a @code{q2c} input file is described above.
  87
  88 The grammar rules are divided into tokens of the following types:
  89
  90 @table @asis
  91 @item Identifier (@code{ID})
  92
  93 An identifier token is a sequence of letters, digits, and underscores
  94 (@samp{_}).  Identifiers are @emph{not} case-sensitive.
  95
  96 @item String (@code{STRING})
  97
  98 String tokens are initiated by a double-quote character (@samp{"}) and
  99 consist of all the characters between that double quote and the next
 100 double quote, which must be on the same line as the first.  Within a
 101 string, a backslash can be used as a ``literal escape''.  The only
 102 reasons to use a literal escape are to include a double quote or a
 103 backslash within a string.
 104
 105 @item Special character
 106
 107 All other characters, except white space, constitute tokens in
 108 themselves.
 109
 110 @end table
 111
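The token types above can be sketched as a small C scanner.  This is
not the lexer used by @code{q2c}; the names @code{next_token} and
@code{enum token_type} are hypothetical.  The sketch also assumes that
an unterminated string is reported as an error, and it leaves
identifiers in their original case, so a caller would have to compare
them without regard to case.

```c
#include <ctype.h>
#include <stddef.h>

enum token_type { TOK_END, TOK_ID, TOK_STRING, TOK_SPECIAL, TOK_ERROR };

/* Returns nonzero if C can appear in an identifier. */
static int
is_id_char (char c)
{
  return isalnum ((unsigned char) c) || c == '_';
}

/* Reads one token starting at *PP, stores its text in BUF (SIZE bytes,
   truncating if necessary), and advances *PP past the token. */
enum token_type
next_token (const char **pp, char *buf, size_t size)
{
  const char *p = *pp;
  enum token_type type;
  size_t n = 0;

  while (isspace ((unsigned char) *p))
    p++;

  if (*p == '\0')
    type = TOK_END;
  else if (is_id_char (*p))
    {
      for (; is_id_char (*p); p++)
        if (n + 1 < size)
          buf[n++] = *p;
      type = TOK_ID;
    }
  else if (*p == '"')
    {
      /* A string must end on the line where it starts. */
      for (p++; *p != '"' && *p != '\0' && *p != '\n'; p++)
        {
          /* A backslash makes the next character literal. */
          if (*p == '\\' && p[1] != '\0' && p[1] != '\n')
            p++;
          if (n + 1 < size)
            buf[n++] = *p;
        }
      if (*p == '"')
        {
          p++;
          type = TOK_STRING;
        }
      else
        type = TOK_ERROR;
    }
  else
    {
      /* Any other character is a token in itself. */
      if (n + 1 < size)
        buf[n++] = *p;
      p++;
      type = TOK_SPECIAL;
    }

  if (size > 0)
    buf[n] = '\0';
  *pp = p;
  return type;
}
```

Calling @code{next_token} repeatedly on one line of the grammar rules
yields its tokens in order until @code{TOK_END} is returned.
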
 112 The syntax of the grammar rules is as follows:
 113
 114 @example
 115 grammar-rules ::= ID : subcommands .
 116 subcommands ::= subcommand
 117             ::= subcommands ; subcommand
 118 @end example
 119
 120 The syntax begins with an ID or STRING token that gives the name of the
 121 procedure to be parsed.  The rest of the syntax consists of subcommands
 122 separated by semicolons (@samp{;}) and terminated with a full stop
 123 (@samp{.}).
 124
 125 @example
 126 subcommand ::= sbc-options ID sbc-defn
 127 sbc-options ::=
 128             ::= sbc-option
 129             ::= sbc-options sbc-option
 130 sbc-option ::= *
 131            ::= +
 132            ::= ^
 133 sbc-defn ::= opt-prefix = specifiers
 134          ::= [ ID ] = array-sbc
 135          ::= opt-prefix = sbc-special-form
 136 opt-prefix ::=
 137            ::= ( ID )
 138 @end example
 139
 140 Each subcommand can be prefixed with one or more option characters.  An
 141 asterisk (@samp{*}) is used to indicate the default subcommand; the
 142 keyword used for the default subcommand can be omitted in the PSPP
 143 syntax file.  A plus sign (@samp{+}) is used to indicate that a
 144 subcommand can appear more than once; if it is not present then that
 145 subcommand can appear no more than once.
 146 A caret (@samp{^}) is used to indicate that a subcommand must appear
 147 at least once.
 148
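The meaning of the option characters can be sketched in C as a set of
flags.  The flag names and @code{parse_sbc_options} are hypothetical,
and the sketch assumes that the option characters are written next to
each other with no white space between them.

```c
/* Flags for the subcommand option characters (hypothetical names). */
enum
  {
    SBC_DEFAULT = 1 << 0,       /* `*': keyword may be omitted. */
    SBC_MULTIPLE = 1 << 1,      /* `+': may appear more than once. */
    SBC_REQUIRED = 1 << 2       /* `^': must appear at least once. */
  };

/* Consumes any option characters at *PP and returns the matching
   flags, leaving *PP at the first character of the subcommand name. */
unsigned int
parse_sbc_options (const char **pp)
{
  unsigned int flags = 0;

  for (;; (*pp)++)
    switch (**pp)
      {
      case '*':
        flags |= SBC_DEFAULT;
        break;
      case '+':
        flags |= SBC_MULTIPLE;
        break;
      case '^':
        flags |= SBC_REQUIRED;
        break;
      default:
        return flags;
      }
}
```

A subcommand written without any option characters yields no flags,
which corresponds to an optional subcommand that can appear at most
once.
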
 149 The subcommand name appears after the option characters.
 150
 151 There are three forms of subcommands.  The first and most common form
 152 simply gives an equals sign (@samp{=}) and a list of specifiers, which
 153 can each be set to a single setting.  The second form declares an array,
 154 which is a set of flags that can be individually turned on by the user.
 155 There are also several special forms that do not take a list of
 156 specifiers.
 157
 158 Arrays require an additional @code{ID} argument.  This is used as a
 159 prefix, prepended to the variable names constructed from the
 160 specifiers.  The other forms also allow an optional prefix to be
 161 specified.
 162
 163 @example
 164 array-sbc ::= alternatives
 165           ::= array-sbc , alternatives
 166 alternatives ::= ID
 167              ::= alternatives | ID
 168 @end example
 169
 170 An array subcommand is a set of Boolean values that can independently be
 171 turned on by the user, separated by commas (@samp{,}).  If a value has more
 172 than one name then these names are separated by pipes (@samp{|}).
 173
 174 @example
 175 specifiers ::= specifier
 176            ::= specifiers , specifier
 177 specifier ::= opt-id : settings
 178 opt-id ::=
 179        ::= ID
 180 @end example
 181
 182 Ordinary subcommands (other than arrays and special forms) require a
 183 list of specifiers.  Each specifier has an optional name and a list of
 184 settings.  If the name is given then a correspondingly named variable
 185 will be used to store the user's choice of setting.  If no name is given
 186 then there is no way to tell which setting the user picked; in this case
 187 the settings should probably have values attached.
 188
 189 @example
 190 settings ::= setting
 191          ::= settings / setting
 192 setting ::= setting-options ID setting-value
 193 setting-options ::=
 194                 ::= *
 195                 ::= !
 196                 ::= * !
 197 @end example
 198
 199 Individual settings are separated by forward slashes (@samp{/}).  Each
 200 setting can be as little as an @code{ID} token, but options and values
 201 can optionally be included.  The @samp{*} option means that, for this
 202 setting, the @code{ID} can be omitted.  The @samp{!} option means that
 203 this setting is the default for its specifier.
 204
 205 @example
 206 setting-value ::=
 207               ::= ( setting-value-2 )
 208               ::= setting-value-2
 209 setting-value-2 ::= setting-value-options setting-value-type : ID
 210                     setting-value-restriction
 211 setting-value-options ::=
 212                       ::= *
 213 setting-value-type ::= N
 214                    ::= D
 215 setting-value-restriction ::=
 216                           ::= , STRING
 217 @end example
 218
 219 Settings may have values.  If the value must be enclosed in parentheses,
 220 then enclose the value declaration in parentheses.  Declare the value
 221 type as @samp{n} or @samp{d} for integer or floating-point type,
 222 respectively.  The given @code{ID} is used to construct a variable name.
 223 If option @samp{*} is given, then the value is optional; otherwise it
 224 must be specified whenever the corresponding setting is specified.  A
 225 ``restriction'' can also be specified, which is a string giving a C
 226 expression limiting the valid range of the value.  The special escape
 227 @code{%s} should be used within the restriction to refer to the
 228 setting's value variable.
 229
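To make the @code{%s} escape concrete, the following C sketch
substitutes a variable name into a restriction string.  It is not taken
from @code{q2c}: the function @code{expand_restriction} is
hypothetical, and the variable name passed to it stands in for whatever
name @code{q2c} constructs from the @code{ID}.

```c
#include <stddef.h>
#include <string.h>

/* Copies RESTRICTION into OUT (SIZE bytes), replacing each "%s" with
   VAR.  Returns 0 on success, -1 if OUT is too small.
   (Hypothetical helper, not part of q2c.) */
int
expand_restriction (const char *restriction, const char *var,
                    char *out, size_t size)
{
  size_t var_len = strlen (var);
  const char *p = restriction;
  size_t n = 0;

  if (size == 0)
    return -1;

  while (*p != '\0')
    if (p[0] == '%' && p[1] == 's')
      {
        /* Leave room for the terminating null byte. */
        if (var_len >= size - n)
          return -1;
        memcpy (out + n, var, var_len);
        n += var_len;
        p += 2;
      }
    else
      {
        if (n + 1 >= size)
          return -1;
        out[n++] = *p++;
      }

  out[n] = '\0';
  return 0;
}
```

With the restriction @code{"%s >= 1 && %s <= 10"} and a variable named
@code{value}, the result is the C expression
@code{value >= 1 && value <= 10}.
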
 230 @example
 231 sbc-special-form ::= VAR
 232                  ::= VARLIST varlist-options
 233                  ::= INTEGER opt-list
 234                  ::= DOUBLE opt-list
 235                  ::= PINT
 236                  ::= STRING @r{(the literal word STRING)} string-options
 237                  ::= CUSTOM
 238 varlist-options ::=
 239                 ::= ( STRING )
 240 opt-list ::=
 241          ::= LIST
 242 string-options ::=
 243                ::= ( STRING STRING )
 244 @end example
 245
 246 The special forms are of the following types:
 247
 248 @table @code
 249 @item VAR
 250
 251 A single variable name.
 252
 253 @item VARLIST
 254
 255 A list of variables.  If given, the string can be used to provide
 256 @code{PV_@var{*}} options to the call to @code{parse_variables}.
 257
 258 @item INTEGER
 259
 260 A single integer value.
 261
 262 @item INTEGER LIST
 263
 264 A list of integers separated by spaces or commas.
 265
 266 @item DOUBLE
 267
 268 A single floating-point value.
 269
 270 @item DOUBLE LIST
 271
 272 A list of floating-point values.
 273
 274 @item PINT
 275
 276 A single positive integer value.
 277
 278 @item STRING
 279
 280 A string value.  If the options are given then the first string is an
 281 expression giving a restriction on the value of the string; the second
 282 string is an error message to display when the restriction is violated.
 283
 284 @item CUSTOM
 285
 286 A custom function is used to parse this subcommand.  The function must
 287 have prototype @code{int custom_@var{name} (void)}.  It should return 0
 288 on failure (when it has already issued an appropriate diagnostic), 1 on
 289 success, or 2 if it fails and the calling function should issue a syntax
 290 error on behalf of the custom handler.
 291
 292 @end table
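
The return convention for @code{CUSTOM} functions can be sketched as
follows.  Everything here other than the prototype and the meaning of
the three return values is an assumption: the enumeration names, the
sample handlers, and @code{call_custom_handler} are hypothetical, and a
counter stands in for issuing a real syntax error.

```c
/* Return values documented for custom_NAME functions. */
enum custom_result
  {
    CUSTOM_FAILED = 0,          /* Failed; diagnostic already issued. */
    CUSTOM_OK = 1,              /* Succeeded. */
    CUSTOM_SYNTAX_ERROR = 2     /* Failed; caller reports a syntax error. */
  };

/* Number of syntax errors reported on behalf of custom handlers.
   Stands in for a real diagnostic in this sketch. */
int syntax_errors_reported;

/* Hypothetical handlers with the documented prototype. */
int custom_tables (void) { return CUSTOM_OK; }
int custom_variables (void) { return CUSTOM_FAILED; }
int custom_cells (void) { return CUSTOM_SYNTAX_ERROR; }

/* Calls HANDLER and returns nonzero only if parsing can continue. */
int
call_custom_handler (int (*handler) (void))
{
  switch (handler ())
    {
    case CUSTOM_OK:
      return 1;

    case CUSTOM_SYNTAX_ERROR:
      /* The handler asked the caller to report the error. */
      syntax_errors_reported++;
      return 0;

    case CUSTOM_FAILED:
    default:
      /* The handler has already issued its own diagnostic. */
      return 0;
    }
}
```
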
 293 @setfilename ignored