X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Flanguage.texi;h=5381668c705ad38675c924d2147858a8b2790c82;hb=9ade26c8349b4434008c46cf09bc7473ec743972;hp=d71ecc8a3868d73a4f27d9fa3bdc70edbd6b5cb7;hpb=00feff7775f55b3292d1f9461a79dde54b9eb2ba;p=pspp-builds.git

diff --git a/doc/language.texi b/doc/language.texi
index d71ecc8a..5381668c 100644
--- a/doc/language.texi
+++ b/doc/language.texi
@@ -3,18 +3,13 @@
 @cindex language, PSPP
 @cindex PSPP, language
 
-@quotation
-@strong{Please note:} PSPP is not even close to completion.
-Only a few statistical procedures are implemented. PSPP
-is a work in progress.
-@end quotation
-
 This chapter discusses elements common to many PSPP commands.
 Later chapters will describe individual commands in detail.
 
 @menu
 * Tokens::                      Characters combine to form tokens.
 * Commands::                    Tokens combine to form commands.
+* Syntax Variants::             Batch vs. Interactive mode
 * Types of Commands::           Commands come in several flavors.
 * Order of Commands::           Commands combine to form syntax files.
 * Missing Observations::        Handling missing observations.
@@ -24,6 +19,7 @@ Later chapters will describe individual commands in detail.
 * BNF::                         How command syntax is described.
 @end menu
 
+
 @node Tokens
 @section Tokens
 @cindex language, lexical analysis
@@ -115,27 +111,29 @@ character used for quoting in the string, double it, e.g.@:
 significant inside strings.
 
 Strings can be concatenated using @samp{+}, so that @samp{"a" + 'b' +
-'c'} is equivalent to @samp{'abc'}. Concatenation is useful for
-splitting a single string across multiple source lines. The maximum
-length of a string, after concatenation, is 255 characters.
-
-Strings may also be expressed as hexadecimal, octal, or binary
-character values by prefixing the initial quote character by @samp{X},
-@samp{O}, or @samp{B} or their lowercase equivalents. Each pair,
-triplet, or octet of characters, according to the radix, is
-transformed into a single character with the given value. If there is
-an incomplete group of characters, the missing final digits are
-assumed to be @samp{0}. These forms of strings are nonportable
-because numeric values are associated with different characters by
-different operating systems. Therefore, their use should be confined
-to syntax files that will not be widely distributed.
-
-@cindex characters, reserved
-@cindex 0
-@cindex white space
-The character with value 00 is reserved for
-internal use by PSPP. Its use in strings causes an error and
-replacement by a space character.
+'c'} is equivalent to @samp{'abc'}. So that a long string may be
+broken across lines, a line break may precede or follow, or both
+precede and follow, the @samp{+}. (However, an entirely blank line
+preceding or following the @samp{+} is interpreted as ending the
+current command.)
+
+Strings may also be expressed as hexadecimal character values by
+prefixing the initial quote character by @samp{x} or @samp{X}.
+Regardless of the syntax file or active dataset's encoding, the
+hexadecimal digits in the string are interpreted as Unicode characters
+in UTF-8 encoding.
+
+Individual Unicode code points may also be expressed by specifying the
+hexadecimal code point number in single or double quotes preceded by
+@samp{u} or @samp{U}. For example, Unicode code point U+1D11E, the
+musical G clef character, could be expressed as @code{U'1D11E'}.
+Invalid Unicode code points (above U+10FFFF or in between U+D800 and
+U+DFFF) are not allowed.
+
+When strings are concatenated with @samp{+}, each segment's prefix is
+considered individually. For example, @code{'The G clef symbol is:' +
+u"1d11e" + "."} inserts a G clef symbol in the middle of an otherwise
+plain text string.
 
 @item Punctuators and Operators
 @cindex punctuators
@@ -152,11 +150,6 @@ punctuator only as the last character on a line (except white space).
 When it is the last non-space character on a line, a period is not
 treated as part of another token, even if it would otherwise be part
 of, e.g.@:, an identifier or a floating-point number.
-
-Actually, the character that ends a command can be changed with
-@cmd{SET}'s ENDCMD subcommand (@pxref{SET}), but we do not recommend
-doing so. Throughout the remainder of this manual we will assume that
-the default setting is in effect.
 @end table
 
 @node Commands
@@ -184,19 +177,43 @@ by a forward slash (@samp{/}).
 There are multiple ways to mark the end of a command. The most common
 way is to end the last line of the command with a period (@samp{.}) as
 described in the previous section (@pxref{Tokens}). A blank line, or
-one that consists only of white space or comments, also ends a command
-by default, although you can use the NULLINE subcommand of @cmd{SET}
-to disable this feature (@pxref{SET}).
-
-In batch mode only, that is, when reading commands from a file instead
-of an interactive user, any line that contains a non-space character
-in the leftmost column begins a new command. Thus, each command
-consists of a flush-left line followed by any number of lines indented
-from the left margin. In this mode, a plus or minus sign
-(@samp{+}, @samp{@minus{}}) as the first character
-in a line is ignored and causes that line to begin a new command,
-which allows for visual indentation of a command without that command
-being considered part of the previous command.
+one that consists only of white space or comments, also ends a command.
+
+@node Syntax Variants
+@section Syntax Variants
+
+@cindex Batch syntax
+@cindex Interactive syntax
+
+There are three variants of command syntax, which vary only in how
+they detect the end of one command and the start of the next.
+
+In @dfn{interactive mode}, which is the default for syntax typed at a
+command prompt, a period as the last non-blank character on a line
+ends a command. A blank line also ends a command.
+
+In @dfn{batch mode}, an end-of-line period or a blank line also ends a
+command. Additionally, it treats any line that has a non-blank
+character in the leftmost column as beginning a new command. Thus, in
+batch mode the second and subsequent lines in a command must be
+indented.
+
+Regardless of the syntax mode, a plus sign, minus sign, or period in
+the leftmost column of a line is ignored and causes that line to begin
+a new command. This is most useful in batch mode, in which the first
+line of a new command could not otherwise be indented, but it is
+accepted regardless of syntax mode.
+
+The default mode for reading commands from a file is @dfn{auto mode}.
+It is the same as batch mode, except that a line with a non-blank in
+the leftmost column only starts a new command if that line begins with
+the name of a PSPP command. This correctly interprets most valid PSPP
+syntax files regardless of the syntax mode for which they are
+intended.
+
+The @option{--interactive} (or @option{-i}) or @option{--batch} (or
+@option{-b}) options set the syntax mode for files listed on the PSPP
+command line. @xref{Main Options}, for more details.
 
 @node Types of Commands
 @section Types of Commands
@@ -360,9 +377,7 @@ spaces.
 Variables, whether numeric or string, can have designated
 @dfn{user-missing values}. Every user-missing value is an actual value
 for that variable. However, most of the time user-missing values are
-treated in the same way as the system-missing value. String variables
-that are wider than a certain width, usually 8 characters (depending on
-computer architecture), cannot have user-missing values.
+treated in the same way as the system-missing value.
 
 For more information on missing values, see the following sections:
 @ref{Variables}, @ref{MISSING VALUES}, @ref{Expressions}. See also the
@@ -428,13 +443,9 @@ Numeric or string.
 @item Width (string variables only)
 String variables with a width of 8 characters or fewer are called
 @dfn{short string variables}. Short string variables
-can be used in many procedures where @dfn{long string variables} (those
+may be used in a few contexts where @dfn{long string variables} (those
 with widths greater than 8) are not allowed.
 
-Certain systems may consider strings longer than 8
-characters to be short strings. Eight characters represents a minimum
-figure for the maximum length of a short string.
-
 @item Position
 Variables in the dictionary are arranged in a specific order.
 @cmd{DISPLAY} can be used to show this order: see @ref{DISPLAY}.
@@ -476,6 +487,11 @@ they are displayed. Example: a width of 8, with 2 decimal places.
 @item Write format
 Similar to print format, but used by the @cmd{WRITE} command
 (@pxref{WRITE}).
+
+@cindex custom attributes
+@item Custom attributes
+User-defined associations between names and values. @xref{VARIABLE
+ATTRIBUTE}.
 @end table
 
 @node System Variables
@@ -1146,6 +1162,7 @@ trailing white space.
 The maximum width for time and date formats is 40 columns. Minimum
 input and output width for each of the time and date formats is shown
 below:
+
 @float
 @multitable {DATETIME} {Min. Input Width} {Min. Output Width} {4-digit year}
 @headitem Format @tab Min. Input Width @tab Min. Output Width @tab Option
@@ -1331,6 +1348,14 @@ file, or scratch file. Most often, a file handle is specified as the
 name of a file as a string, that is, enclosed within @samp{'} or
 @samp{"}.
 
+A file name string that begins or ends with @samp{|} is treated as the
+name of a command to pipe data to or from. You can use this feature
+to read data over the network using a program such as @samp{curl}
+(e.g.@: @code{GET '|curl -s -S http://example.com/mydata.sav'}), to
+read compressed data from a file using a program such as @samp{zcat}
+(e.g.@: @code{GET '|zcat mydata.sav.gz'}), and for many other
+purposes.
+
 PSPP also supports declaring named file handles with the @cmd{FILE
 HANDLE} command. This command associates an identifier of your choice
 (the file handle's name) with a file. Later, the file handle name can
@@ -1459,4 +1484,3 @@ The first nonterminal defined in a set of productions is called the
 @dfn{start symbol}. The start symbol defines the entire syntax for
 that command.
 @end itemize
-@setfilename ignored
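
Note on the string-literal hunk (-115,27 +111,29): the new rules for prefixed strings can be illustrated with a small Python sketch. This is not PSPP's lexer. The function names `decode_segment` and `concatenate` are hypothetical, a literal is assumed to be already split into (prefix, body) pairs, and quote doubling is not handled. Python's `int()` and `bytes.fromhex()` are also more lenient about digit formatting than a real lexer would likely be.

```python
def decode_segment(prefix: str, body: str) -> str:
    """Decode one quoted string segment (hypothetical helper).

    prefix is '' for a plain string, 'x'/'X' for hexadecimal digits that
    spell UTF-8 bytes, or 'u'/'U' for one hexadecimal Unicode code point.
    """
    kind = prefix.lower()
    if kind == "":
        return body
    if kind == "x":
        # The hex digits are read as bytes and interpreted as UTF-8 text.
        return bytes.fromhex(body).decode("utf-8")
    if kind == "u":
        code_point = int(body, 16)
        # Reject values above U+10FFFF and the range U+D800..U+DFFF.
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            raise ValueError(f"invalid Unicode code point U+{code_point:04X}")
        return chr(code_point)
    raise ValueError(f"unknown string prefix {prefix!r}")


def concatenate(segments):
    """Join (prefix, body) segments, decoding each with its own prefix."""
    return "".join(decode_segment(prefix, body) for prefix, body in segments)
```

Under these assumptions, the manual's example `'The G clef symbol is:' + u"1d11e" + "."` corresponds to `concatenate([("", "The G clef symbol is:"), ("u", "1d11e"), ("", ".")])`, where only the middle segment is decoded as a code point.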