X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Flanguage.texi;h=5381668c705ad38675c924d2147858a8b2790c82;hb=9ade26c8349b4434008c46cf09bc7473ec743972;hp=7ce4d88938d03786c77fbc3f50ec3792e1850f0a;hpb=85cef9b9391f1aaf4b27bd90a02f8ce5daf004ee;p=pspp-builds.git

diff --git a/doc/language.texi b/doc/language.texi
index 7ce4d889..5381668c 100644
--- a/doc/language.texi
+++ b/doc/language.texi
@@ -1,20 +1,15 @@
-@node Language, Expressions, Invocation, Top
+@node Language
 @chapter The PSPP language
 @cindex language, PSPP
 @cindex PSPP, language
 
-@quotation
-@strong{Please note:} PSPP is not even close to completion.
-Only a few statistical procedures are implemented. PSPP
-is a work in progress.
-@end quotation
-
 This chapter discusses elements common to many PSPP commands.
 Later chapters will describe individual commands in detail.
 
 @menu
 * Tokens:: Characters combine to form tokens.
 * Commands:: Tokens combine to form commands.
+* Syntax Variants:: Batch vs. Interactive mode
 * Types of Commands:: Commands come in several flavors.
 * Order of Commands:: Commands combine to form syntax files.
 * Missing Observations:: Handling missing observations.
@@ -24,7 +19,8 @@ Later chapters will describe individual commands in detail.
 * BNF:: How command syntax is described.
 @end menu
 
-@node Tokens, Commands, Language, Language
+
+@node Tokens
 @section Tokens
 @cindex language, lexical analysis
 @cindex language, tokens
@@ -115,27 +111,29 @@ character used for quoting in the string, double it, e.g.@:
 significant inside strings.
 
 Strings can be concatenated using @samp{+}, so that @samp{"a" + 'b' +
-'c'} is equivalent to @samp{'abc'}. Concatenation is useful for
-splitting a single string across multiple source lines. The maximum
-length of a string, after concatenation, is 255 characters.
-
-Strings may also be expressed as hexadecimal, octal, or binary
-character values by prefixing the initial quote character by @samp{X},
-@samp{O}, or @samp{B} or their lowercase equivalents. Each pair,
-triplet, or octet of characters, according to the radix, is
-transformed into a single character with the given value. If there is
-an incomplete group of characters, the missing final digits are
-assumed to be @samp{0}. These forms of strings are nonportable
-because numeric values are associated with different characters by
-different operating systems. Therefore, their use should be confined
-to syntax files that will not be widely distributed.
-
-@cindex characters, reserved
-@cindex 0
-@cindex white space
-The character with value 00 is reserved for
-internal use by PSPP. Its use in strings causes an error and
-replacement by a space character.
+'c'} is equivalent to @samp{'abc'}. So that a long string may be
+broken across lines, a line break may precede or follow, or both
+precede and follow, the @samp{+}. (However, an entirely blank line
+preceding or following the @samp{+} is interpreted as ending the
+current command.)
+
+Strings may also be expressed as hexadecimal character values by
+prefixing the initial quote character by @samp{x} or @samp{X}.
+Regardless of the syntax file or active dataset's encoding, the
+hexadecimal digits in the string are interpreted as Unicode characters
+in UTF-8 encoding.
+
+Individual Unicode code points may also be expressed by specifying the
+hexadecimal code point number in single or double quotes preceded by
+@samp{u} or @samp{U}. For example, Unicode code point U+1D11E, the
+musical G clef character, could be expressed as @code{U'1D11E'}.
+Invalid Unicode code points (above U+10FFFF or in between U+D800 and
+U+DFFF) are not allowed.
+
+When strings are concatenated with @samp{+}, each segment's prefix is
+considered individually. For example, @code{'The G clef symbol is:' +
+u"1d11e" + "."} inserts a G clef symbol in the middle of an otherwise
+plain text string.
 
 @item Punctuators and Operators
 @cindex punctuators
@@ -152,14 +150,9 @@ punctuator only as the last character on a line (except white space).
 When it is the last non-space character on a line, a period is not
 treated as part of another token, even if it would otherwise be part
 of, e.g.@:, an identifier or a floating-point number.
-
-Actually, the character that ends a command can be changed with
-@cmd{SET}'s ENDCMD subcommand (@pxref{SET}), but we do not recommend
-doing so. Throughout the remainder of this manual we will assume that
-the default setting is in effect.
 @end table
 
-@node Commands, Types of Commands, Tokens, Language
+@node Commands
 @section Forming commands of tokens
 
 @cindex PSPP, command structure
@@ -184,21 +177,45 @@ by a forward slash (@samp{/}).
 There are multiple ways to mark the end of a command. The most common
 way is to end the last line of the command with a period (@samp{.}) as
 described in the previous section (@pxref{Tokens}). A blank line, or
-one that consists only of white space or comments, also ends a command
-by default, although you can use the NULLINE subcommand of @cmd{SET}
-to disable this feature (@pxref{SET}).
-
-In batch mode only, that is, when reading commands from a file instead
-of an interactive user, any line that contains a non-space character
-in the leftmost column begins a new command. Thus, each command
-consists of a flush-left line followed by any number of lines indented
-from the left margin. In this mode, a plus or minus sign
-(@samp{+}, @samp{@minus{}}) as the first character
-in a line is ignored and causes that line to begin a new command,
-which allows for visual indentation of a command without that command
-being considered part of the previous command.
-
-@node Types of Commands, Order of Commands, Commands, Language
+one that consists only of white space or comments, also ends a command.
+
+@node Syntax Variants
+@section Syntax Variants
+
+@cindex Batch syntax
+@cindex Interactive syntax
+
+There are three variants of command syntax, which vary only in how
+they detect the end of one command and the start of the next.
+
+In @dfn{interactive mode}, which is the default for syntax typed at a
+command prompt, a period as the last non-blank character on a line
+ends a command. A blank line also ends a command.
+
+In @dfn{batch mode}, an end-of-line period or a blank line also ends a
+command. Additionally, it treats any line that has a non-blank
+character in the leftmost column as beginning a new command. Thus, in
+batch mode the second and subsequent lines in a command must be
+indented.
+
+Regardless of the syntax mode, a plus sign, minus sign, or period in
+the leftmost column of a line is ignored and causes that line to begin
+a new command. This is most useful in batch mode, in which the first
+line of a new command could not otherwise be indented, but it is
+accepted regardless of syntax mode.
+
+The default mode for reading commands from a file is @dfn{auto mode}.
+It is the same as batch mode, except that a line with a non-blank in
+the leftmost column only starts a new command if that line begins with
+the name of a PSPP command. This correctly interprets most valid PSPP
+syntax files regardless of the syntax mode for which they are
+intended.
+
+The @option{--interactive} (or @option{-i}) or @option{--batch} (or
+@option{-b}) options set the syntax mode for files listed on the PSPP
+command line. @xref{Main Options}, for more details.
+
+@node Types of Commands
 @section Types of Commands
 
 Commands in PSPP are divided roughly into six categories:
@@ -241,7 +258,7 @@ more general sense, a @dfn{procedure} is any command that causes the
 active file (the data) to be read.
 @end table
 
-@node Order of Commands, Missing Observations, Types of Commands, Language
+@node Order of Commands
 @section Order of Commands
 @cindex commands, ordering
 @cindex order of commands
@@ -339,7 +356,7 @@ Cause a transition to the procedure state.
 @end itemize
 @end table
 
-@node Missing Observations, Variables, Order of Commands, Language
+@node Missing Observations
 @section Handling missing observations
 @cindex missing values
 @cindex values, missing
@@ -360,16 +377,14 @@ spaces.
 Variables, whether numeric or string, can have designated
 @dfn{user-missing values}. Every user-missing value is an actual value
 for that variable. However, most of the time user-missing values are
-treated in the same way as the system-missing value. String variables
-that are wider than a certain width, usually 8 characters (depending on
-computer architecture), cannot have user-missing values.
+treated in the same way as the system-missing value.
 
 For more information on missing values, see the following sections:
 @ref{Variables}, @ref{MISSING VALUES}, @ref{Expressions}. See also the
 documentation on individual procedures for information on how they
 handle missing values.
 
-@node Variables, Files, Missing Observations, Language
+@node Variables
 @section Variables
 @cindex variables
 @cindex dictionary
@@ -387,7 +402,7 @@ Some details of variables are described in the sections below.
 * Scratch Variables:: Variables deleted by procedures.
 @end menu
 
-@node Attributes, System Variables, Variables, Variables
+@node Attributes
 @subsection Attributes of Variables
 @cindex variables, attributes of
 @cindex attributes of variables
@@ -428,13 +443,9 @@ Numeric or string.
 @item Width
 (string variables only) String variables with a width of 8 characters or
 fewer are called @dfn{short string variables}. Short string variables
-can be used in many procedures where @dfn{long string variables} (those
+may be used in a few contexts where @dfn{long string variables} (those
 with widths greater than 8) are not allowed.
 
-Certain systems may consider strings longer than 8
-characters to be short strings. Eight characters represents a minimum
-figure for the maximum length of a short string.
-
 @item Position
 Variables in the dictionary are arranged in a specific order.
 @cmd{DISPLAY} can be used to show this order: see @ref{DISPLAY}.
@@ -476,9 +487,14 @@ they are displayed. Example: a width of 8, with 2 decimal places.
 @item Write format
 Similar to print format, but used by the @cmd{WRITE} command
 (@pxref{WRITE}).
+
+@cindex custom attributes
+@item Custom attributes
+User-defined associations between names and values. @xref{VARIABLE
+ATTRIBUTE}.
 @end table
 
-@node System Variables, Sets of Variables, Attributes, Variables
+@node System Variables
 @subsection Variables Automatically Defined by PSPP
 @cindex system variables
 @cindex variables, system
@@ -522,7 +538,7 @@ was read, in format F20.
 Page width, in characters, in format F3.
 @end table
 
-@node Sets of Variables, Input and Output Formats, System Variables, Variables
+@node Sets of Variables
 @subsection Lists of variable names
 @cindex @code{TO} convention
 @cindex convention, @code{TO}
@@ -551,7 +567,7 @@ After a set of variables has been defined with @cmd{DATA LIST} or
 another command with this method, the same set can be referenced on
 later commands using the same syntax.
 
-@node Input and Output Formats, Scratch Variables, Sets of Variables, Variables
+@node Input and Output Formats
 @subsection Input and Output Formats
 
 @cindex formats
 
@@ -1146,6 +1162,7 @@ trailing white space.
 The maximum width for time and date formats is 40 columns. Minimum
 input and output width for each of the time and date formats is shown
 below:
+
 @float
 @multitable {DATETIME} {Min. Input Width} {Min. Output Width} {4-digit year}
 @headitem Format @tab Min. Input Width @tab Min. Output Width @tab Option
@@ -1232,7 +1249,7 @@ as hex digit pairs. On output, hex digits are output in uppercase; on
 input, uppercase and lowercase are both accepted. The default output
 format is A format with half the input width.
 
-@node Scratch Variables, , Input and Output Formats, Variables
+@node Scratch Variables
 @subsection Scratch Variables
 
 @cindex scratch variables
 
@@ -1331,6 +1348,14 @@ file, or scratch file. Most often, a file handle is specified as the
 name of a file as a string, that is, enclosed within @samp{'} or
 @samp{"}.
 
+A file name string that begins or ends with @samp{|} is treated as the
+name of a command to pipe data to or from. You can use this feature
+to read data over the network using a program such as @samp{curl}
+(e.g.@: @code{GET '|curl -s -S http://example.com/mydata.sav'}), to
+read compressed data from a file using a program such as @samp{zcat}
+(e.g.@: @code{GET '|zcat mydata.sav.gz'}), and for many other
+purposes.
+
 PSPP also supports declaring named file handles with the @cmd{FILE
 HANDLE} command. This command associates an identifier of your choice
 (the file handle's name) with a file. Later, the file handle name can
@@ -1459,4 +1484,3 @@ The first nonterminal defined in a set of productions is called the
 @dfn{start symbol}. The start symbol defines the entire syntax for
 that command.
 @end itemize
-@setfilename ignored