X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Flanguage.texi;h=a81d4e2e2a13c73db0c33ec557510bb7ec733b9c;hb=b401615e6db40bf74394839b96600afe3a868a95;hp=e23a558028805dbc9babcb360811d34c87258434;hpb=04d2c99833753252b724dd9d4f15cc3a80b6bec8;p=pspp-builds.git diff --git a/doc/language.texi b/doc/language.texi index e23a5580..a81d4e2e 100644 --- a/doc/language.texi +++ b/doc/language.texi @@ -3,10 +3,6 @@ @cindex language, PSPP @cindex PSPP, language -@note{PSPP is not even close to completion. -Only a few statistical procedures are implemented. PSPP -is a work in progress.} - This chapter discusses elements common to many PSPP commands. Later chapters will describe individual commands in detail. @@ -115,27 +111,29 @@ character used for quoting in the string, double it, e.g.@: significant inside strings. Strings can be concatenated using @samp{+}, so that @samp{"a" + 'b' + -'c'} is equivalent to @samp{'abc'}. Concatenation is useful for -splitting a single string across multiple source lines. The maximum -length of a string, after concatenation, is 255 characters. - -Strings may also be expressed as hexadecimal, octal, or binary -character values by prefixing the initial quote character by @samp{X}, -@samp{O}, or @samp{B} or their lowercase equivalents. Each pair, -triplet, or octet of characters, according to the radix, is -transformed into a single character with the given value. If there is -an incomplete group of characters, the missing final digits are -assumed to be @samp{0}. These forms of strings are nonportable -because numeric values are associated with different characters by -different operating systems. Therefore, their use should be confined -to syntax files that will not be widely distributed. - -@cindex characters, reserved -@cindex 0 -@cindex white space -The character with value 00 is reserved for -internal use by PSPP. Its use in strings causes an error and -replacement by a space character. 
+'c'} is equivalent to @samp{'abc'}. So that a long string may be +broken across lines, a line break may precede or follow, or both +precede and follow, the @samp{+}. (However, an entirely blank line +preceding or following the @samp{+} is interpreted as ending the +current command.) + +Strings may also be expressed as hexadecimal character values by +prefixing the initial quote character by @samp{x} or @samp{X}. +Regardless of the syntax file or active dataset's encoding, the +hexadecimal digits in the string are interpreted as Unicode characters +in UTF-8 encoding. + +Individual Unicode code points may also be expressed by specifying the +hexadecimal code point number in single or double quotes preceded by +@samp{u} or @samp{U}. For example, Unicode code point U+1D11E, the +musical G clef character, could be expressed as @code{U'1D11E'}. +Invalid Unicode code points (above U+10FFFF or in between U+D800 and +U+DFFF) are not allowed. + +When strings are concatenated with @samp{+}, each segment's prefix is +considered individually. For example, @code{'The G clef symbol is:' + +u"1d11e" + "."} inserts a G clef symbol in the middle of an otherwise +plain text string. @item Punctuators and Operators @cindex punctuators @@ -152,11 +150,6 @@ punctuator only as the last character on a line (except white space). When it is the last non-space character on a line, a period is not treated as part of another token, even if it would otherwise be part of, e.g.@:, an identifier or a floating-point number. - -Actually, the character that ends a command can be changed with -@cmd{SET}'s ENDCMD subcommand (@pxref{SET}), but we do not recommend -doing so. Throughout the remainder of this manual we will assume that -the default setting is in effect. @end table @node Commands @@ -184,38 +177,43 @@ by a forward slash (@samp{/}). There are multiple ways to mark the end of a command. 
The most common way is to end the last line of the command with a period (@samp{.}) as described in the previous section (@pxref{Tokens}). A blank line, or -one that consists only of white space or comments, also ends a command -by default, although you can use the NULLINE subcommand of @cmd{SET} -to disable this feature (@pxref{SET}). +one that consists only of white space or comments, also ends a command. @node Syntax Variants -@section Variants of syntax. +@section Syntax Variants @cindex Batch syntax @cindex Interactive syntax -There are two variants of command syntax, @i{viz}: @dfn{batch} mode and -@dfn{interactive} mode. -Batch mode is the default when reading commands from a file. -Interactive mode is the default when commands are typed at a prompt -by a user. -Certain commands, such as @cmd{INSERT} (@pxref{INSERT}), may explicitly -change the syntax mode. - -In batch mode, any line that contains a non-space character -in the leftmost column begins a new command. -Thus, each command consists of a flush-left line followed by any -number of lines indented from the left margin. -In this mode, a plus or minus sign (@samp{+}, @samp{@minus{}}) as the -first character in a line is ignored and causes that line to begin a -new command, which allows for visual indentation of a command without -that command being considered part of the previous command. -The period terminating the end of a command is optional but recommended. - -In interactive mode, each command must either be terminated with a period, -or an empty line must follow the command. -The use of (@samp{+} and @samp{@minus{}} as continuation characters is not -permitted. +There are three variants of command syntax, which vary only in how +they detect the end of one command and the start of the next. + +In @dfn{interactive mode}, which is the default for syntax typed at a +command prompt, a period as the last non-blank character on a line +ends a command. A blank line also ends a command. 
+ +In @dfn{batch mode}, an end-of-line period or a blank line also ends a +command. Additionally, it treats any line that has a non-blank +character in the leftmost column as beginning a new command. Thus, in +batch mode the second and subsequent lines in a command must be +indented. + +Regardless of the syntax mode, a plus sign, minus sign, or period in +the leftmost column of a line is ignored and causes that line to begin +a new command. This is most useful in batch mode, in which the first +line of a new command could not otherwise be indented, but it is +accepted regardless of syntax mode. + +The default mode for reading commands from a file is @dfn{auto mode}. +It is the same as batch mode, except that a line with a non-blank in +the leftmost column only starts a new command if that line begins with +the name of a PSPP command. This correctly interprets most valid PSPP +syntax files regardless of the syntax mode for which they are +intended. + +The @option{--interactive} (or @option{-i}) or @option{--batch} (or +@option{-b}) options set the syntax mode for files listed on the PSPP +command line. @xref{Main Options}, for more details. @node Types of Commands @section Types of Commands @@ -257,7 +255,7 @@ of Commands}, for details. Analyze data, writing results of analyses to the listing file. Cause transformations specified earlier in the file to be performed. In a more general sense, a @dfn{procedure} is any command that causes the -active file (the data) to be read. +active dataset (the data) to be read. @end table @node Order of Commands @@ -298,7 +296,7 @@ Valid in any state. When executed in the initial or procedure state, causes a transition to the transformation state. @item -Clears the active file if executed in the procedure or transformation +Clears the active dataset if executed in the procedure or transformation state. @end itemize @@ -309,7 +307,7 @@ Invalid in INPUT PROGRAM and FILE TYPE states. 
@item Causes a transition to the INPUT PROGRAM state. @item -Clears the active file. +Clears the active dataset. @end itemize @item @cmd{FILE TYPE} @@ -319,7 +317,7 @@ Invalid in INPUT PROGRAM and FILE TYPE states. @item Causes a transition to the FILE TYPE state. @item -Clears the active file. +Clears the active dataset. @end itemize @item Other file definition commands @@ -329,7 +327,7 @@ Invalid in INPUT PROGRAM and FILE TYPE states. @item Cause a transition to the transformation state. @item -Clear the active file, except for @cmd{ADD FILES}, @cmd{MATCH FILES}, +Clear the active dataset, except for @cmd{ADD FILES}, @cmd{MATCH FILES}, and @cmd{UPDATE}. @end itemize @@ -532,7 +530,7 @@ System missing value, in format F1. @cindex @code{$TIME} @item $TIME -Number of seconds between midnight 14 Oct 1582 and the time the active file +Number of seconds between midnight 14 Oct 1582 and the time the active dataset was read, in format F20. @cindex @code{$WIDTH} @@ -578,7 +576,7 @@ input field as a number or a string. It might specify that the field contains an ordinary decimal number, a time or date, a number in binary or hexadecimal notation, or one of several other notations. Input formats are used by commands such as @cmd{DATA LIST} that read data or -syntax files into the PSPP active file. +syntax files into the PSPP active dataset. Every input format corresponds to a default @dfn{output format} that specifies the formatting used when the value is output later. It is @@ -1256,7 +1254,7 @@ format is A format with half the input width. @cindex scratch variables Most of the time, variables don't retain their values between cases. -Instead, either they're being read from a data file or the active file, +Instead, either they're being read from a data file or the active dataset, in which case they assume the value read, or, if created with @cmd{COMPUTE} or another transformation, they're initialized to the system-missing value
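The string-literal rules added in the hunk at -115,27 +111,29 (hexadecimal strings with an x prefix read as UTF-8 bytes, single code points with a u prefix, and per-segment prefixes under `+` concatenation) can be illustrated with a small Python sketch. This is not PSPP's lexer: the names `decode_segment` and `concatenate` are hypothetical, segments are assumed to be already split into (prefix, body) pairs, and the handling of malformed hex input is simply whatever `bytes.fromhex` does, which the documentation above does not specify.

```python
def decode_segment(prefix: str, body: str) -> str:
    """Decode one quoted segment (hypothetical helper, not PSPP code).

    prefix: '' for a plain string, 'x'/'X' for hex bytes, 'u'/'U' for a
            single Unicode code point.
    body:   the text between the quotes.
    """
    kind = prefix.lower()
    if kind == "":
        # Plain string: taken as-is (quote doubling is assumed already handled).
        return body
    if kind == "x":
        # Hex digits are interpreted as UTF-8 bytes, independent of the
        # syntax file's or active dataset's encoding.
        return bytes.fromhex(body).decode("utf-8")
    if kind == "u":
        code_point = int(body, 16)
        # Reject code points above U+10FFFF and the range U+D800..U+DFFF.
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            raise ValueError(f"invalid Unicode code point U+{code_point:04X}")
        return chr(code_point)
    raise ValueError(f"unknown string prefix {prefix!r}")


def concatenate(segments):
    """Join segments the way '+' does: each prefix applies to its own segment."""
    return "".join(decode_segment(prefix, body) for prefix, body in segments)


# The manual's example: 'The G clef symbol is:' + u"1d11e" + "."
text = concatenate([("", "The G clef symbol is:"), ("u", "1d11e"), ("", ".")])
print(text)
```

Under these assumptions, `decode_segment("x", "f09d849e")` and `decode_segment("U", "1D11E")` yield the same character, because F0 9D 84 9E is the UTF-8 encoding of U+1D11E.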