This is pspp.info, produced by makeinfo version 4.0 from pspp.texi.

PSPP, for statistical analysis of sampled data, by Ben Pfaff.

This file documents PSPP, a statistical package for analysis of
sampled data that uses a command language compatible with SPSS.

Copyright (C) 1996-9, 2000 Free Software Foundation, Inc.

This version of the PSPP documentation is consistent with version 2 of
"texinfo.tex".

Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided that the
entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

Permission is granted to copy and distribute translations of this
manual into another language, under the above condition for modified
versions, except that this permission notice may be stated in a
translation approved by the Free Software Foundation.

Input and Output Formats
------------------------

Data that PSPP inputs and outputs must have one of a number of
formats. These formats are described, in general, by a format
specification of the form `NAMEw.d', where NAME is the format name and
W is a field width. D is the optional desired number of decimal
places, if appropriate. If D is not included then it is assumed to be
0. Some formats do not allow D to be specified.

When an input format is specified on DATA LIST or another command,
then it is converted to an output format for the purposes of PRINT and
other data output commands. For most purposes, input and output
formats are the same; the salient differences are described below.
Below are listed the input and output formats supported by PSPP. If
an input format is mapped to a different output format by default,
then that mapping is indicated with =>. Each format has the listed
bounds on input width (iw) and output width (ow).

The standard numeric input and output formats are given in the
following table:

Fw.d: 1 <= iw,ow <= 40
     Standard decimal format with D decimal places. If the number is
     too large to fit within the field width, it is expressed in
     scientific notation (`1.2+34') if w >= 6, always with at least
     two digits in the exponent. When used as an input format,
     scientific notation is allowed but an E or an F must be used to
     introduce the exponent.

     The default output format is the same as the input format, except
     if D > 1. In that case the output W is always made to be at least
     2 + D.

Ew.d: 1 <= iw <= 40; 6 <= ow <= 40
     For input this is equivalent to F format except that no E or F is
     required to introduce the exponent. For output, produces
     scientific notation in the form `1.2+34'. There are always at
     least two digits given in the exponent.

     The default output W is the largest of the input W, the input D +
     7, and 10. The default output D is the input D, but at least 3.

COMMAw.d: 1 <= iw,ow <= 40
     Equivalent to F format, except that groups of three digits are
     comma-separated on output. If the number is too large to express
     in the field width, the commas are eliminated first; if there is
     still not enough space, the number is expressed in scientific
     notation, provided that w >= 6. Commas are allowed and ignored
     when this is used as an input format.

DOTw.d: 1 <= iw,ow <= 40
     Equivalent to COMMA format except that the roles of comma and
     decimal point are interchanged. However: If SET /DECIMAL=DOT is in
     effect, then COMMA uses `,' for a decimal point and DOT uses `.'
     for a decimal point.

DOLLARw.d: 1 <= iw <= 40; 2 <= ow <= 40
     Equivalent to COMMA format, except that the number is prefixed by
     a dollar sign (`$') if there is room.
     On input the value is allowed to be prefixed by a dollar sign,
     which is ignored.

     The default output W is the input W, but at least 2.

PCTw.d: 2 <= iw,ow <= 40
     Equivalent to F format, except that the number is suffixed by a
     percent sign (`%') if there is room. On input the value is allowed
     to be suffixed by a percent sign, which is ignored.

     The default output W is the input W, but at least 2.

Nw.d: 1 <= iw,ow <= 40
     Only digits are allowed within the field width. The decimal point
     is assumed to be D digits from the right margin.

     The default output format is F with the same W and D, except if
     D > 1. In that case the output W is always made to be at least
     2 + D.

Zw.d => F: 1 <= iw,ow <= 40
     Zoned decimal input. If you need to use this then you know how.

IBw.d => F: 1 <= iw,ow <= 8
     Integer binary format. The field is interpreted as a fixed-point
     positive or negative binary number in two's-complement notation.
     The location of the decimal point is implied. Endianness is the
     same as the host machine.

     The default output format is F8.2 if D is 0. Otherwise it is F,
     with output W as 9 + input D and output D as input D.

PIBw.d => F: 1 <= iw,ow <= 8
     Positive integer binary format. The field is interpreted as a
     fixed-point positive binary number. The location of the decimal
     point is implied. Endianness is the same as the host machine.

     The default output format follows the rules for IB format.

Pw.d => F: 1 <= iw,ow <= 16
     Binary coded decimal format. Each byte from left to right, except
     the rightmost, represents two digits. The upper nibble of each
     byte is more significant. The upper nibble of the final byte is
     the least significant digit. The lower nibble of the final byte is
     the sign; a value of D represents a negative sign and all other
     values are considered positive. The decimal point is implied.

     The default output format follows the rules for IB format.

PKw.d => F: 1 <= iw,ow <= 16
     Positive binary coded decimal format. Same as P, except that the
     last byte is treated like the others: it holds two digits and no
     sign.
     The default output format follows the rules for IB format.

RBw => F: 2 <= iw,ow <= 8
     Binary C architecture-dependent "double" format. For a standard
     IEEE754 implementation W should be 8.

     The default output format follows the rules for IB format.

PIBHEXw.d => F: 2 <= iw,ow <= 16
     PIB format encoded as textual hex digit pairs. W must be even.

     The input width is mapped to a default output width as follows:
     2=>4, 4=>6, 6=>9, 8=>11, 10=>14, 12=>16, 14=>18, 16=>21. No
     allowances are made for decimal places.

RBHEXw => F: 4 <= iw,ow <= 16
     RB format encoded as textual hex digit pairs. W must be even.

     The default output format is F8.2.

CCAw.d: 1 <= ow <= 40
CCBw.d: 1 <= ow <= 40
CCCw.d: 1 <= ow <= 40
CCDw.d: 1 <= ow <= 40
CCEw.d: 1 <= ow <= 40
     User-defined custom currency formats. May not be used as an input
     format. *Note SET::, for more details.

The date and time numeric input and output formats accept a number of
possible formats. Before describing the formats themselves, some
definitions of the elements that make up their formats will be
helpful:

"leader"
     All formats accept an optional whitespace leader.

"day"
     An integer between 1 and 31 representing the day of month.

"day-count"
     An integer representing a number of days.

"date-delimiter"
     One or more characters of whitespace or the following characters:
     `- / . ,'

"month"
     A month name in one of the following forms:
        * An integer between 1 and 12.

        * Roman numerals representing an integer between 1 and 12.

        * At least the first three characters of an English month name
          (January, February, ...).

"year"
     An integer year number between 1582 and 19999, or between 1 and
     199. Years between 1 and 199 will have 1900 added.

"julian"
     A single number with a year number in the first 2, 3, or 4 digits
     (as above) and the day number within the year in the last 3
     digits.

"quarter"
     An integer between 1 and 4 representing a quarter.

"q-delimiter"
     The letter `Q' or `q'.

"week"
     An integer between 1 and 53 representing a week within a year.

"wk-delimiter"
     The letters `wk' in any case.

"time-delimiter"
     At least one character of whitespace or `:' or `.'.

"hour"
     An integer greater than 0 representing an hour.

"minute"
     An integer between 0 and 59 representing a minute within an hour.

"opt-second"
     Optionally, a time-delimiter followed by a real number
     representing a number of seconds.

"hour24"
     An integer between 0 and 23 representing an hour within a day.

"weekday"
     At least the first two characters of an English day name.

"spaces"
     Any amount or no amount of whitespace.

"sign"
     An optional positive or negative sign.

"trailer"
     All formats accept an optional whitespace trailer.

The date input formats are strung together from the above pieces. On
output, the date formats are always printed in a single canonical
manner, based on field width. The date input and output formats are
described below:

DATEw: 9 <= iw,ow <= 40
     Date format. Input format: leader + day + date-delimiter + month +
     date-delimiter + year + trailer. Output format: DD-MMM-YY for
     W < 11, DD-MMM-YYYY otherwise.

EDATEw: 8 <= iw,ow <= 40
     European date format. Input format same as DATE. Output format:
     DD.MM.YY for W < 10, DD.MM.YYYY otherwise.

SDATEw: 8 <= iw,ow <= 40
     Standard date format. Input format: leader + year + date-delimiter
     + month + date-delimiter + day + trailer. Output format: YY/MM/DD
     for W < 10, YYYY/MM/DD otherwise.

ADATEw: 8 <= iw,ow <= 40
     American date format. Input format: leader + month +
     date-delimiter + day + date-delimiter + year + trailer. Output
     format: MM/DD/YY for W < 10, MM/DD/YYYY otherwise.

JDATEw: 5 <= iw,ow <= 40
     Julian date format. Input format: leader + julian + trailer.
     Output format: YYDDD for W < 7, YYYYDDD otherwise.

QYRw: 4 <= iw <= 40, 6 <= ow <= 40
     Quarter/year format. Input format: leader + quarter + q-delimiter
     + year + trailer. Output format: `Q Q YY' for W < 8, `Q Q YYYY'
     otherwise, where the first `Q' is one of the digits 1, 2, 3, or 4.

MOYRw: 6 <= iw,ow <= 40
     Month/year format.
     Input format: leader + month + date-delimiter + year + trailer.
     Output format: `MMM YY' for W < 8, `MMM YYYY' otherwise.

WKYRw: 6 <= iw <= 40, 8 <= ow <= 40
     Week/year format. Input format: leader + week + wk-delimiter +
     year + trailer. Output format: `WW WK YY' for W < 10, `WW WK YYYY'
     otherwise.

DATETIMEw.d: 17 <= iw,ow <= 40
     Date and time format. Input format: leader + day + date-delimiter
     + month + date-delimiter + year + time-delimiter + hour24 +
     time-delimiter + minute + opt-second. Output format: `DD-MMM-YYYY
     HH:MM'. If W > 19 then seconds `:SS' are added. If W > 22 and
     D > 0 then fractional seconds `.SS' are added.

TIMEw.d: 5 <= iw,ow <= 40
     Time format. Input format: leader + sign + spaces + hour +
     time-delimiter + minute + opt-second. Output format: `HH:MM'.
     Seconds and fractional seconds are available with W of at least 8
     and 10, respectively.

DTIMEw.d: 1 <= iw <= 40, 8 <= ow <= 40
     Time format with day count. Input format: leader + sign + spaces +
     day-count + time-delimiter + hour + time-delimiter + minute +
     opt-second. Output format: `DD HH:MM'. Seconds and fractional
     seconds are available with W of at least 8 and 10, respectively.

WKDAYw: 2 <= iw,ow <= 40
     A weekday as a number between 1 and 7, where 1 is Sunday. Input
     format: leader + weekday + trailer. Output format: as many
     characters, in all capital letters, of the English name of the
     weekday as will fit in the field width.

MONTHw: 3 <= iw,ow <= 40
     A month as a number between 1 and 12, where 1 is January. Input
     format: leader + month + trailer. Output format: as many
     characters, in all capital letters, of the English name of the
     month as will fit in the field width.

There are only two formats that may be used with string variables:

Aw: 1 <= iw <= 255, 1 <= ow <= 254
     The entire field is treated as a string value.

AHEXw => A: 2 <= iw <= 254; 2 <= ow <= 510
     The field is composed of characters in a string encoded as textual
     hex digit pairs.

     The default output W is half the input W.
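The packed decimal layout given above for P format can be decoded as in
the following C sketch, which also covers PK. The function names are
hypothetical and this is not PSPP's implementation; the sketch assumes
the field has at least one byte, treats a sign nibble of hexadecimal D
as negative and every other sign value as positive (as the table
states), and does not check that each digit nibble lies between 0 and
9. Dividing by 10 to the power D applies the implied decimal point.

```c
#include <stddef.h>

/* Returns 10**d as a double (d >= 0) without needing libm. */
static double power_of_ten(int d)
{
    double p = 1.0;
    while (d-- > 0)
        p *= 10.0;
    return p;
}

/* Decodes a Pw.d field of n >= 1 bytes with d implied decimal places.
   Every byte except the last holds two digits, upper nibble first.
   The last byte holds the least significant digit in its upper nibble
   and the sign in its lower nibble, where 0xD means negative.
   Digit nibbles are not checked for being in the range 0-9. */
static double decode_p(const unsigned char *field, size_t n, int d)
{
    double value = 0.0;
    for (size_t i = 0; i + 1 < n; i++)
        value = value * 100.0 + (field[i] >> 4) * 10 + (field[i] & 0x0f);
    value = value * 10.0 + (field[n - 1] >> 4);
    if ((field[n - 1] & 0x0f) == 0x0d)
        value = -value;
    return value / power_of_ten(d);
}

/* Decodes a PKw.d field: like P, except that the last byte also holds
   two digits and there is no sign, so the result is never negative. */
static double decode_pk(const unsigned char *field, size_t n, int d)
{
    double value = 0.0;
    for (size_t i = 0; i < n; i++)
        value = value * 100.0 + (field[i] >> 4) * 10 + (field[i] & 0x0f);
    return value / power_of_ten(d);
}
```

For example, the three bytes 12 34 5D (hexadecimal) read as P3.2 give
-123.45 under this sketch, and the two bytes 12 34 read as PK2.0 give
1234.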
Scratch Variables
-----------------

Most of the time, variables don't retain their values between cases.
Instead, either they're being read from a data file or the active
file, in which case they assume the value read, or, if created with
COMPUTE or another transformation, they're initialized to the
system-missing value or to blanks, depending on type.

However, sometimes it's useful to have a variable that keeps its value
between cases. You can do this with LEAVE (*note LEAVE::), or you can
use a "scratch variable". Scratch variables are variables whose names
begin with an octothorpe (`#').

Scratch variables have the same properties as variables left with
LEAVE: they retain their values between cases, and for the first case
they are initialized to 0 or blanks. They have the additional property
that they are deleted before the execution of any procedure. For this
reason, scratch variables can't be used for analysis. To obtain the
same effect, use COMPUTE (*note COMPUTE::) to copy the scratch
variable's value into an ordinary variable, then analyze that
variable.

Files Used by PSPP
==================

PSPP makes use of many files each time it runs. Some of these it
reads, some it writes, some it creates. Here is a table listing the
most important of these files:

*command file*
*syntax file*
     These names (synonyms) refer to the file that contains
     instructions to PSPP that tell it what to do. The syntax file's
     name is specified on the PSPP command line. Syntax files can also
     be pulled in with the `INCLUDE' command.

*data file*
     Data files contain raw data in ASCII format suitable for being
     read in by the `DATA LIST' command. Data can be embedded in the
     syntax file with `BEGIN DATA' and `END DATA' commands: this makes
     the syntax file a data file too.
*listing file*
     One or more output files are created by PSPP each time it is run.
     The output files receive the tables and charts produced by
     statistical procedures. The output files may be in any number of
     formats, depending on how PSPP is configured.

*active file*
     The active file is the "file" on which all PSPP procedures are
     performed. The active file contains variable definitions and
     cases. The active file is not necessarily a disk file: it is
     stored in memory if there is room.

Backus-Naur Form
================

The syntax of some parts of the PSPP language is presented in this
manual using the formalism known as "Backus-Naur Form", or BNF. The
following table describes BNF:

   * Words in all-uppercase are PSPP keyword tokens. In BNF, these are
     often called "terminals". There are some special terminals, which
     are actually written in lowercase for clarity:

    `number'
          A real number.

    `integer'
          An integer number.

    `string'
          A string.

    `var-name'
          A single variable name.

    `=', `/', `+', `-', etc.
          Operators and punctuators.

    `.'
          The terminal dot. This is not necessarily an actual dot in
          the syntax file: *Note Commands::, for more details.

   * Other words in all lowercase refer to BNF definitions, called
     "productions". These productions are also known as
     "nonterminals". Some nonterminals are very common, so they are
     defined here in English for clarity:

    `var-list'
          A list of one or more variable names or the keyword `ALL'.

    `expression'
          An expression. *Note Expressions::, for details.

   * `::=' means "is defined as". The left side of `::=' gives the
     name of the nonterminal being defined. The right side of `::='
     gives the definition of that nonterminal. If the right side is
     empty, then one possible expansion of that nonterminal is nothing.
     A BNF definition is called a "production".
   * So, the key difference between a terminal and a nonterminal is
     that a terminal cannot be broken into smaller parts--in fact,
     every terminal is a single token (*note Tokens::). On the other
     hand, nonterminals are composed of a (possibly empty) sequence of
     terminals and nonterminals. Thus, terminals indicate the deepest
     level of syntax description. (In parsing theory, terminals are
     the leaves of the parse tree; nonterminals form the branches.)

   * The first nonterminal defined in a set of productions is called
     the "start symbol". The start symbol defines the entire syntax
     for that command.

Mathematical Expressions
************************

Some PSPP commands use expressions, which share a common syntax among
all PSPP commands. Expressions are made up of "operands", which can be
numbers, strings, or variable names, separated by "operators". There
are five types of operators: grouping, arithmetic, logical,
relational, and functions.

Every operator takes one or more "arguments" as input and produces or
"returns" exactly one result as output. Both strings and numeric
values can be used as arguments and are produced as results, but each
operator accepts only specific combinations of numeric and string
values as arguments. With few exceptions, operator arguments may be
full-fledged expressions in themselves.

* Menu:

* Booleans::                        Boolean values.
* Missing Values in Expressions::   Using missing values in expressions.
* Grouping Operators::              ( )
* Arithmetic Operators::            + - * / **
* Logical Operators::               AND NOT OR
* Relational Operators::            EQ GE GT LE LT NE
* Functions::                       More-sophisticated operators.
* Order of Operations::             Operator precedence.
Boolean values
==============

There is a third type for arguments and results, the "Boolean" type,
which is used to represent true/false conditions. Booleans have only
three possible values: 0 (false), 1 (true), and system-missing.
System-missing is neither true nor false.

   * A numeric expression that has value 0, 1, or system-missing may
     be used in place of a Boolean. Thus, the expression `0 AND 1' is
     valid (although it is always false).

   * A numeric expression with any other value will cause an error if
     it is used as a Boolean. So, `2 OR 3' is invalid.

   * A Boolean expression may not be used in place of a numeric
     expression. Thus, `(1>2) + (3<4)' is invalid.

   * Strings and Booleans are not compatible, and neither may be used
     in place of the other.

Missing Values in Expressions
=============================

String missing values are not treated specially in expressions. Most
numeric operators return system-missing when given system-missing
arguments. Exceptions are listed under particular operator
descriptions.

User-missing values for numeric variables are always transformed into
the system-missing value, except inside the arguments to the `VALUE',
`SYSMIS', and `MISSING' functions.

The missing-value functions can be used to precisely control how
missing values are treated in expressions. *Note Missing Value
Functions::, for more details.

Grouping Operators
==================

Parentheses (`()') are the grouping operators. Surround an expression
with parentheses to force early evaluation.

Parentheses also surround the arguments to functions, but in that
situation they act as punctuators, not as operators.
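The rules given under Booleans for using numbers in place of Booleans,
together with the conversion of user-missing values described under
Missing Values in Expressions, can be sketched in C as follows. The
sketch assumes system-missing is represented by the sentinel -DBL_MAX;
this manual does not specify a representation, so that choice, the
names, and the is_user_missing flag (standing in for a lookup of the
variable's user-missing values) are all hypothetical.

```c
#include <float.h>

/* Assumed sentinel for system-missing in this sketch. */
#define SYSMIS_SKETCH (-DBL_MAX)

enum bool_sketch { B_FALSE, B_TRUE, B_MISSING, B_INVALID };

/* Classifies a numeric value used where a Boolean is expected:
   0 is false, 1 is true, system-missing stays missing, and any other
   value is an error (which is why `2 OR 3' is rejected). */
static enum bool_sketch numeric_as_boolean(double v)
{
    if (v == SYSMIS_SKETCH)
        return B_MISSING;
    if (v == 0.0)
        return B_FALSE;
    if (v == 1.0)
        return B_TRUE;
    return B_INVALID;
}

/* In most expression contexts a user-missing numeric value is replaced
   by system-missing before any operator sees it. */
static double expression_value(double v, int is_user_missing)
{
    return is_user_missing ? SYSMIS_SKETCH : v;
}
```

Note that a user-missing 1 therefore behaves as system-missing, not as
true, once it enters an ordinary expression.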
Arithmetic Operators
====================

The arithmetic operators take numeric arguments and produce numeric
results.

`A + B'
     Adds A and B, returning the sum.

`A - B'
     Subtracts B from A, returning the difference.

`A * B'
     Multiplies A and B, returning the product.

`A / B'
     Divides A by B, returning the quotient. If B is zero, the result
     is system-missing.

`A ** B'
     Returns the result of raising A to the power B. If A is negative
     and B is not an integer, the result is system-missing. The result
     of `0**0' is system-missing as well.

`- A'
     Reverses the sign of A.

Logical Operators
=================

The logical operators take logical arguments and produce logical
results, meaning "true or false". PSPP logical operators are not true
Boolean operators because they may also result in a system-missing
value.

`A AND B'
`A & B'
     True if both A and B are true. However, if one argument is false
     and the other is missing, the result is false, not missing. If
     both arguments are missing, the result is missing.

`A OR B'
`A | B'
     True if at least one of A and B is true. If one argument is true
     and the other is missing, the result is true, not missing. If both
     arguments are missing, the result is missing.

`NOT A'
`~ A'
     True if A is false.

Relational Operators
====================

The relational operators take numeric or string arguments and produce
Boolean results.

Note that, with numeric arguments, PSPP does not make exact relational
tests. Instead, two numbers are considered to be equal even if they
differ by a small amount. This amount, "epsilon", is dependent on the
PSPP configuration and determined at compile time.
(The default value is 0.000000001, or `10**(-9)'.) Use of epsilon
allows for round-off errors. Use of epsilon is also idiotic, but the
author is not a numeric analyst.

Strings cannot be compared to numbers. When strings of different
lengths are compared, the shorter string is right-padded with spaces
to match the length of the longer string.

The results of string comparisons, other than tests for equality or
inequality, are dependent on the character set in use. String
comparisons are case-sensitive.

`A EQ B'
`A = B'
     True if A is equal to B.

`A LE B'
`A <= B'
     True if A is less than or equal to B.

`A LT B'
`A < B'
     True if A is less than B.

`A GE B'
`A >= B'
     True if A is greater than or equal to B.

`A GT B'
`A > B'
     True if A is greater than B.

`A NE B'
`A ~= B'
`A <> B'
     True if A is not equal to B.

Functions
=========

PSPP functions provide mathematical abilities above and beyond those
possible using simple operators. Functions have a common syntax: each
is composed of a function name followed by a left parenthesis, one or
more arguments, and a right parenthesis.

Function names are *not* reserved; their names are specially treated
only when followed by a left parenthesis: `EXP(10)' refers to the
constant value `e' raised to the 10th power, but `EXP' by itself
refers to the value of variable EXP.

The sections below describe each function in detail.
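The rule that a function name is special only when a left parenthesis
follows it can be shown with a small C sketch of the lookahead a parser
might perform. This is not PSPP's lexer: the helper names are
hypothetical, the set of name characters is a simplification chosen
for the example (letters, digits, `_', and a `.' that is followed by a
letter or digit, so that `MEAN.3' stays one name), and the sketch
requires the parenthesis to follow the name immediately.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns the length of the name at the start of
   s.  Name characters are assumed to be letters, digits, and '_', plus
   a '.' that is followed by a letter or digit (so a command-ending dot
   is not swallowed). */
static size_t scan_name(const char *s)
{
    size_t n = 0;
    for (;;) {
        unsigned char c = (unsigned char) s[n];
        if (isalnum(c) || c == '_')
            n++;
        else if (c == '.' && isalnum((unsigned char) s[n + 1]))
            n++;
        else
            return n;
    }
}

/* A name is treated as a function only when '(' immediately follows
   it; otherwise it refers to a variable of that name. */
static bool is_function_call(const char *s)
{
    size_t n = scan_name(s);
    return n > 0 && s[n] == '(';
}
```

Under these assumptions `EXP(10)' and `MEAN.3(A, B, C)' are function
calls, while `EXP' and `EXP + 1' refer to the variable EXP.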
* Menu:

* Advanced Mathematics::        EXP LG10 LN SQRT
* Miscellaneous Mathematics::   ABS MOD MOD10 RND TRUNC
* Trigonometry::                ACOS ARCOS ARSIN ARTAN ASIN ATAN COS SIN TAN
* Missing Value Functions::     MISSING NMISS NVALID SYSMIS VALUE
* Pseudo-Random Numbers::       NORMAL UNIFORM
* Set Membership::              ANY RANGE
* Statistical Functions::       CFVAR MAX MEAN MIN SD SUM VARIANCE
* String Functions::            CONCAT INDEX LENGTH LOWER LPAD LTRIM NUMBER
                                RINDEX RPAD RTRIM STRING SUBSTR UPCASE
* Time & Date::                 CTIME.xxx DATE.xxx TIME.xxx XDATE.xxx
* Miscellaneous Functions::     LAG YRMODA
* Functions Not Implemented::   CDF.xxx CDFNORM IDF.xxx NCDF.xxx PROBIT RV.xxx

Advanced Mathematical Functions
-------------------------------

Advanced mathematical functions take numeric arguments and produce
numeric results.

 - Function: EXP (EXPONENT)
     Returns e (approximately 2.71828) raised to power EXPONENT.

 - Function: LG10 (NUMBER)
     Takes the base-10 logarithm of NUMBER. If NUMBER is not positive,
     the result is system-missing.

 - Function: LN (NUMBER)
     Takes the base-`e' logarithm of NUMBER. If NUMBER is not positive,
     the result is system-missing.

 - Function: SQRT (NUMBER)
     Takes the square root of NUMBER. If NUMBER is negative, the result
     is system-missing.

Miscellaneous Mathematical Functions
------------------------------------

Miscellaneous mathematical functions take numeric arguments and
produce numeric results.

 - Function: ABS (NUMBER)
     Results in the absolute value of NUMBER.

 - Function: MOD (NUMERATOR, DENOMINATOR)
     Returns the remainder (modulus) of NUMERATOR divided by
     DENOMINATOR. If DENOMINATOR is 0, the result is system-missing.
     However, if NUMERATOR is 0 and DENOMINATOR is system-missing, the
     result is 0.

 - Function: MOD10 (NUMBER)
     Returns the remainder when NUMBER is divided by 10.
     If NUMBER is negative, MOD10(NUMBER) is negative or zero.

 - Function: RND (NUMBER)
     Takes the absolute value of NUMBER and rounds it to an integer.
     Then, if NUMBER was negative originally, negates the result.

 - Function: TRUNC (NUMBER)
     Discards the fractional part of NUMBER; that is, rounds NUMBER
     towards zero.

Trigonometric Functions
-----------------------

Trigonometric functions take numeric arguments and produce numeric
results.

 - Function: ACOS (NUMBER)
 - Function: ARCOS (NUMBER)
     Takes the arccosine, in radians, of NUMBER. Results in
     system-missing if NUMBER is not between -1 and 1 inclusive.
     Portability: none.

 - Function: ARSIN (NUMBER)
     Takes the arcsine, in radians, of NUMBER. Results in
     system-missing if NUMBER is not between -1 and 1 inclusive.

 - Function: ARTAN (NUMBER)
     Takes the arctangent, in radians, of NUMBER.

 - Function: ASIN (NUMBER)
     Takes the arcsine, in radians, of NUMBER. Results in
     system-missing if NUMBER is not between -1 and 1 inclusive.
     Portability: none.

 - Function: ATAN (NUMBER)
     Takes the arctangent, in radians, of NUMBER.

     *Please note:* Use of the AR* group of inverse trigonometric
     functions is recommended over the A* group because they are more
     portable.

 - Function: COS (RADIANS)
     Takes the cosine of RADIANS.

 - Function: SIN (RADIANS)
     Takes the sine of RADIANS.

 - Function: TAN (RADIANS)
     Takes the tangent of RADIANS. Results in system-missing at values
     of RADIANS that are too close to odd multiples of pi/2.
     Portability: none.

Missing-Value Functions
-----------------------

Missing-value functions take various types as arguments, returning
various types of results.

 - Function: MISSING (VARIABLE OR EXPRESSION)
     The argument may be a single variable name or an expression.
     If it is a variable name, results in 1 if the variable has a
     user-missing or system-missing value for the current case, 0
     otherwise. If it is an expression, results in 1 if the expression
     has the system-missing value, 0 otherwise.

     *Please note:* If the argument is a string expression other than a
     variable name, MISSING is guaranteed to return 0, because strings
     do not have a system-missing value. Also, when using a numeric
     expression argument, remember that user-missing values are
     converted to the system-missing value in most contexts. Thus, the
     expressions `MISSING(VAR1 OP VAR2)' and `MISSING(VAR1) OR
     MISSING(VAR2)' are often equivalent, depending on the specific
     operator OP used.

 - Function: NMISS (EXPR [, EXPR]...)
     Each argument must be a numeric expression. Returns the number of
     user- or system-missing values in the list. As a special
     extension, the syntax `VAR1 TO VAR2' may be used to refer to a
     range of variables; see *note Sets of Variables::, for more
     details.

 - Function: NVALID (EXPR [, EXPR]...)
     Each argument must be a numeric expression. Returns the number of
     values in the list that are not user- or system-missing. As a
     special extension, the syntax `VAR1 TO VAR2' may be used to refer
     to a range of variables; see *note Sets of Variables::, for more
     details.

 - Function: SYSMIS (VARIABLE OR EXPRESSION)
     When given the name of a numeric variable, returns 1 if the value
     of that variable is system-missing. Otherwise, if the value is not
     missing or if it is user-missing, returns 0. If given the name of
     a string variable, always returns 1. If given an expression other
     than a single variable name, results in 1 if the value is
     system- or user-missing, 0 otherwise.

 - Function: VALUE (VARIABLE)
     Prevents the user-missing values of VARIABLE from being
     transformed into system-missing values: If VARIABLE is not
     system- or user-missing, results in the value of VARIABLE. If
     VARIABLE is user-missing, results in the value of VARIABLE anyway.
     If VARIABLE is system-missing, results in system-missing.

Pseudo-Random Number Generation Functions
-----------------------------------------

Pseudo-random number generation functions take numeric arguments and
produce numeric results.

The system's C library random generator is used as a basis for
generating random numbers, since random number generation is a
system-dependent task. However, Knuth's Algorithm B is used to shuffle
the resultant values, which is enough to make even a stream of
consecutive integers random enough for most applications.

(If you're worried about the quality of the random number generator,
well, you're using a statistical processing package--analyze it!)

 - Function: NORMAL (NUMBER)
     Results in a random number. Results from `NORMAL' are normally
     distributed with a mean of 0 and a standard deviation of NUMBER.

 - Function: UNIFORM (NUMBER)
     Results in a random number between 0 and NUMBER. Results from
     `UNIFORM' are evenly distributed across its entire range. There
     may be a maximum on the largest random number ever generated--this
     is often 2**31-1 (2,147,483,647), but it may be orders of
     magnitude higher or lower.

Set-Membership Functions
------------------------

Set membership functions determine whether a value is a member of a
set. They take a set of numeric arguments or a set of string
arguments, and produce Boolean results.

String comparisons are performed according to the rules given in
*note Relational Operators::.

 - Function: ANY (VALUE, SET [, SET]...)
     Results in true if VALUE is equal to any of the SET values.
     Otherwise, results in false. If VALUE is system-missing, returns
     system-missing. System-missing values in SET do not cause ANY to
     return system-missing.

 - Function: RANGE (VALUE, LOW, HIGH [, LOW, HIGH]...)
     Results in true if VALUE is in any of the intervals bounded by LOW
     and HIGH inclusive. Otherwise, results in false. Each LOW must be
     less than or equal to its corresponding HIGH value. LOW and HIGH
     must be given in pairs. If VALUE is system-missing, returns
     system-missing. System-missing values among the LOW and HIGH
     arguments do not cause RANGE to return system-missing.

Statistical Functions
---------------------

Statistical functions compute descriptive statistics on a list of
values. Some statistics can be computed on numeric or string values;
others can only be computed on numeric values. They result in the same
type as their arguments.

With statistical functions it is possible to specify a minimum number
of non-missing arguments for the function to be evaluated. To do so,
append a dot and the number to the function name. For instance, to
specify a minimum of three valid arguments to the MEAN function, use
the name `MEAN.3'.

 - Function: CFVAR (NUMBER, NUMBER[, ...])
     Results in the coefficient of variation of the values of NUMBER.
     This function requires at least two valid arguments to give a
     non-missing result. (The coefficient of variation is the standard
     deviation divided by the mean.)

 - Function: MAX (VALUE, VALUE[, ...])
     Results in the value of the greatest VALUE. The VALUEs may be
     numeric or string. Although at least two arguments must be given,
     only one need be valid for MAX to give a non-missing result.

 - Function: MEAN (NUMBER, NUMBER[, ...])
     Results in the mean of the values of NUMBER. Although at least two
     arguments must be given, only one need be valid for MEAN to give a
     non-missing result.

 - Function: MIN (VALUE, VALUE[, ...])
     Results in the value of the least VALUE. The VALUEs may be numeric
     or string. Although at least two arguments must be given, only one
     need be valid for MIN to give a non-missing result.
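The `.N' minimum-valid suffix described above can be illustrated with
a C sketch of MEAN.N. It is not PSPP's code: it assumes system-missing
is represented by the sentinel -DBL_MAX and that user-missing arguments
have already been converted to system-missing, as described under
missing values in expressions. Plain MEAN, which needs only one valid
argument, corresponds to a minimum of 1.

```c
#include <float.h>
#include <stddef.h>

/* Assumed sentinel for system-missing in this sketch. */
#define SYSMIS_SKETCH (-DBL_MAX)

/* Evaluates MEAN.min_valid over args[0..n-1].  System-missing
   arguments are skipped; if fewer than min_valid valid arguments
   remain (or none at all), the result is system-missing. */
static double mean_n(const double *args, size_t n, size_t min_valid)
{
    double sum = 0.0;
    size_t valid = 0;

    for (size_t i = 0; i < n; i++) {
        if (args[i] != SYSMIS_SKETCH) {
            sum += args[i];
            valid++;
        }
    }
    if (valid == 0 || valid < min_valid)
        return SYSMIS_SKETCH;
    return sum / valid;
}
```

With arguments 1, system-missing, and 3, both MEAN.1 and MEAN.2 give 2
under this sketch, but MEAN.3 gives system-missing.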
 - Function: SD (NUMBER, NUMBER[, ...])
     Results in the standard deviation of the values of NUMBER.  This
     function requires at least two valid arguments to give a
     non-missing result.

 - Function: SUM (NUMBER, NUMBER[, ...])
     Results in the sum of the values of NUMBER.  Although at least two
     arguments must be given, only one need be valid for SUM to give a
     non-missing result.

 - Function: VAR (NUMBER, NUMBER[, ...])
     Results in the variance of the values of NUMBER.  This function
     requires at least two valid arguments to give a non-missing
     result.

 - Function: VARIANCE (NUMBER, NUMBER[, ...])
     Results in the variance of the values of NUMBER.  This function
     requires at least two valid arguments to give a non-missing
     result.  (Use VAR in preference to VARIANCE for reasons of
     portability.)

File: pspp.info, Node: String Functions, Next: Time & Date, Prev: Statistical Functions, Up: Functions

String Functions
----------------

   String functions take various arguments and return various results.

 - Function: CONCAT (STRING, STRING[, ...])
     Returns a string consisting of each STRING in sequence.
     `CONCAT("abc", "def", "ghi")' has a value of `"abcdefghi"'.  The
     resultant string is truncated to a maximum of 255 characters.

 - Function: INDEX (HAYSTACK, NEEDLE)
     Returns a positive integer indicating the position of the first
     occurrence of NEEDLE in HAYSTACK.  Returns 0 if HAYSTACK does not
     contain NEEDLE.  Returns system-missing if NEEDLE is an empty
     string.

 - Function: INDEX (HAYSTACK, NEEDLE, DIVISOR)
     Divides NEEDLE into parts, each with length DIVISOR.  Searches
     HAYSTACK for the first occurrence of each part, and returns the
     smallest of the positions found.  Returns 0 if HAYSTACK does not
     contain any part of NEEDLE.  It is an error if the length of
     NEEDLE is not an exact multiple of DIVISOR.  Returns
     system-missing if NEEDLE is an empty string.

 - Function: LENGTH (STRING)
     Returns the number of characters in STRING.
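   The DIVISOR form of INDEX is easiest to see with a concrete model.
The following Python sketch is hypothetical (the name `pspp_index' is
invented): it uses 1-based positions as INDEX does, models
system-missing as `None', and raises an exception where PSPP would
report an error.  Treating a DIVISOR of 0 or less as an error is an
assumption of the sketch.

```python
from typing import Optional


def pspp_index(haystack: str, needle: str,
               divisor: Optional[int] = None) -> Optional[int]:
    """Model of INDEX(HAYSTACK, NEEDLE[, DIVISOR]).

    Positions are 1-based; 0 means "not found"; None models system-missing.
    """
    if needle == "":
        return None  # system-missing for an empty NEEDLE
    if divisor is None:
        # str.find gives a 0-based offset, or -1 when absent; adding 1
        # turns that into a 1-based position, or 0 when absent.
        return haystack.find(needle) + 1
    if divisor <= 0 or len(needle) % divisor != 0:
        raise ValueError("length of NEEDLE must be a multiple of DIVISOR")
    # Split NEEDLE into DIVISOR-sized parts and keep the earliest match.
    parts = [needle[i:i + divisor] for i in range(0, len(needle), divisor)]
    found = [haystack.find(p) + 1 for p in parts if p in haystack]
    return min(found) if found else 0


print(pspp_index("abcabc", "ca"))                   # 3
print(pspp_index("abcabc", "xy"))                   # 0
print(pspp_index("rhythm and blues", "aeiou", 1))   # 8
```

   A DIVISOR of 1 turns NEEDLE into a set of single characters, so the
last call finds the position of the first vowel.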
 - Function: LOWER (STRING)
     Returns a string identical to STRING except that all uppercase
     letters are changed to lowercase letters.  The definitions of
     "uppercase" and "lowercase" are system-dependent.

 - Function: LPAD (STRING, LENGTH)
     If STRING is at least LENGTH characters in length, returns STRING
     unchanged.  Otherwise, returns STRING padded with spaces on the
     left side to length LENGTH.  Returns an empty string if LENGTH is
     system-missing, negative, or greater than 255.

 - Function: LPAD (STRING, LENGTH, PADDING)
     If STRING is at least LENGTH characters in length, returns STRING
     unchanged.  Otherwise, returns STRING padded with PADDING on the
     left side to length LENGTH.  Returns an empty string if LENGTH is
     system-missing, negative, or greater than 255, or if PADDING does
     not contain exactly one character.

 - Function: LTRIM (STRING)
     Returns STRING, after removing leading spaces.  Other whitespace,
     such as tabs, carriage returns, line feeds, and vertical tabs, is
     not removed.

 - Function: LTRIM (STRING, PADDING)
     Returns STRING, after removing leading PADDING characters.  If
     PADDING does not contain exactly one character, returns an empty
     string.

 - Function: NUMBER (STRING)
     Returns the number produced when STRING is interpreted according
     to format FX.0, where X is the number of characters in STRING.  If
     STRING does not form a proper number, system-missing is returned
     without an error message.  Portability: none.

 - Function: NUMBER (STRING, FORMAT)
     Returns the number produced when STRING is interpreted according
     to format specifier FORMAT.  Only the first W characters of STRING
     are examined, where W is the width given in FORMAT.  For example,
     `NUMBER("123", F3.0)' and `NUMBER("1234", F3.0)' both have value
     123.  If STRING does not form a proper number, system-missing is
     returned without an error message.

 - Function: RINDEX (HAYSTACK, NEEDLE)
     Returns a positive integer indicating the position of the last
     occurrence of NEEDLE in HAYSTACK.  Returns 0 if HAYSTACK does not
     contain NEEDLE.
     Returns system-missing if NEEDLE is an empty string.

 - Function: RINDEX (HAYSTACK, NEEDLE, DIVISOR)
     Divides NEEDLE into parts, each with length DIVISOR.  Searches
     HAYSTACK for the last occurrence of each part, and returns the
     largest of the positions found.  Returns 0 if HAYSTACK does not
     contain any part of NEEDLE.  It is an error if the length of
     NEEDLE is not an exact multiple of DIVISOR.  Returns
     system-missing if NEEDLE is an empty string.

 - Function: RPAD (STRING, LENGTH)
     If STRING is at least LENGTH characters in length, returns STRING
     unchanged.  Otherwise, returns STRING padded with spaces on the
     right to length LENGTH.  Returns an empty string if LENGTH is
     system-missing, negative, or greater than 255.

 - Function: RPAD (STRING, LENGTH, PADDING)
     If STRING is at least LENGTH characters in length, returns STRING
     unchanged.  Otherwise, returns STRING padded with PADDING on the
     right to length LENGTH.  Returns an empty string if LENGTH is
     system-missing, negative, or greater than 255, or if PADDING does
     not contain exactly one character.

 - Function: RTRIM (STRING)
     Returns STRING, after removing trailing spaces.  Other types of
     whitespace are not removed.

 - Function: RTRIM (STRING, PADDING)
     Returns STRING, after removing trailing PADDING characters.  If
     PADDING does not contain exactly one character, returns an empty
     string.

 - Function: STRING (NUMBER, FORMAT)
     Returns a string corresponding to NUMBER in the format given by
     format specifier FORMAT.  For example, `STRING(123.56, F5.1)' has
     the value `"123.6"'.

 - Function: SUBSTR (STRING, START)
     Returns a string consisting of the value of STRING from position
     START onward.  Returns an empty string if START is system-missing
     or has a value less than 1 or greater than the number of
     characters in STRING.

 - Function: SUBSTR (STRING, START, COUNT)
     Returns a string consisting of the first COUNT characters from
     STRING beginning at position START.
     Returns an empty string if START or COUNT is system-missing, if
     START is less than 1 or greater than the number of characters in
     STRING, or if COUNT is less than 1.  Returns a string shorter than
     COUNT characters if START + COUNT - 1 is greater than the number
     of characters in STRING.  Examples: `SUBSTR("abcdefg", 3, 2)' has
     value `"cd"'; `SUBSTR("Ben Pfaff", 5, 10)' has the value
     `"Pfaff"'.

 - Function: UPCASE (STRING)
     Returns STRING, changing lowercase letters to uppercase letters.

File: pspp.info, Node: Time & Date, Next: Miscellaneous Functions, Prev: String Functions, Up: Functions

Time & Date Functions
---------------------

   The legal range of dates for use in PSPP is 15 Oct 1582 through 31
Dec 19999.

   *Please note:* Most time & date extraction functions will accept
invalid arguments:

   * Negative numbers in PSPP time format.

   * Numbers less than 86,400 in PSPP date format.

   However, sensible results are not guaranteed for these invalid
values, and the equivalent expressions given for these functions do
not necessarily hold for them.

   *Please note also:* The time & date construction functions *do*
produce reasonable and useful results for out-of-range values; these
are not considered invalid.

* Menu:

* Time & Date Concepts::        How times & dates are defined and represented
* Time Construction::           TIME.{DAYS HMS}
* Time Extraction::             CTIME.{DAYS HOURS MINUTES SECONDS}
* Date Construction::           DATE.{DMY MDY MOYR QYR WKYR YRDAY}
* Date Extraction::             XDATE.{DATE HOUR JDAY MDAY MINUTE MONTH
                                QUARTER SECOND TDAY TIME WEEK WKDAY YEAR}

File: pspp.info, Node: Time & Date Concepts, Next: Time Construction, Prev: Time & Date, Up: Time & Date

How times & dates are defined and represented
.............................................

   Times and dates are handled by PSPP as single numbers.  A "time" is
an interval.  PSPP measures times in seconds.
Thus, the following intervals correspond with the numeric values given:

          10 minutes                          600
          1 hour                            3,600
          1 day, 3 hours, 10 seconds       97,210
          40 days                       3,456,000
          10010 d, 14 min, 24 s       864,864,864

   A "date", on the other hand, is a particular instant in the past or
the future.  PSPP represents a date as a number of seconds after the
midnight that began 14 Oct 1582 in the Gregorian calendar.
(Historically, 15 Oct 1582 immediately followed 4 Oct 1582, when the
Gregorian calendar replaced the Julian calendar, so the epoch is one
day before the Gregorian calendar came into use.)  Thus, the midnights
before the dates given below correspond with the numeric PSPP dates
given:

          15 Oct 1582             86,400
          4 Jul 1776       6,113,318,400
          1 Jan 1900      10,010,390,400
          1 Oct 1978      12,495,427,200
          24 Aug 1995     13,028,601,600

   Please note:

   * A time may be added to, or subtracted from, a date, resulting in a
     date.

   * The difference of two dates may be taken, resulting in a time.

   * Two times may be added to, or subtracted from, each other,
     resulting in a time.

   (Adding two dates does not produce a useful result.)

   Since times and dates are merely numbers, the ordinary addition and
subtraction operators are employed for these purposes.

   *Please note:* Many dates and times have extremely large values (just
look at the values above).  Thus, it is not a good idea to take powers
of these values; also, the accuracy of some procedures may be
affected.  If necessary, convert times or dates in seconds to some
other unit, like days or years, before performing analysis.

File: pspp.info, Node: Time Construction, Next: Time Extraction, Prev: Time & Date Concepts, Up: Time & Date

Functions that Produce Times
............................

   These functions take numeric arguments and produce numeric results
in PSPP time format.

 - Function: TIME.DAYS (NDAYS)
     Results in a time value corresponding to NDAYS days.
     (`TIME.DAYS(X)' is equivalent to `X * 60 * 60 * 24'.)

 - Function: TIME.HMS (NHOURS, NMINS, NSECS)
     Results in a time value corresponding to NHOURS hours, NMINS
     minutes, and NSECS seconds.  (`TIME.HMS(H, M, S)' is equivalent to
     `H*60*60 + M*60 + S'.)
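   The example times and dates in the Time & Date Concepts node can be
reproduced with a short Python sketch.  It assumes, as those figures
imply, that the epoch is the midnight that begins 14 Oct 1582 in the
Gregorian calendar; `pspp_date' and `time_hms' are hypothetical helper
names, not PSPP functions.  Python's `datetime.date' uses the proleptic
Gregorian calendar, so day counts before 1583 are well defined.

```python
from datetime import date

SECONDS_PER_DAY = 60 * 60 * 24

# Assumed epoch, implied by the example values: the midnight that begins
# 14 Oct 1582.  Python's date type uses the proleptic Gregorian calendar.
EPOCH_ORDINAL = date(1582, 10, 14).toordinal()


def pspp_date(year: int, month: int, day: int) -> int:
    """Seconds from the epoch to the midnight before the given day."""
    days = date(year, month, day).toordinal() - EPOCH_ORDINAL
    return days * SECONDS_PER_DAY


def time_hms(hours: float, minutes: float, seconds: float) -> float:
    """The arithmetic behind TIME.HMS(H, M, S): H*60*60 + M*60 + S."""
    return hours * 60 * 60 + minutes * 60 + seconds


# date - date -> time
elapsed = pspp_date(1995, 8, 24) - pspp_date(1978, 10, 1)
print(elapsed / SECONDS_PER_DAY)  # 6171.0 days

# date + time -> date (noon on 1 Jan 1900)
noon = pspp_date(1900, 1, 1) + time_hms(12, 0, 0)
print(noon)  # 10010433600
```

   Dividing by the number of seconds in a day, as in the first
`print', is also the kind of unit conversion recommended before
analyzing very large date values.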
File: pspp.info, Node: Time Extraction, Next: Date Construction, Prev: Time Construction, Up: Time & Date

Functions that Examine Times
............................

   These functions take numeric arguments in PSPP time format and give
numeric results.

 - Function: CTIME.DAYS (TIME)
     Results in the number of days and fractional days in TIME.
     (`CTIME.DAYS(X)' is equivalent to `X/60/60/24'.)

 - Function: CTIME.HOURS (TIME)
     Results in the number of hours and fractional hours in TIME.
     (`CTIME.HOURS(X)' is equivalent to `X/60/60'.)

 - Function: CTIME.MINUTES (TIME)
     Results in the number of minutes and fractional minutes in TIME.
     (`CTIME.MINUTES(X)' is equivalent to `X/60'.)

 - Function: CTIME.SECONDS (TIME)
     Results in the number of seconds and fractional seconds in TIME.
     (`CTIME.SECONDS' does nothing; `CTIME.SECONDS(X)' is equivalent
     to `X'.)

File: pspp.info, Node: Date Construction, Next: Date Extraction, Prev: Time Extraction, Up: Time & Date

Functions that Produce Dates
............................

   These functions take numeric arguments and give numeric results in
the PSPP date format.  Arguments taken by these functions are:

DAY
     Refers to a day of the month between 1 and 31.

MONTH
     Refers to a month of the year between 1 and 12.

QUARTER
     Refers to a quarter of the year between 1 and 4.  The quarters of
     the year begin on the first days of months 1, 4, 7, and 10.

WEEK
     Refers to a week of the year between 1 and 53.

YDAY
     Refers to a day of the year between 1 and 366.

YEAR
     Refers to a year between 1582 and 19999.

   If these functions' arguments are out-of-range, they are correctly
normalized before conversion to date format.  Non-integers are rounded
toward zero.

 - Function: DATE.DMY (DAY, MONTH, YEAR)
 - Function: DATE.MDY (MONTH, DAY, YEAR)
     Results in a date value corresponding to the midnight before day
     DAY of month MONTH of year YEAR.

 - Function: DATE.MOYR (MONTH, YEAR)
     Results in a date value corresponding to the midnight before the
     first day of month MONTH of year YEAR.
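   A Python sketch of DATE.DMY, DATE.MDY, and DATE.MOYR, combined with
CTIME.DAYS, follows.  It assumes the same epoch implied by the example
dates (the midnight that begins 14 Oct 1582, Gregorian), uses
hypothetical function names, and handles only in-range arguments: the
normalization PSPP applies to out-of-range values is deliberately not
modeled.

```python
from datetime import date

SECONDS_PER_DAY = 86400
# Assumed epoch: the midnight that begins 14 Oct 1582 (proleptic Gregorian).
EPOCH_ORDINAL = date(1582, 10, 14).toordinal()


def _to_seconds(d: date) -> int:
    return (d.toordinal() - EPOCH_ORDINAL) * SECONDS_PER_DAY


def date_dmy(day: float, month: float, year: float) -> int:
    """Sketch of DATE.DMY for in-range arguments only.

    int() truncates toward zero, matching the stated rounding rule.
    PSPP's normalization of out-of-range values is NOT modeled:
    date() raises ValueError instead.
    """
    return _to_seconds(date(int(year), int(month), int(day)))


def date_mdy(month: float, day: float, year: float) -> int:
    """DATE.MDY differs from DATE.DMY only in argument order."""
    return date_dmy(day, month, year)


def date_moyr(month: float, year: float) -> int:
    """Midnight before the first day of MONTH of YEAR."""
    return date_dmy(1, month, year)


def ctime_days(t: float) -> float:
    """CTIME.DAYS(X) is equivalent to X/60/60/24."""
    return t / 60 / 60 / 24


# Days from the start of August 1995 to 24 Aug 1995.
print(ctime_days(date_dmy(24, 8, 1995) - date_moyr(8, 1995)))  # 23.0
```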
 - Function: DATE.QYR (QUARTER, YEAR)
     Results in a date value corresponding to the midnight before the
     first day of quarter QUARTER of year YEAR.

 - Function: DATE.WKYR (WEEK, YEAR)
     Results in a date value corresponding to the midnight before the
     first day of week WEEK of year YEAR.

 - Function: DATE.YRDAY (YEAR, YDAY)
     Results in a date value corresponding to the midnight before day
     YDAY of year YEAR.
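   DATE.QYR and DATE.YRDAY can be sketched the same way.  This Python
model again assumes the epoch implied by the example dates (the
midnight that begins 14 Oct 1582, Gregorian), uses hypothetical names,
and covers in-range arguments only.  DATE.WKYR is left out because
this section does not say how weeks of the year are numbered.

```python
from datetime import date, timedelta

SECONDS_PER_DAY = 86400
# Assumed epoch: the midnight that begins 14 Oct 1582 (proleptic Gregorian).
EPOCH = date(1582, 10, 14)


def _to_seconds(d: date) -> int:
    return (d - EPOCH).days * SECONDS_PER_DAY


def date_qyr(quarter: int, year: int) -> int:
    """Quarters begin on the first days of months 1, 4, 7, and 10."""
    return _to_seconds(date(year, 3 * (quarter - 1) + 1, 1))


def date_yrday(year: int, yday: int) -> int:
    """Day 1 of the year is 1 January."""
    return _to_seconds(date(year, 1, 1) + timedelta(days=yday - 1))


print(date_qyr(4, 1978) == date_yrday(1978, 274))  # True: both are 1 Oct 1978
```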