pintos-os.org Git - pspp/blob - doc/transformation.texi

   1 @node Data Manipulation
   2 @chapter Data transformations
   3 @cindex transformations
   4
   5 The @pspp{} procedures examined in this chapter manipulate data and
   6 prepare the active dataset for later analyses.  They do not produce output,
   7 as a rule.
   8
   9 @menu
  10 * AGGREGATE::                   Summarize multiple cases into a single case.
  11 * AUTORECODE::                  Automatic recoding of variables.
  12 * COMPUTE::                     Assigning a variable a calculated value.
  13 * COUNT::                       Counting variables with particular values.
  14 * FLIP::                        Exchange variables with cases.
  15 * IF::                          Conditionally assigning a calculated value.
  16 * RECODE::                      Mapping values from one set to another.
  17 * SORT CASES::                  Sort the active dataset.
  18 @end menu
  19
  20 @node AGGREGATE
  21 @section AGGREGATE
  22 @vindex AGGREGATE
  23
  24 @display
  25 AGGREGATE
  26         OUTFILE=@{*,'@var{file_name}',@var{file_handle}@} [MODE=@{REPLACE, ADDVARIABLES@}]
  27         /PRESORTED
  28         /DOCUMENT
  29         /MISSING=COLUMNWISE
  30         /BREAK=@var{var_list}
  31         /@var{dest_var}['@var{label}']@dots{}=@var{agr_func}(@var{src_vars}, @var{args}@dots{})@dots{}
  32 @end display
  33
  34 @cmd{AGGREGATE} summarizes groups of cases into single cases.
  35 Cases are divided into groups that have the same values for one or more
  36 variables called @dfn{break variables}.  Several functions are available
  37 for summarizing case contents.
  38
  39 The @subcmd{OUTFILE} subcommand is required and must appear first.  Specify a
  40 system file or portable file by file name or file
  41 handle (@pxref{File Handles}), or a dataset by its name
  42 (@pxref{Datasets}).
  43 The aggregated cases are written to this file.  If @samp{*} is
  44 specified, then the aggregated cases replace the active dataset's data.
  45 Use of OUTFILE to write a portable file is a @pspp{} extension.
  46
  47 If @subcmd{OUTFILE}=@samp{*} is given, then the @subcmd{MODE} subcommand may also be
  48 specified.
  49 The @subcmd{MODE} subcommand has two possible values: @subcmd{ADDVARIABLES} or @subcmd{REPLACE}.
  50 In @subcmd{REPLACE} mode, the entire active dataset is replaced by a new dataset
  51 which contains just the break variables and the destination variables.
  52 In this mode, the new file will contain as many cases as there are
  53 unique combinations of the break variables.
  54 In @subcmd{ADDVARIABLES} mode, the destination variables will be appended to
  55 the existing active dataset.
  56 Cases that have identical combinations of values in their break
  57 variables receive identical values for the destination variables.
  58 The number of cases in the active dataset will remain unchanged.
  59 Note that if @subcmd{ADDVARIABLES} is specified, then the data @emph{must} be
  60 sorted on the break variables.
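
As a minimal sketch of @subcmd{ADDVARIABLES} mode, the commands below assume
a hypothetical active dataset with a string variable @code{region} and a
numeric variable @code{sales}; these names are illustrative only.  The
explicit @cmd{SORT CASES} satisfies the sorting requirement, and each case
keeps its own values while gaining the mean for its region:

@example
* Hypothetical variables: region (string), sales (numeric).
SORT CASES BY region.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
        /BREAK=region
        /region_mean = MEAN(sales).
@end example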
  61
  62 By default, the active dataset will be sorted based on the break variables
  63 before aggregation takes place.  If the active dataset is already sorted
  64 or otherwise grouped in terms of the break variables, specify
  65 @subcmd{PRESORTED} to save time.
  66 @subcmd{PRESORTED} is assumed if @subcmd{MODE=ADDVARIABLES} is used.
  67
  68 Specify @subcmd{DOCUMENT} to copy the documents from the active dataset into the
  69 aggregate file (@pxref{DOCUMENT}).  Otherwise, the aggregate file will
  70 not contain any documents, even if the aggregate file replaces the
  71 active dataset.
  72
  73 Normally, only a single case (for @subcmd{SD} and @subcmd{SD}., two cases) need be
  74 non-missing in each group for the aggregate variable to be
  75 non-missing.  Specifying @subcmd{/MISSING=COLUMNWISE} inverts this behavior, so
  76 that the aggregate variable becomes missing if any aggregated value is
  77 missing.
  78
  79 If @subcmd{PRESORTED}, @subcmd{DOCUMENT}, or @subcmd{MISSING} are specified, they must appear
  80 between @subcmd{OUTFILE} and @subcmd{BREAK}.
  81
  82 At least one break variable must be specified on @subcmd{BREAK}, a
  83 required subcommand.  The values of these variables are used to divide
  84 the active dataset into groups to be summarized.  In addition, at least
  85 one @var{dest_var} must be specified.
  86
  87 One or more sets of aggregation variables must be specified.  Each set
  88 comprises a list of aggregation variables, an equals sign (@samp{=}),
  89 the name of an aggregation function (see the list below), and a list
  90 of source variables in parentheses.  Some aggregation functions expect
  91 additional arguments following the source variable names.
  92
  93 Aggregation variables typically are created with no variable label,
  94 value labels, or missing values.  Their default print and write
  95 formats depend on the aggregation function used, with details given in
  96 the table below.  A variable label for an aggregation variable may be
  97 specified just after the variable's name in the aggregation variable
  98 list.
  99
 100 Each set must have exactly as many source variables as aggregation
 101 variables.  Each aggregation variable receives the results of applying
 102 the specified aggregation function to the corresponding source
 103 variable.  The MEAN, MEDIAN, SD, and SUM aggregation functions may only be
 104 applied to numeric variables.  All the rest may be applied to numeric
 105 and string variables.
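
The following sketch illustrates the layout of aggregation sets.  The data
and the variable names (@code{region}, @code{sales}, @code{cost}, and the
destination variables) are invented for illustration.  The second set pairs
two destination variables, each with a label, with two source variables:

@example
* Hypothetical data: one string break variable, two numeric variables.
DATA LIST LIST /region (A8) sales cost.
BEGIN DATA.
north 10 4
north 20 6
south 5 1
END DATA.

AGGREGATE OUTFILE=*
        /BREAK=region
        /n_cases = N
        /total_sales 'Total sales' total_cost 'Total cost' = SUM(sales cost).
LIST.
@end example

With this input the active dataset should end up with one case per region:
@code{north} with 2 cases and sums of 30 and 10, and @code{south} with 1
case and sums of 5 and 1.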
 106
 107 The available aggregation functions are as follows:
 108
 109 @table @asis
 110 @item FGT(@var{var_name}, @var{value})
 111 Fraction of values greater than the specified constant.  The default
 112 format is F5.3.
 113
 114 @item FIN(@var{var_name}, @var{low}, @var{high})
 115 Fraction of values within the specified inclusive range of constants.
 116 The default format is F5.3.
 117
 118 @item FLT(@var{var_name}, @var{value})
 119 Fraction of values less than the specified constant.  The default
 120 format is F5.3.
 121
 122 @item FIRST(@var{var_name})
 123 First non-missing value in break group.  The aggregation variable
 124 receives the complete dictionary information from the source variable.
 125 The sort performed by @cmd{AGGREGATE} (and by @cmd{SORT CASES}) is stable, so that
 126 the first case with particular values for the break variables before
 127 sorting will also be the first case in that break group after sorting.
 128
 129 @item FOUT(@var{var_name}, @var{low}, @var{high})
 130 Fraction of values strictly outside the specified range of constants.
 131 The default format is F5.3.
 132
 133 @item LAST(@var{var_name})
 134 Last non-missing value in break group.  The aggregation variable
 135 receives the complete dictionary information from the source variable.
 136 The sort performed by @cmd{AGGREGATE} (and by @cmd{SORT CASES}) is stable, so that
 137 the last case with particular values for the break variables before
 138 sorting will also be the last case in that break group after sorting.
 139
 140 @item MAX(@var{var_name})
 141 Maximum value.  The aggregation variable receives the complete
 142 dictionary information from the source variable.
 143
 144 @item MEAN(@var{var_name})
 145 Arithmetic mean.  Limited to numeric values.  The default format is
 146 F8.2.
 147
 148 @item MEDIAN(@var{var_name})
 149 The median value.  Limited to numeric values.  The default format is F8.2.
 150
 151 @item MIN(@var{var_name})
 152 Minimum value.  The aggregation variable receives the complete
 153 dictionary information from the source variable.
 154
 155 @item N(@var{var_name})
 156 Number of non-missing values.  The default format is F7.0 if weighting
 157 is not enabled, F8.2 if it is (@pxref{WEIGHT}).
 158
 159 @item N
 160 Number of cases aggregated to form this group.  The default format is
 161 F7.0 if weighting is not enabled, F8.2 if it is (@pxref{WEIGHT}).
 162
 163 @item NMISS(@var{var_name})
 164 Number of missing values.  The default format is F7.0 if weighting is
 165 not enabled, F8.2 if it is (@pxref{WEIGHT}).
 166
 167 @item NU(@var{var_name})
 168 Number of non-missing values.  Each case is considered to have a weight
 169 of 1, regardless of the current weighting variable (@pxref{WEIGHT}).
 170 The default format is F7.0.
 171
 172 @item NU
 173 Number of cases aggregated to form this group.  Each case is considered
 174 to have a weight of 1, regardless of the current weighting variable.
 175 The default format is F7.0.
 176
 177 @item NUMISS(@var{var_name})
 178 Number of missing values.  Each case is considered to have a weight of
 179 1, regardless of the current weighting variable.  The default format is F7.0.
 180
 181 @item PGT(@var{var_name}, @var{value})
 182 Percentage between 0 and 100 of values greater than the specified
 183 constant.  The default format is F5.1.
 184
 185 @item PIN(@var{var_name}, @var{low}, @var{high})
 186 Percentage of values within the specified inclusive range of
 187 constants.  The default format is F5.1.
 188
 189 @item PLT(@var{var_name}, @var{value})
 190 Percentage of values less than the specified constant.  The default
 191 format is F5.1.
 192
 193 @item POUT(@var{var_name}, @var{low}, @var{high})
 194 Percentage of values strictly outside the specified range of
 195 constants.  The default format is F5.1.
 196
 197 @item SD(@var{var_name})
 198 Standard deviation of the mean.  Limited to numeric values.  The
 199 default format is F8.2.
 200
 201 @item SUM(@var{var_name})
 202 Sum.  Limited to numeric values.  The default format is F8.2.
 203 @end table
 204
 205 Aggregation functions compare string values in terms of internal
 206 character codes.  On most modern computers, this is a form of ASCII.
 207
 208 The aggregation functions listed above exclude all user-missing values
 209 from calculations.  To include user-missing values, insert a period
 210 (@samp{.}) at the end of the function name (e.g.@: @samp{SUM.}).
 211 (Be aware that specifying such a function as the last token on a line
 212 will cause the period to be interpreted as the end of the command.)
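
A short sketch of the difference, assuming a hypothetical numeric variable
@code{sales} in which 999 has been declared user-missing, and a break
variable @code{region}.  The function name with the period is kept in the
middle of a line so that the period is not taken as the command terminator:

@example
* Hypothetical variables: region and sales, with 999 as user-missing.
MISSING VALUES sales (999).
AGGREGATE OUTFILE=*
        /BREAK=region
        /sum_all = SUM.(sales)
        /sum_valid = SUM(sales).
@end example

Under these assumptions @code{sum_all} includes the 999 values in each
total, whereas @code{sum_valid} leaves them out.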
 213
 214 @cmd{AGGREGATE} both ignores and cancels the current @cmd{SPLIT FILE}
 215 settings (@pxref{SPLIT FILE}).
 216
 217 @node AUTORECODE
 218 @section AUTORECODE
 219 @vindex AUTORECODE
 220
 221 @display
 222 AUTORECODE VARIABLES=@var{src_vars} INTO @var{dest_vars}
 223         [ /DESCENDING ]
 224         [ /PRINT ]
 225         [ /GROUP ]
 226         [ /BLANK = @{VALID, MISSING@} ]
 227 @end display
 228
 229 The @cmd{AUTORECODE} procedure considers the @var{n} values that a variable
 230 takes on and maps them onto values 1@dots{}@var{n} on a new numeric
 231 variable.
 232
 233 Subcommand @subcmd{VARIABLES} is the only required subcommand and must come
 234 first.  Specify @subcmd{VARIABLES}, an equals sign (@samp{=}), a list of source
 235 variables, @subcmd{INTO}, and a list of target variables.  There must be the same
 236 number of source and target variables.  The target variables must not
 237 already exist.
 238
 239 By default, increasing values of a source variable (for a string, this
 240 is based on character code comparisons) are recoded to increasing values
 241 of its target variable.  To cause increasing values of a source variable
 242 to be recoded to decreasing values of its target variable (@var{n} down
 243 to 1), specify @subcmd{DESCENDING}.
 244
 245 @subcmd{PRINT} is currently ignored.
 246
 247 The @subcmd{GROUP} subcommand is relevant only if more than one variable is to be
 248 recoded.   It causes a single mapping between source and target values to
 249 be used, instead of one map per variable.
 250
 251 If @subcmd{/BLANK=MISSING} is given, then string values that contain only
 252 whitespace are recoded as SYSMIS.  If @subcmd{/BLANK=VALID} is given, then they
 253 are allocated a value like any other.  @subcmd{/BLANK} is not relevant
 254 to numeric values.  @subcmd{/BLANK=VALID} is the default.
 255
 256 @cmd{AUTORECODE} is a procedure.  It causes the data to be read.
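
As an illustration, the sketch below recodes a hypothetical string variable
@code{city} into a new numeric variable @code{city_code}; the data and both
names are made up for the example:

@example
* Hypothetical data: a single string variable named city.
DATA LIST LIST /city (A10).
BEGIN DATA.
Paris
Berlin
Paris
Rome
END DATA.

AUTORECODE VARIABLES=city INTO city_code.
LIST.
@end example

Because the default order is ascending by character code, @samp{Berlin}
should map to 1, @samp{Paris} to 2, and @samp{Rome} to 3.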
 257
 258 @node COMPUTE
 259 @section COMPUTE
 260 @vindex COMPUTE
 261
 262 @display
 263 COMPUTE @var{variable} = @var{expression}.
 264 @end display
 265   or
 266 @display
 267 COMPUTE vector(@var{index}) = @var{expression}.
 268 @end display
 269
 270 @cmd{COMPUTE} assigns the value of an expression to a target
 271 variable.  For each case, the expression is evaluated and its value
 272 assigned to the target variable.  Numeric and string
 273 variables may be assigned.  When a string expression's width differs
 274 from the target variable's width, the string result of the expression
 275 is truncated or padded with spaces on the right as necessary.  The
 276 expression and variable types must match.
 277
 278 For numeric variables only, the target variable need not already
 279 exist.  Numeric variables created by @cmd{COMPUTE} are assigned an
 280 @code{F8.2} output format.  String variables must be declared before
 281 they can be used as targets for @cmd{COMPUTE}.
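
A brief sketch of both cases, assuming hypothetical numeric variables
@code{height} (in centimeters) and @code{weight} (in kilograms) already
exist in the active dataset.  The numeric target is created on the fly,
while the string target is declared first with @cmd{STRING}:

@example
* Hypothetical variables: height in centimeters, weight in kilograms.
COMPUTE bmi = weight / (height / 100) ** 2.
STRING category (A12).
COMPUTE category = 'unclassified'.
@end example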
 282
 283 The target variable may be specified as an element of a vector
 284 (@pxref{VECTOR}).  In this case, an expression @var{index} must be
 285 specified in parentheses following the vector name.  The expression @var{index}
 286 must evaluate to a numeric value that, after rounding down
 287 to the nearest integer, is a valid index for the named vector.
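
The sketch below assigns to a vector element chosen by a variable.  The
vector name @code{score} and the index variable @code{idx} are
hypothetical; @cmd{VECTOR} is assumed here to create the numeric variables
@code{score1} through @code{score3}:

@example
* Hypothetical data: idx selects which element of score is set.
DATA LIST LIST /idx.
BEGIN DATA.
1
3
END DATA.

VECTOR score(3).
COMPUTE score(idx) = 100.
LIST.
@end example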
 288
 289 Using @cmd{COMPUTE} to assign to a variable specified on @cmd{LEAVE}
 290 (@pxref{LEAVE}) resets the variable's left state.  Therefore,
 291 @code{LEAVE} should be specified following @cmd{COMPUTE}, not before.
 292
 293 @cmd{COMPUTE} is a transformation.  It does not cause the active dataset to be
 294 read.
 295
 296 When @cmd{COMPUTE} is specified following @cmd{TEMPORARY}
 297 (@pxref{TEMPORARY}), the @cmd{LAG} function may not be used
 298 (@pxref{LAG}).
 299
 300 @node COUNT
 301 @section COUNT
 302 @vindex COUNT
 303
 304 @display
 305 COUNT @var{var_name} = @var{var}@dots{} (@var{value}@dots{}).
 306
 307 Each @var{value} takes one of the following forms:
 308         @var{number}
 309         @var{string}
 310         @var{num1} THRU @var{num2}
 311         MISSING
 312         SYSMIS
 313 In addition, @var{num1} and @var{num2} can be LO or LOWEST, or HI or HIGHEST,
 314 respectively.
 315 @end display
 316
 317 @cmd{COUNT} creates or replaces a numeric @dfn{target} variable that
 318 counts the occurrence of a @dfn{criterion} value or set of values over
 319 one or more @dfn{test} variables for each case.
 320
 321 The target variable values are always nonnegative integers.  They are
 322 never missing.  The target variable is assigned an F8.2 output format.
 323 @xref{Input and Output Formats}.  Any variables, including
 324 string variables, may be test variables.
 325
 326 User-missing values of test variables are treated just like any other
 327 values.  They are @strong{not} treated as system-missing values.
 328 User-missing values that are criterion values or inside ranges of
 329 criterion values are counted as any other values.  However (for numeric
 330 variables), keyword MISSING may be used to refer to all system-
 331 and user-missing values.
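
To make the distinction concrete, the following sketch assumes three
hypothetical numeric variables @code{q1}, @code{q2}, and @code{q3} that use
9 as a user-missing code:

@example
* Hypothetical variables q1, q2, q3 with 9 as a user-missing code.
MISSING VALUES q1 q2 q3 (9).
COUNT n_nines = q1 q2 q3 (9)
     /n_missing = q1 q2 q3 (MISSING).
@end example

Here @code{n_nines} counts the 9s even though they are user-missing, and
@code{n_missing} counts both the 9s and any system-missing values.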
 332
 333 @cmd{COUNT} target variables are assigned values in the order
 334 specified.  In the command @code{COUNT @var{A}=@var{A} @var{B}(1) /@var{B}=@var{A} @var{B}(2).}, the
 335 following actions occur:
 336
 337 @itemize @minus
 338 @item
 339 The number of occurrences of 1 between @var{A} and @var{B} is counted.
 340
 341 @item
 342 @var{A} is assigned this value.
 343
 344 @item
 345 The number of occurrences of 2 between @var{B} and the @strong{new}
 346 value of @var{A} is counted.
 347
 348 @item
 349 @var{B} is assigned this value.
 350 @end itemize
 351
 352 Despite this ordering, all @cmd{COUNT} test variables must exist
 353 before the command is executed; they may not be created as target
 354 variables earlier in the same command.  Break such a command into two
 355 separate commands.
 356
 357 The examples below may help to clarify.
 358
 359 @enumerate A
 360 @item
 361 Assuming @code{Q0}, @code{Q1}, @dots{}, @code{Q9} are numeric variables,
 362 the following commands:
 363
 364 @enumerate
 365 @item
 366 Count the number of times the value 1 occurs through these variables
 367 for each case and assign the count to variable @code{QCOUNT}.
 368
 369 @item
 370 Print out the total number of times the value 1 occurs throughout
 371 @emph{all} cases using @cmd{DESCRIPTIVES}.  @xref{DESCRIPTIVES}, for
 372 details.
 373 @end enumerate
 374
 375 @example
 376 COUNT QCOUNT=Q0 TO Q9(1).
 377 DESCRIPTIVES QCOUNT /STATISTICS=SUM.
 378 @end example
 379
 380 @item
 381 Given these same variables, the following commands:
 382
 383 @enumerate
 384 @item
 385 Count the number of valid values of these variables for each case and
 386 assign the count to variable @code{QVALID}.
 387
 388 @item
 389 Multiply each value of @code{QVALID} by 10 to obtain a percentage of
 390 valid values, using @cmd{COMPUTE}.  @xref{COMPUTE}, for details.
 391
 392 @item
 393 Print out the percentage of valid values across all cases, using
 394 @cmd{DESCRIPTIVES}.  @xref{DESCRIPTIVES}, for details.
 395 @end enumerate
 396
 397 @example
 398 COUNT QVALID=Q0 TO Q9 (LO THRU HI).
 399 COMPUTE QVALID=QVALID*10.
 400 DESCRIPTIVES QVALID /STATISTICS=MEAN.
 401 @end example
 402 @end enumerate
 403
 404 @node FLIP
 405 @section FLIP
 406 @vindex FLIP
 407
 408 @display
 409 FLIP /VARIABLES=@var{var_list} /NEWNAMES=@var{var_name}.
 410 @end display
 411
 412 @cmd{FLIP} transposes rows and columns in the active dataset.  It
 413 causes cases to be swapped with variables, and vice versa.
 414
 415 All variables in the transposed active dataset are numeric.  String
 416 variables take on the system-missing value in the transposed file.
 417
 418 No subcommands are required.  If specified, the @subcmd{VARIABLES} subcommand
 419 selects variables to be transformed into cases, and variables not
 420 specified are discarded.  If the @subcmd{VARIABLES} subcommand is omitted, all
 421 variables are selected for transposition.
 422
 423 The variable specified by @subcmd{NEWNAMES}, which must be a
 424 string variable, is
 425 used to give names to the variables created by @cmd{FLIP}.  Only the
 426 first 8 characters of the variable are used.  If
 427 @subcmd{NEWNAMES} is not
 428 specified then the default is a variable named CASE_LBL, if it exists.
 429 If it does not then the variables created by @cmd{FLIP} are named VAR000
 430 through VAR999, then VAR1000, VAR1001, and so on.
 431
 432 When a @subcmd{NEWNAMES} variable is available, the names must be canonicalized
 433 before becoming variable names.  Invalid characters are replaced by
 434 letter @samp{V} in the first position, or by @samp{_} in subsequent
 435 positions.  If the name thus generated is not unique, then numeric
 436 extensions are added, starting with 1, until a unique name is found or
 437 there are no remaining possibilities.  If the latter occurs then the
 438 @cmd{FLIP} operation aborts.
 439
 440 The resultant dictionary contains a CASE_LBL variable, a string
 441 variable of width 8, which stores the names of the variables in the
 442 dictionary before the transposition.  Variable names longer than 8
 443 characters are truncated.  If the active dataset is subsequently
 444 transposed using @cmd{FLIP}, this variable can be used to recreate the
 445 original variable names.
 446
 447 @cmd{FLIP} honors @cmd{N OF CASES} (@pxref{N OF CASES}).  It ignores
 448 @cmd{TEMPORARY} (@pxref{TEMPORARY}), so that ``temporary''
 449 transformations become permanent.
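
A small sketch of a transposition, using invented data in which the string
variable @code{name} supplies the new variable names:

@example
* Hypothetical data: name labels each case, q1 to q3 are numeric.
DATA LIST LIST /name (A8) q1 q2 q3.
BEGIN DATA.
alice 1 2 3
bob 4 5 6
END DATA.

FLIP /VARIABLES=q1 q2 q3 /NEWNAMES=name.
LIST.
@end example

The result should contain three cases, one for each of @code{q1},
@code{q2}, and @code{q3}, with a @code{CASE_LBL} variable holding those
names and one numeric variable for each original case, named after the
values of @code{name}.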
 450
 451 @node IF
 452 @section IF
 453 @vindex IF
 454
 455 @display
 456 IF @var{condition} @var{variable}=@var{expression}.
 457 @end display
 458   or
 459 @display
 460 IF @var{condition} vector(@var{index})=@var{expression}.
 461 @end display
 462
 463 The @cmd{IF} transformation conditionally assigns the value of a target
 464 expression to a target variable, based on the truth of a test
 465 expression.
 466
 467 Specify a boolean-valued expression (@pxref{Expressions}) to be tested
 468 following the @cmd{IF} keyword.  This expression is evaluated for each case.
 469 If the value is true, then the value of @var{expression} is computed and
 470 assigned to the specified variable.  If the value is false or missing,
 471 nothing is done.  Numeric and string variables may be
 472 assigned.  When a string expression's width differs from the target
 473 variable's width, the string result of the expression is truncated or
 474 padded with spaces on the right as necessary.  The expression and
 475 variable types must match.
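
For instance, assuming a hypothetical numeric variable @code{age}, the
sketch below sets a flag variable @code{adult}.  Cases in which @code{age}
is missing satisfy neither test, so @code{adult} is left system-missing
for them:

@example
* Hypothetical numeric variable age; adult is created by the first IF.
IF (age >= 18) adult = 1.
IF (age < 18) adult = 0.
@end example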
 476
 477 The target variable may be specified as an element of a vector
 478 (@pxref{VECTOR}).  In this case, a vector index expression must be
 479 specified in parentheses following the vector name.  The index
 480 expression must evaluate to a numeric value that, after rounding down
 481 to the nearest integer, is a valid index for the named vector.
 482
 483 Using @cmd{IF} to assign to a variable specified on @cmd{LEAVE}
 484 (@pxref{LEAVE}) resets the variable's left state.  Therefore,
 485 @code{LEAVE} should be specified following @cmd{IF}, not before.
 486
 487 When @cmd{IF} is specified following @cmd{TEMPORARY}
 488 (@pxref{TEMPORARY}), the @cmd{LAG} function may not be used
 489 (@pxref{LAG}).
 490
 491 @node RECODE
 492 @section RECODE
 493 @vindex RECODE
 494
 495 The @cmd{RECODE} command is used to transform existing values into other,
 496 user specified values.
 497 The general form is:
 498
 499 @display
 500 RECODE @var{src_vars}
 501         (@var{src_value} @var{src_value} @dots{} = @var{dest_value})
 502         (@var{src_value} @var{src_value} @dots{} = @var{dest_value})
 503         (@var{src_value} @var{src_value} @dots{} = @var{dest_value}) @dots{}
 504          [INTO @var{dest_vars}].
 505 @end display
 506
 507 Following the RECODE keyword itself comes @var{src_vars} which is a list
 508 of variables whose values are to be transformed.
 509 These variables may be string variables or they may be numeric.
 510 However the list must be homogeneous; you may not mix string variables and
 511 numeric variables in the same recoding.
 512
 513 After the list of source variables, there should be one or more @dfn{mappings}.
 514 Each mapping is enclosed in parentheses, and contains the source values and
 515 a destination value separated by a single @samp{=}.
 516 The source values are used to specify the values in the dataset which
 517 need to change, and the destination value specifies the new value
 518 to which they should be changed.
 519 Each @var{src_value} may take one of the following forms:
 520 @itemize @bullet
 521 @item @var{number}
 522 If the source variables are numeric then @var{src_value} may be a literal
 523 number.
 524 @item @var{string}
 525 If the source variables are string variables then @var{src_value} may be a
 526 literal string (like all strings, enclosed in single or double quotes).
 527 @item @var{num1} THRU @var{num2}
 528 This form is valid only when the source variables are numeric.
 529 It specifies all values in the range [@var{num1}, @var{num2}].
 530 Normally you would ensure that @var{num2} is greater than or equal to
 531 @var{num1}.
 532 If @var{num1} however is greater than @var{num2}, then the range
 533 [@var{num2},@var{num1}] will be used instead.
 534 Open-ended ranges may be specified using @samp{LO} or @samp{LOWEST}
 535 for @var{num1}
 536 or @samp{HI} or @samp{HIGHEST} for @var{num2}.
 537 @item @samp{MISSING}
 538 The literal keyword @samp{MISSING} matches both system missing and user
 539 missing values.
 540 It is valid for both numeric and string variables.
 541 @item @samp{SYSMIS}
 542 The literal keyword @samp{SYSMIS} matches system missing
 543 values.
 544 It is valid for numeric variables only.
 545 @item @samp{ELSE}
 546 The @samp{ELSE} keyword may be used to match any values which are
 547 not matched by any other @var{src_value} appearing in the command.
 548 If this keyword appears, it should be used in the last mapping of the
 549 command.
 550 @end itemize
 551
 552 After the source values comes an @samp{=} and then the @var{dest_value}.
 553 The @var{dest_value} may take any of the following forms:
 554 @itemize @bullet
 555 @item @var{number}
 556 A literal numeric value to which the source values should be changed.
 557 This implies the destination variable must be numeric.
 558 @item @var{string}
 559 A literal string value (enclosed in quotation marks) to which the source
 560 values should be changed.
 561 This implies the destination variable must be a string variable.
 562 @item @samp{SYSMIS}
 563 The keyword @samp{SYSMIS} changes the value to the system missing value.
 564 This implies the destination variable must be numeric.
 565 @item @samp{COPY}
 566 The special keyword @samp{COPY} means that the source value should not be
 567 modified, but
 568 copied directly to the destination value.
 569 This is meaningful only if @samp{INTO @var{dest_vars}} is specified.
 570 @end itemize
 571
 572 Mappings are considered from left to right.
 573 Therefore, if a value is matched by a @var{src_value} from more than
 574 one mapping, the first (leftmost) mapping which matches will be considered.
 575 Any subsequent matches will be ignored.
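
A one-line sketch of this rule, assuming a hypothetical numeric variable
@code{score} and a new variable @code{band}.  The value 50 falls within
both ranges, so the leftmost mapping applies and it is recoded to 1:

@example
* Hypothetical numeric variable score; 50 matches both ranges.
RECODE score (LO THRU 50 = 1) (50 THRU HI = 2) INTO band.
@end example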
 576
 577 The clause @samp{INTO @var{dest_vars}} is optional.
 578 The behaviour of the command is slightly different depending on whether it
 579 appears or not.
 580
 581 If @samp{INTO @var{dest_vars}} does not appear, then values will be recoded
 582 ``in place''.  This means that the recoded values are written back to the
 583 source variables from which the original values came.
 584 In this case, the @var{dest_value} for every mapping must imply a value which
 585 has the same type as the @var{src_value}.
 586 For example, if the source value is a string value, it is not permissible for
 587 @var{dest_value} to be @samp{SYSMIS} or another form which implies a numeric
 588 result.
 589 In the following example, two numeric variables @var{x} and @var{y} are recoded
 590 in place.
 591 Zero is recoded to 99, the values 1 to 10 inclusive are unchanged,
 592 values 1000 and higher are recoded to the system-missing value and all other
 593 values are changed to 999:
 594 @example
 595 recode @var{x} @var{y}
 596         (0 = 99)
 597         (1 THRU 10 = COPY)
 598         (1000 THRU HIGHEST = SYSMIS)
 599         (ELSE = 999).
 600 @end example
 601
 602 If @samp{INTO @var{dest_vars}} is given, then recoded values are written
 603 into the variables specified in @var{dest_vars}, which must therefore
 604 contain a list of valid variable names.
 605 The number of variables in @var{dest_vars} must be the same as the number
 606 of variables in @var{src_vars}
 607 and the respective order of the variables in @var{dest_vars} corresponds to
 608 the order of @var{src_vars}.
 609 That is to say, recoded values whose
 610 original value came from the @var{n}th variable in @var{src_vars} will be
 611 placed into the @var{n}th variable in @var{dest_vars}.
 612 The source variables will be unchanged.
 613 If any mapping implies a string as its destination value, then the respective
 614 destination variable must already exist, or
 615 have been declared using @cmd{STRING} or another transformation.
 616 Numeric variables however will be automatically created if they don't already
 617 exist.
 618 The following example deals with two source variables, @var{a} and @var{b}
 619 which contain string values.  Hence there are two destination variables
 620 @var{v1} and @var{v2}.
 621 Any cases where @var{a} or @var{b} contain the values @samp{apple},
 622 @samp{pear} or @samp{pomegranate} will result in @var{v1} or @var{v2} being
 623 filled with the string @samp{fruit} whilst cases with
 624 @samp{tomato}, @samp{lettuce} or @samp{carrot} will result in @samp{vegetable}.
 625 Any other values will produce the result @samp{unknown}:
 626 @example
 627 string @var{v1} (a20).
 628 string @var{v2} (a20).
 629
 630 recode @var{a} @var{b}
 631         ("apple" "pear" "pomegranate" = "fruit")
 632         ("tomato" "lettuce" "carrot" = "vegetable")
 633         (ELSE = "unknown")
 634         into @var{v1} @var{v2}.
 635 @end example
 636
 637 There is one very special mapping, not mentioned above.
 638 If the source variable is a string variable
 639 then a mapping may be specified as @samp{(CONVERT)}.
 640 This mapping, if it appears, must be the last mapping given and
 641 the @samp{INTO @var{dest_vars}} clause must also be given and
 642 must not refer to a string variable.
 643 @samp{CONVERT} causes a number specified as a string to
 644 be converted to a numeric value.
 645 For example it will convert the string @samp{"3"} into the numeric
 646 value 3 (note that it will not convert @samp{three} into 3).
 647 If the string cannot be parsed as a number, then the system-missing value
 648 is assigned instead.
 649 In the following example, cases where the value of @var{x} (a string variable)
 650 is the empty string are recoded to 999 and all others are converted to the
 651 numeric equivalent of the input value.  The results are placed into the
 652 numeric variable @var{y}:
 653 @example
 654 recode @var{x}
 655         ("" = 999)
 656         (convert)
 657         into @var{y}.
 658 @end example
 659
 660 It is possible to specify multiple recodings on a single command.
 661 Introduce additional recodings with a slash (@samp{/}) to
 662 separate them from the previous recodings:
 663 @example
 664 recode
 665         @var{a}  (2 = 22) (else = 99)
 666         /@var{b} (1 = 3) into @var{z}
 667         .
 668 @end example
 669 @noindent Here we have two recodings. The first affects the source variable
 670 @var{a} and recodes in-place the value 2 into 22 and all other values to 99.
 671 The second recoding copies the values of @var{b} into the variable @var{z},
 672 changing any instances of 1 into 3.
 673
 674 @node SORT CASES
 675 @section SORT CASES
 676 @vindex SORT CASES
 677
 678 @display
 679 SORT CASES BY @var{var_list}[(@{D|A@})] [ @var{var_list}[(@{D|A@})] ] ...
 680 @end display
 681
 682 @cmd{SORT CASES} sorts the active dataset by the values of one or more
 683 variables.
 684
 685 Specify @subcmd{BY} and a list of variables to sort by.  By default, variables
 686 are sorted in ascending order.  To override sort order, specify @subcmd{(D)} or
 687 @subcmd{(DOWN)} after a list of variables to get descending order, or @subcmd{(A)} or @subcmd{(UP)}
 688 for ascending order.  These apply to all the listed variables
 689 up until the preceding @subcmd{(A)}, @subcmd{(D)}, @subcmd{(UP)} or @subcmd{(DOWN)}.
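
As a sketch, assuming hypothetical variables @code{region} and
@code{sales}, the command below sorts by region in ascending order and,
within each region, by sales in descending order:

@example
* Hypothetical variables: region and sales.
SORT CASES BY region (A) sales (D).
@end example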
 690
 691 The sort algorithms used by @cmd{SORT CASES} are stable.  That is,
 692 records that have equal values of the sort variables will have the
 693 same relative order before and after sorting.  As a special case,
 694 re-sorting an already sorted file will not affect the ordering of
 695 cases.
 696
 697 @cmd{SORT CASES} is a procedure.  It causes the data to be read.
 698
 699 @cmd{SORT CASES} attempts to sort the entire active dataset in main memory.
 700 If workspace is exhausted, it falls back to a merge sort algorithm that
 701 involves creating numerous temporary files.
 702
 703 @cmd{SORT CASES} may not be specified following @cmd{TEMPORARY}.