doc/files.texi

   1 @node System and Portable File IO
   2 @chapter System and Portable File I/O
   3
   4 The commands in this chapter read, write, and examine system files and
   5 portable files.
   6
   7 @menu
   8 * APPLY DICTIONARY::            Apply system file dictionary to active dataset.
   9 * EXPORT::                      Write to a portable file.
  10 * GET::                         Read from a system file.
  11 * GET DATA::                    Read from foreign files.
  12 * IMPORT::                      Read from a portable file.
  13 * SAVE::                        Write to a system file.
  14 * SAVE TRANSLATE::              Write data in foreign file formats.
  15 * SYSFILE INFO::                Display system file dictionary.
  16 * XEXPORT::                     Write to a portable file, as a transformation.
  17 * XSAVE::                       Write to a system file, as a transformation.
  18 @end menu
  19
  20 @node APPLY DICTIONARY
  21 @section APPLY DICTIONARY
  22 @vindex APPLY DICTIONARY
  23
  24 @display
  25 APPLY DICTIONARY FROM=@{'file-name',file_handle@}.
  26 @end display
  27
  28 @cmd{APPLY DICTIONARY} applies the variable labels, value labels,
  29 and missing values taken from a file to corresponding
  30 variables in the active dataset.  In some cases it also updates the
  31 weighting variable.
  32
  33 Specify a system file, portable file, or scratch file with a file name
  34 string or as a file handle (@pxref{File Handles}).  The dictionary in the
  35 file will be read, but it will not replace the active dataset dictionary.
  36 The file's data will not be read.
  37
  38 Only variables with names that exist in both the active dataset and the
  39 system file are considered.  Variables with the same name but different
  40 types (numeric, string) will cause an error message.  Otherwise, the
  41 system file variables' attributes will replace those in their matching
  42 active dataset variables:
  43
  44 @itemize @bullet
  45 @item
  46 If a system file variable has a variable label, then it will replace
  47 the variable label of the active dataset variable.  If the system
  48 file variable does not have a variable label, then the active dataset
  49 variable's variable label, if any, will be retained.
  50
  51 @item
  52 If the system file variable has custom attributes (@pxref{VARIABLE
  53 ATTRIBUTE}), then those attributes replace the active dataset variable's
  54 custom attributes.  If the system file variable does not have custom
  55 attributes, then the active dataset variable's custom attributes, if any,
  56 will be retained.
  57
  58 @item
  59 If the active dataset variable is numeric or short string, then value
  60 labels and missing values, if any, will be copied to the active dataset
  61 variable.  If the system file variable does not have value labels or
  62 missing values, then those in the active dataset variable, if any, will not
  63 be disturbed.
  64 @end itemize
  65
  66 In addition to properties of variables, some properties of the active
  67 dataset dictionary as a whole are updated:
  68
  69 @itemize @bullet
  70 @item
  71 If the system file has custom attributes (@pxref{DATAFILE ATTRIBUTE}),
  72 then those attributes replace the active dataset's custom
  73 attributes.
  74
  75 @item
  76 If the active dataset has a weighting variable (@pxref{WEIGHT}), and the
  77 system file does not, or if the weighting variable in the system file
  78 does not exist in the active dataset, then the active dataset weighting
  79 variable, if any, is retained.  Otherwise, the weighting variable in
  80 the system file becomes the active dataset weighting variable.
  81 @end itemize
  82
  83 @cmd{APPLY DICTIONARY} takes effect immediately.  It does not read the
  84 active dataset.  The system file is not modified.
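
@noindent
As an illustrative sketch, assuming a system file named
@samp{labels.sav} (a hypothetical name) whose variables share names
with variables in the active dataset, the following syntax applies its
variable labels, value labels, and missing values:

@example
APPLY DICTIONARY FROM='labels.sav'.
@end example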
  85
  86 @node EXPORT
  87 @section EXPORT
  88 @vindex EXPORT
  89
  90 @display
  91 EXPORT
  92         /OUTFILE='file-name'
  93         /UNSELECTED=@{RETAIN,DELETE@}
  94         /DIGITS=n
  95         /DROP=var_list
  96         /KEEP=var_list
  97         /RENAME=(src_names=target_names)@dots{}
  98         /TYPE=@{COMM,TAPE@}
  99         /MAP
 100 @end display
 101
 102 The @cmd{EXPORT} procedure writes the active dataset's dictionary and
 103 data to a specified portable file or scratch file.
 104
 105 By default, cases excluded with FILTER are written to the
 106 file.  These can be excluded by specifying DELETE on the UNSELECTED
 107 subcommand.  Specifying RETAIN makes the default explicit.
 108
 109 Portable files express real numbers in base 30.  Integers are always
 110 expressed to the maximum precision needed to make them exact.
 111 Non-integers are, by default, expressed to the machine's maximum
 112 natural precision (approximately 15 decimal digits on many machines).
 113 If many numbers require this many digits, the portable file may
 114 significantly increase in size.  As an alternative, the DIGITS
 115 subcommand may be used to specify the number of decimal digits of
 116 precision to write.  DIGITS applies only to non-integers.
 117
 118 The OUTFILE subcommand, which is the only required subcommand, specifies
 119 the portable file or scratch file to be written as a file name string or
 120 a file handle (@pxref{File Handles}).
 121
 122 DROP, KEEP, and RENAME follow the same format as the SAVE procedure
 123 (@pxref{SAVE}).
 124
 125 The TYPE subcommand specifies the character set for use in the
 126 portable file.  Its value is currently not used.
 127
 128 The MAP subcommand is currently ignored.
 129
 130 @cmd{EXPORT} is a procedure.  It causes the active dataset to be read.
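
@noindent
For illustration only, the following sketch writes the active dataset
to a portable file, omits cases excluded with FILTER, limits
non-integers to 6 decimal digits of precision, and leaves out two
variables.  The file name @samp{survey.por} and the variable names are
hypothetical:

@example
EXPORT /OUTFILE='survey.por'
        /UNSELECTED=DELETE
        /DIGITS=6
        /DROP=temp1 temp2.
@end example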
 131
 132 @node GET
 133 @section GET
 134 @vindex GET
 135
 136 @display
 137 GET
 138         /FILE=@{'file-name',file_handle@}
 139         /DROP=var_list
 140         /KEEP=var_list
 141         /RENAME=(src_names=target_names)@dots{}
 142 @end display
 143
 144 @cmd{GET} clears the current dictionary and active dataset and
 145 replaces them with the dictionary and data from a specified file.
 146
 147 The FILE subcommand is the only required subcommand.  Specify the system
 148 file, portable file, or scratch file to be read as a string file name or
 149 a file handle (@pxref{File Handles}).
 150
 151 By default, all the variables in a file are read.  The DROP
 152 subcommand can be used to specify a list of variables that are not to be
 153 read.  By contrast, the KEEP subcommand can be used to specify variables
 154 that are to be read, with all other variables not read.
 155
 156 Normally variables in a file retain the names that they were
 157 saved under.  Use the RENAME subcommand to change these names.  Specify,
 158 within parentheses, a list of variable names followed by an equals sign
 159 (@samp{=}) and the names that they should be renamed to.  Multiple
 160 parenthesized groups of variable names can be included on a single
 161 RENAME subcommand.  Variables' names may be swapped using a RENAME
 162 subcommand of the form @samp{/RENAME=(A B=B A)}.
 163
 164 Alternate syntax for the RENAME subcommand allows the parentheses to be
 165 eliminated.  When this is done, only a single variable may be renamed at
 166 once, for instance, @samp{/RENAME=A=B}.  This alternate syntax is
 167 deprecated.
 168
 169 DROP, KEEP, and RENAME are executed in left-to-right order.
 170 Each may be present any number of times.  @cmd{GET} never modifies a
 171 file on disk.  Only the active dataset read from the file
 172 is affected by these subcommands.
 173
 174 @cmd{GET} does not cause the data to be read, only the dictionary.  The data
 175 is read later, when a procedure is executed.
 176
 177 Use of @cmd{GET} to read a portable file or scratch file is a PSPP
 178 extension.
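
@noindent
As a sketch of how KEEP and RENAME combine, the following syntax reads
three variables from a system file and renames one of them.  The file
name @samp{survey.sav} and the variable names are hypothetical:

@example
GET /FILE='survey.sav'
        /KEEP=id age income
        /RENAME=(income=earnings).
@end example

@noindent
Because the subcommands are executed in left-to-right order, RENAME
here refers to @samp{income}, the name still in effect after KEEP.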
 179
 180 @node GET DATA
 181 @section GET DATA
 182 @vindex GET DATA
 183
 184 @display
 185 GET DATA
 186         /TYPE=@{GNM,PSQL,TXT@}
 187         @dots{}additional subcommands depending on TYPE@dots{}
 188 @end display
 189
 190 The @cmd{GET DATA} command is used to read files and other data
 191 sources created by other applications.  When this command is executed,
 192 the current dictionary and active dataset are replaced with variables
 193 and data read from the specified source.
 194
 195 The TYPE subcommand is mandatory and must be the first subcommand
 196 specified.  It determines the type of the file or source to read.
 197 PSPP currently supports the following file types:
 198
 199 @table @asis
 200 @item GNM
 201 Spreadsheet files created by Gnumeric (@url{http://gnumeric.org}).
 202
 203 @item PSQL
 204 Relations from PostgreSQL databases (@url{http://postgresql.org}).
 205
 206 @item TXT
 207 Textual data files in columnar and delimited formats.
 208 @end table
 209
 210 Each supported file type has additional subcommands, explained in
 211 separate sections below.
 212
 213 @menu
 214 * GET DATA /TYPE=GNM::
 215 * GET DATA /TYPE=PSQL::
 216 * GET DATA /TYPE=TXT::
 217 @end menu
 218
 219 @node GET DATA /TYPE=GNM
 220 @subsection Gnumeric Spreadsheet Files
 221
 222 @display
 223 GET DATA /TYPE=GNM
 224         /FILE=@{'file-name'@}
 225         /SHEET=@{NAME 'sheet-name', INDEX n@}
 226         /CELLRANGE=@{RANGE 'range', FULL@}
 227         /READNAMES=@{ON, OFF@}
 228         /ASSUMEDVARWIDTH=n.
 229 @end display
 230
 231 @cindex Gnumeric
 232 @cindex spreadsheet files
 233 To use GET DATA to read a spreadsheet file created by Gnumeric
 234 (@url{http://gnumeric.org}), specify TYPE=GNM to indicate the file's
 235 format and use FILE to indicate the Gnumeric file to be read.  All
 236 other subcommands are optional.
 237
 238 The format of each variable is determined by the format of the spreadsheet
 239 cell containing the first datum for the variable.
 240 If this cell is of string (text) format, then the width of the variable is
 241 determined from the length of the string it contains, unless the
 242 ASSUMEDVARWIDTH subcommand is given.
 243
 244
 245 The FILE subcommand is mandatory. Specify the name of the file
 246 to be read.
 247
 248 The SHEET subcommand specifies the sheet within the spreadsheet file to read.
 249 There are two forms of the SHEET subcommand.
 250 In the first form,
 251 @samp{/SHEET=name @var{sheet-name}}, the string @var{sheet-name} is the
 252 name of the sheet to read.
 253 In the second form, @samp{/SHEET=index @var{idx}}, @var{idx} is an
 254 integer which is the index of the sheet to read.
 255 The first sheet has the index 1.
 256 If the SHEET subcommand is omitted, then the command will read the
 257 first sheet in the file.
 258
 259 The CELLRANGE subcommand specifies the range of cells within the sheet to read.
 260 If the subcommand is given as @samp{/CELLRANGE=FULL}, then the entire
 261 sheet  is read.
 262 To read only part of a sheet, use the form
 263 @samp{/CELLRANGE=range '@var{top-left-cell}:@var{bottom-right-cell}'}.
 264 For example, the subcommand @samp{/CELLRANGE=range 'C3:P19'} reads
 265 columns C--P, and rows 3--19 inclusive.
 266 If no CELLRANGE subcommand is given, then the entire sheet is read.
 267
 268 If @samp{/READNAMES=ON} is specified, then the contents of cells of
 269 the first row are used as the names of the variables in which to store
 270 the data from subsequent rows.
 271 If the READNAMES subcommand is omitted, or if @samp{/READNAMES=OFF} is
 272 used, then the variables receive automatically assigned names.
 273
 274 The ASSUMEDVARWIDTH subcommand specifies the maximum width of string
 275 variables read  from the file.
 276 If omitted, the default value is determined from the length of the
 277 string in the first spreadsheet cell for each variable.
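
@noindent
As an illustrative sketch, the following syntax reads cells A1 through
D20 of a sheet named @samp{Q1} and uses the first row for variable
names.  The file name, sheet name, and cell range are hypothetical:

@example
GET DATA /TYPE=GNM /FILE='budget.gnumeric'
        /SHEET=NAME 'Q1'
        /CELLRANGE=RANGE 'A1:D20'
        /READNAMES=ON.
@end example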
 278
 279
 280 @node GET DATA /TYPE=PSQL
 281 @subsection Postgres Database Queries
 282
 283 @display
 284 GET DATA /TYPE=PSQL
 285          /CONNECT=@{connection info@}
 286          /SQL=@{query@}
 287          [/ASSUMEDVARWIDTH=n]
 288          [/UNENCRYPTED]
 289          [/BSIZE=n].
 290 @end display
 291
 292 @cindex postgres
 293 @cindex databases
 294
 295 The PSQL type is used to import data from a postgres database server.
 296 The server may be located locally or remotely.
 297 Variables are automatically created based on the table column names
 298 or the names specified in the SQL query.
 299 Postgres data types of high precision will lose precision when
 300 imported into PSPP.
 301 Not all Postgres data types can be represented in PSPP.
 302 If a datum cannot be represented, a warning is issued and that
 303 datum is set to SYSMIS.
 304
 305 The CONNECT subcommand is mandatory.
 306 It is a string specifying the parameters of the database server from
 307 which the data should be fetched.
 308 The format of the string is given in the postgres manual
 309 @url{http://www.postgresql.org/docs/8.0/static/libpq.html#LIBPQ-CONNECT}.
 310
 311 The SQL subcommand is mandatory.
 312 It must be a valid SQL string to retrieve data from the database.
 313
 314 The ASSUMEDVARWIDTH subcommand specifies the maximum width of string
 315 variables read  from the database.
 316 If omitted, the default value is determined from the length of the
 317 string in the first value read for each variable.
 318
 319 The UNENCRYPTED subcommand allows data to be retrieved over an insecure
 320 connection.
 321 If the connection is not encrypted, and the UNENCRYPTED subcommand is not
 322 given, then an error will occur.
 323 Whether or not the connection is
 324 encrypted depends upon the underlying psql library and the
 325 capabilities of the database server.
 326
 327 The BSIZE subcommand serves only to optimise the speed of data transfer.
 328 It specifies an upper limit on the
 329 number of cases to fetch from the database at once.
 330 The default value is 4096.
 331 If your SQL statement fetches a large number of cases but only a small number of
 332 variables, then the data transfer may be faster if you increase this value.
 333 Conversely, if the number of variables is large, or if the machine on which
 334 PSPP is running has only a
 335 small amount of memory, then a smaller value will be better.
 336
 337
 338 The following syntax is an example:
 339 @example
 340 GET DATA /TYPE=PSQL
 341      /CONNECT='host=example.com port=5432 dbname=product user=fred password=xxxx'
 342      /SQL='select * from manufacturer'.
 343 @end example
 344
 345
 346 @node GET DATA /TYPE=TXT
 347 @subsection Textual Data Files
 348
 349 @display
 350 GET DATA /TYPE=TXT
 351         /FILE=@{'file-name',file_handle@}
 352         [/ARRANGEMENT=@{DELIMITED,FIXED@}]
 353         [/FIRSTCASE=@{first_case@}]
 354         [/IMPORTCASE=@{ALL,FIRST max_cases,PERCENT percent@}]
 355         @dots{}additional subcommands depending on ARRANGEMENT@dots{}
 356 @end display
 357
 358 @cindex text files
 359 @cindex data files
 360 When TYPE=TXT is specified, GET DATA reads data in a delimited or
 361 fixed columnar format, much like DATA LIST (@pxref{DATA LIST}).
 362
 363 The FILE subcommand is mandatory.  Specify the file to be read as
 364 a string file name or (for textual data
 365 only) a file handle (@pxref{File Handles}).
 366
 367 The ARRANGEMENT subcommand determines the file's basic format.
 368 DELIMITED, the default setting, specifies that fields in the input
 369 data are separated by spaces, tabs, or other user-specified
 370 delimiters.  FIXED specifies that fields in the input data appear at
 371 particular fixed column positions within records of a case.
 372
 373 By default, cases are read from the input file starting from the first
 374 line.  To skip lines at the beginning of an input file, set FIRSTCASE
 375 to the number of the first line to read: 2 to skip the first line, 3
 376 to skip the first two lines, and so on.
 377
 378 IMPORTCASE can be used to limit the number of cases read from the
 379 input file.  With the default setting, ALL, all cases in the file are
 380 read.  Specify FIRST @i{max_cases} to read at most @i{max_cases} cases
 381 from the file.  Use PERCENT @i{percent} to read only @i{percent}
 382 percent, approximately, of the cases contained in the file.  (The
 383 percentage is approximate, because there is no way to accurately count
 384 the number of cases in the file without reading the entire file.  The
 385 number of cases in some kinds of unusual files cannot be estimated;
 386 PSPP will read all cases in such files.)
 387
 388 FIRSTCASE and IMPORTCASE may be used with delimited and fixed-format
 389 data.  The remaining subcommands, which apply only to one of the two  file
 390 arrangements, are described below.
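
@noindent
As a sketch, the following syntax skips one header line and then reads
at most 100 cases from a comma-delimited file.  The file name and
variable names are hypothetical, and the DELIMITERS and VARIABLES
subcommands belong to the delimited arrangement (@pxref{GET DATA
/TYPE=TXT /ARRANGEMENT=DELIMITED}):

@example
GET DATA /TYPE=TXT /FILE='big.csv' /FIRSTCASE=2 /IMPORTCASE=FIRST 100
        /DELIMITERS=','
        /VARIABLES=id F8
                   name A20.
@end example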
 391
 392 @menu
 393 * GET DATA /TYPE=TXT /ARRANGEMENT=DELIMITED::
 394 * GET DATA /TYPE=TXT /ARRANGEMENT=FIXED::
 395 @end menu
 396
 397 @node GET DATA /TYPE=TXT /ARRANGEMENT=DELIMITED
 398 @subsubsection Reading Delimited Data
 399
 400 @display
 401 GET DATA /TYPE=TXT
 402         /FILE=@{'file-name',file_handle@}
 403         [/ARRANGEMENT=@{DELIMITED,FIXED@}]
 404         [/FIRSTCASE=@{first_case@}]
 405         [/IMPORTCASE=@{ALL,FIRST max_cases,PERCENT percent@}]
 406
 407         /DELIMITERS="delimiters"
 408         [/QUALIFIER="quotes" [/ESCAPE]]
 409         [/DELCASE=@{LINE,VARIABLES n_variables@}]
 410         /VARIABLES=del_var [del_var]@dots{}
 411 where each del_var takes the form:
 412         variable format
 413 @end display
 414
 415 The GET DATA command with TYPE=TXT and ARRANGEMENT=DELIMITED reads
 416 input data from text files in delimited format, where fields are
 417 separated by a set of user-specified delimiters.  Its capabilities are
 418 similar to those of DATA LIST FREE (@pxref{DATA LIST FREE}), with a
 419 few enhancements.
 420
 421 The required FILE subcommand and optional FIRSTCASE and IMPORTCASE
 422 subcommands are described above (@pxref{GET DATA /TYPE=TXT}).
 423
 424 DELIMITERS, which is required, specifies the set of characters that
 425 may separate fields.  Each character in the string specified on
 426 DELIMITERS separates one field from the next.  The end of a line also
 427 separates fields, regardless of DELIMITERS.  Two consecutive
 428 delimiters in the input yield an empty field, as does a delimiter at
 429 the end of a line.  A space character as a delimiter is an exception:
 430 consecutive spaces do not yield an empty field and neither does any
 431 number of spaces at the end of a line.
 432
 433 To use a tab as a delimiter, specify @samp{\t} at the beginning of the
 434 DELIMITERS string.  To use a backslash as a delimiter, specify
 435 @samp{\\} as the first delimiter or, if a tab should also be a
 436 delimiter, immediately following @samp{\t}.  To read a data file in
 437 which each field appears on a separate line, specify the empty string
 438 for DELIMITERS.
 439
 440 The optional QUALIFIER subcommand names one or more characters that
 441 can be used to quote values within fields in the input.  A field that
 442 begins with one of the specified quote characters ends at the next
 443 matching quote.  Intervening delimiters become part of the field,
 444 instead of terminating it.  The ability to specify more than one quote
 445 character is a PSPP extension.
 446
 447 By default, a character specified on QUALIFIER cannot itself be
 448 embedded within a field that it quotes, because the quote character
 449 always terminates the quoted field.  With ESCAPE, however, a doubled
 450 quote character within a quoted field inserts a single instance of the
 451 quote into the field.  For example, if @samp{'} is specified on
 452 QUALIFIER, then without ESCAPE @code{'a''b'} specifies a pair of
 453 fields that contain @samp{a} and @samp{b}, but with ESCAPE it
 454 specifies a single field that contains @samp{a'b}.  ESCAPE is a PSPP
 455 extension.
 456
 457 The DELCASE subcommand controls how data may be broken across lines in
 458 the data file.  With LINE, the default setting, each line must contain
 459 all the data for exactly one case.  For additional flexibility, to
 460 allow a single case to be split among lines or multiple cases to be
 461 contained on a single line, specify VARIABLES @i{n_variables}, where
 462 @i{n_variables} is the number of variables per case.
 463
 464 The VARIABLES subcommand is required and must be the last subcommand.
 465 Specify the name of each variable and its input format (@pxref{Input
 466 and Output Formats}) in the order they should be read from the input
 467 file.
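
@noindent
As a sketch of DELCASE=VARIABLES, the following syntax reads three
variables per case regardless of line breaks, so that a case may span
lines or several cases may share a line.  The file name and variable
names are hypothetical:

@example
GET DATA /TYPE=TXT /FILE='scores.data' /DELIMITERS=' '
        /DELCASE=VARIABLES 3
        /VARIABLES=id F4
                   score F5.1
                   grade A2.
@end example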
 468
 469 @subsubheading Examples
 470
 471 @noindent
 472 On a Unix-like system, the @samp{/etc/passwd} file has a format
 473 similar to this:
 474
 475 @example
 476 root:$1$nyeSP5gD$pDq/:0:0:,,,:/root:/bin/bash
 477 blp:$1$BrP/pFg4$g7OG:1000:1000:Ben Pfaff,,,:/home/blp:/bin/bash
 478 john:$1$JBuq/Fioq$g4A:1001:1001:John Darrington,,,:/home/john:/bin/bash
 479 jhs:$1$D3li4hPL$88X1:1002:1002:Jason Stover,,,:/home/jhs:/bin/csh
 480 @end example
 481
 482 @noindent
 483 The following syntax reads a file in the format used by
 484 @samp{/etc/passwd}:
 485
 486 @c If you change this example, change the regression test in
 487 @c tests/language/data-io/get-data.at to match.
 488 @example
 489 GET DATA /TYPE=TXT /FILE='/etc/passwd' /DELIMITERS=':'
 490         /VARIABLES=username A20
 491                    password A40
 492                    uid F10
 493                    gid F10
 494                    gecos A40
 495                    home A40
 496                    shell A40.
 497 @end example
 498
 499 @noindent
 500 Consider the following data on used cars:
 501
 502 @example
 503 model   year    mileage price   type    age
 504 Civic   2002    29883   15900   Si      2
 505 Civic   2003    13415   15900   EX      1
 506 Civic   1992    107000  3800    n/a     12
 507 Accord  2002    26613   17900   EX      1
 508 @end example
 509
 510 @noindent
 511 The following syntax can be used to read the used car data:
 512
 513 @c If you change this example, change the regression test in
 514 @c tests/language/data-io/get-data.at to match.
 515 @example
 516 GET DATA /TYPE=TXT /FILE='cars.data' /DELIMITERS=' ' /FIRSTCASE=2
 517         /VARIABLES=model A8
 518                    year F4
 519                    mileage F6
 520                    price F5
 521                    type A4
 522                    age F2.
 523 @end example
 524
 525 @noindent
 526 Consider the following information on animals in a pet store:
 527
 528 @example
 529 'Pet''s Name', "Age", "Color", "Date Received", "Price", "Height", "Type"
 530 , (Years), , , (Dollars), ,
 531 "Rover", 4.5, Brown, "12 Feb 2004", 80, '1''4"', "Dog"
 532 "Charlie", , Gold, "5 Apr 2007", 12.3, "3""", "Fish"
 533 "Molly", 2, Black, "12 Dec 2006", 25, '5"', "Cat"
 534 "Gilly", , White, "10 Apr 2007", 10, "3""", "Guinea Pig"
 535 @end example
 536
 537 @noindent
 538 The following syntax can be used to read the pet store data:
 539
 540 @c If you change this example, change the regression test in
 541 @c tests/language/data-io/get-data.at to match.
 542 @example
 543 GET DATA /TYPE=TXT /FILE='pets.data' /DELIMITERS=', ' /QUALIFIER='''"' /ESCAPE
 544         /FIRSTCASE=3
 545         /VARIABLES=name A10
 546                    age F3.1
 547                    color A5
 548                    received EDATE10
 549                    price F5.2
 550                    height a5
 551                    type a10.
 552 @end example
 553
 554 @node GET DATA /TYPE=TXT /ARRANGEMENT=FIXED
 555 @subsubsection Reading Fixed Columnar Data
 556
 557 @display
 558 GET DATA /TYPE=TXT
 559         /FILE=@{'file-name',file_handle@}
 560         [/ARRANGEMENT=@{DELIMITED,FIXED@}]
 561         [/FIRSTCASE=@{first_case@}]
 562         [/IMPORTCASE=@{ALL,FIRST max_cases,PERCENT percent@}]
 563
 564         [/FIXCASE=n]
 565         /VARIABLES fixed_var [fixed_var]@dots{}
 566             [/rec# fixed_var [fixed_var]@dots{}]@dots{}
 567 where each fixed_var takes the form:
 568         variable start-end format
 569 @end display
 570
 571 The GET DATA command with TYPE=TXT and ARRANGEMENT=FIXED reads input
 572 data from text files in fixed format, where each field is located in
 573 particular fixed column positions within records of a case.  Its
 574 capabilities are similar to those of DATA LIST FIXED (@pxref{DATA LIST
 575 FIXED}), with a few enhancements.
 576
 577 The required FILE subcommand and optional FIRSTCASE and IMPORTCASE
 578 subcommands are described above (@pxref{GET DATA /TYPE=TXT}).
 579
 580 The optional FIXCASE subcommand may be used to specify the positive
 581 integer number of input lines that make up each case.  The default
 582 value is 1.
 583
 584 The VARIABLES subcommand, which is required, specifies the positions
 585 at which each variable can be found.  For each variable, specify its
 586 name, followed by its start and end column separated by @samp{-}
 587 (e.g.@: @samp{0-9}), followed by the input format type (e.g.@:
 588 @samp{F}).  For this command, columns are numbered starting from 0 at
 589 the left column.  Introduce the variables in the second and later
 590 lines of a case by a slash followed by the number of the line within
 591 the case, e.g.@: @samp{/2} for the second line.
 592
 593 @subsubheading Examples
 594
 595 @noindent
 596 Consider the following data on used cars:
 597
 598 @example
 599 model   year    mileage price   type    age
 600 Civic   2002    29883   15900   Si      2
 601 Civic   2003    13415   15900   EX      1
 602 Civic   1992    107000  3800    n/a     12
 603 Accord  2002    26613   17900   EX      1
 604 @end example
 605
 606 @noindent
 607 The following syntax can be used to read the used car data:
 608
 609 @c If you change this example, change the regression test in
 610 @c tests/language/data-io/get-data.at to match.
 611 @example
 612 GET DATA /TYPE=TXT /FILE='cars.data' /ARRANGEMENT=FIXED /FIRSTCASE=2
 613         /VARIABLES=model 0-7 A
 614                    year 8-15 F
 615                    mileage 16-23 F
 616                    price 24-31 F
 617                    type 32-39 A
 618                    age 40-47 F.
 619 @end example
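
@noindent
As a further sketch, when each case occupies two input lines, FIXCASE
and the @samp{/2} record marker can be combined as follows.  The file
name, variable names, and column positions are hypothetical:

@example
GET DATA /TYPE=TXT /FILE='people.data' /ARRANGEMENT=FIXED /FIXCASE=2
        /VARIABLES=name 0-19 A
                   age 20-22 F
                   /2 city 0-19 A
                      zip 20-24 A.
@end example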
 620
 621 @node IMPORT
 622 @section IMPORT
 623 @vindex IMPORT
 624
 625 @display
 626 IMPORT
 627         /FILE='file-name'
 628         /TYPE=@{COMM,TAPE@}
 629         /DROP=var_list
 630         /KEEP=var_list
 631         /RENAME=(src_names=target_names)@dots{}
 632 @end display
 633
 634 The @cmd{IMPORT} transformation clears the active dataset dictionary and
 635 data and
 636 replaces them with a dictionary and data from a system file, portable file,
 637 or scratch file.
 638
 639 The FILE subcommand, which is the only required subcommand, specifies
 640 the portable file to be read as a file name string or a file handle
 641 (@pxref{File Handles}).
 642
 643 The TYPE subcommand is currently not used.
 644
 645 DROP, KEEP, and RENAME follow the syntax used by @cmd{GET} (@pxref{GET}).
 646
 647 @cmd{IMPORT} does not cause the data to be read, only the dictionary.  The
 648 data is read later, when a procedure is executed.
 649
 650 Use of @cmd{IMPORT} to read a system file or scratch file is a PSPP
 651 extension.
 652
 653 @node SAVE
 654 @section SAVE
 655 @vindex SAVE
 656
 657 @display
 658 SAVE
 659         /OUTFILE=@{'file-name',file_handle@}
 660         /UNSELECTED=@{RETAIN,DELETE@}
 661         /@{COMPRESSED,UNCOMPRESSED@}
 662         /PERMISSIONS=@{WRITEABLE,READONLY@}
 663         /DROP=var_list
 664         /KEEP=var_list
 665         /VERSION=version
 666         /RENAME=(src_names=target_names)@dots{}
 667         /NAMES
 668         /MAP
 669 @end display
 670
 671 The @cmd{SAVE} procedure causes the dictionary and data in the active
 672 dataset to
 673 be written to a system file or scratch file.
 674
 675 OUTFILE is the only required subcommand.  Specify the system file or
 676 scratch file to be written as a string file name or a file handle
 677 (@pxref{File Handles}).
 678
 679 By default, cases excluded with FILTER are written to the system file.
 680 These can be excluded by specifying DELETE on the UNSELECTED
 681 subcommand.  Specifying RETAIN makes the default explicit.
 682
 683 The COMPRESSED and UNCOMPRESSED subcommands determine whether the saved
 684 system file is compressed.  By default, system files are compressed.
 685 This default can be changed with the SET command (@pxref{SET}).
 686
 687 The PERMISSIONS subcommand specifies permissions for the new system
 688 file.  WRITEABLE, the default, creates the file with read and write
 689 permission.  READONLY creates the file for read-only access.
 690
 691 By default, all the variables in the active dataset dictionary are written
 692 to the system file.  The DROP subcommand can be used to specify a list
 693 of variables not to be written.  In contrast, KEEP specifies variables
 694 to be written; all other variables are omitted.
 695
 696 Normally variables are saved to a system file under the same names they
 697 have in the active dataset.  Use the RENAME subcommand to change these names.
 698 Specify, within parentheses, a list of variable names followed by an
 699 equals sign (@samp{=}) and the names that they should be renamed to.
 700 Multiple parenthesized groups of variable names can be included on a
 701 single RENAME subcommand.  Variables' names may be swapped using a
 702 RENAME subcommand of the form @samp{/RENAME=(A B=B A)}.
 703
 704 Alternate syntax for the RENAME subcommand allows the parentheses to be
 705 eliminated.  When this is done, only a single variable may be renamed at
 706 once, for instance, @samp{/RENAME=A=B}.  This alternate syntax is
 707 deprecated.
 708
 709 DROP, KEEP, and RENAME are performed in left-to-right order.  They
 710 each may be present any number of times.  @cmd{SAVE} never modifies
 711 the active dataset.  DROP, KEEP, and RENAME only affect the system file
 712 written to disk.
 713
 714 The VERSION subcommand specifies the version of the file format. Valid
 715 versions are 2 and 3.  The default version is 3.  In version 2 system
 716 files, variable names longer than 8 bytes will be truncated.  The two
 717 versions are otherwise identical.
 718
 719 The NAMES and MAP subcommands are currently ignored.
 720
 721 @cmd{SAVE} causes the data to be read.  It is a procedure.
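
@noindent
As an illustrative sketch, the following syntax writes an uncompressed
system file that omits cases excluded with FILTER, keeps three
variables, and renames one of them in the output.  The file name
@samp{subset.sav} and the variable names are hypothetical:

@example
SAVE /OUTFILE='subset.sav'
        /UNSELECTED=DELETE
        /UNCOMPRESSED
        /KEEP=id age income
        /RENAME=(income=earnings).
@end example

@noindent
The active dataset itself keeps all of its variables and their
original names.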
 722
 723 @node SAVE TRANSLATE
 724 @section SAVE TRANSLATE
 725 @vindex SAVE TRANSLATE
 726
 727 @display
 728 SAVE TRANSLATE
 729         /OUTFILE=@{'file-name',file_handle@}
 730         /TYPE=@{CSV,TAB@}
 731         [/REPLACE]
 732         [/MISSING=@{IGNORE,RECODE@}]
 733
 734         [/DROP=var_list]
 735         [/KEEP=var_list]
 736         [/RENAME=(src_names=target_names)@dots{}]
 737         [/UNSELECTED=@{RETAIN,DELETE@}]
 738         [/MAP]
 739
 740         @dots{}additional subcommands depending on TYPE@dots{}
 741 @end display
 742
 743 The @cmd{SAVE TRANSLATE} command is used to save data into various
 744 formats understood by other applications.
 745
 746 The OUTFILE and TYPE subcommands are mandatory.  OUTFILE specifies the
 747 file to be written, as a string file name or a file handle
 748 (@pxref{File Handles}).  TYPE determines the type of the file
 749 to write.  It must be one of the following:
 750
 751 @table @asis
 752 @item CSV
 753 Comma-separated value format.
 754
 755 @item TAB
 756 Tab-delimited format.
 757 @end table
 758
 759 By default, SAVE TRANSLATE will not overwrite an existing file.  Use
 760 REPLACE to force an existing file to be overwritten.
 761
 762 With MISSING=IGNORE, the default, SAVE TRANSLATE treats user-missing
 763 values as if they were not missing.  Specify MISSING=RECODE to output
 764 numeric user-missing values like system-missing values and string
 765 user-missing values as all spaces.
 766
 767 By default, all the variables in the active dataset dictionary are saved
 768 to the output file, but DROP or KEEP can select a subset of variables
 769 to save.  The RENAME subcommand can also be used to change the names
 770 under which variables are saved.  UNSELECTED determines whether cases
 771 filtered out by the FILTER command are written to the output file.
 772 These subcommands have the same syntax and meaning as on the
 773 @cmd{SAVE} command (@pxref{SAVE}).
 774
 775 Each supported file type has additional subcommands, explained in
 776 separate sections below.
 777
 778 @cmd{SAVE TRANSLATE} causes the data to be read.  It is a procedure.
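
A minimal sketch of the common subcommands described above might look
like the following.  The output file name @file{survey.csv} and the
variables @code{id}, @code{age}, and @code{income} are hypothetical and
assumed to exist in the active dataset.

@example
SAVE TRANSLATE
        /OUTFILE='survey.csv'
        /TYPE=CSV
        /REPLACE
        /MISSING=RECODE
        /KEEP=id age income.
@end example

Here REPLACE allows an existing @file{survey.csv} to be overwritten,
MISSING=RECODE writes user-missing values as if they were
system-missing (or all spaces, for strings), and KEEP restricts the
output to the three named variables.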
 779
 780 @menu
 781 * SAVE TRANSLATE /TYPE=CSV and TYPE=TAB::
 782 @end menu
 783
 784 @node SAVE TRANSLATE /TYPE=CSV and TYPE=TAB
 785 @subsection Writing Comma- and Tab-Separated Data Files
 786
 787 @display
 788 SAVE TRANSLATE
 789         /OUTFILE=@{'file-name',file_handle@}
 790         /TYPE=@{CSV,TAB@}
 791         [/REPLACE]
 792         [/MISSING=@{IGNORE,RECODE@}]
 793
 794         [/DROP=var_list]
 795         [/KEEP=var_list]
 796         [/RENAME=(src_names=target_names)@dots{}]
 797         [/UNSELECTED=@{RETAIN,DELETE@}]
 798
 799         [/FIELDNAMES]
 800         [/CELLS=@{VALUES,LABELS@}]
 801         [/TEXTOPTIONS DELIMITER='delimiter']
 802         [/TEXTOPTIONS QUALIFIER='qualifier']
 803         [/TEXTOPTIONS DECIMAL=@{DOT,COMMA@}]
 804         [/TEXTOPTIONS FORMAT=@{PLAIN,VARIABLE@}]
 805 @end display
 806
 807 The SAVE TRANSLATE command with TYPE=CSV or TYPE=TAB writes data in a
 808 comma- or tab-separated value format similar to that described by
 809 RFC@tie{}4180.  Each variable becomes one output column, and each case
 810 becomes one line of output.  If FIELDNAMES is specified, an additional
 811 line at the top of the output file lists variable names.
 812
 813 The CELLS and TEXTOPTIONS FORMAT settings determine how values are
 814 written to the output file:
 815
 816 @table @asis
 817 @item CELLS=VALUES FORMAT=PLAIN (the default settings)
 818 Writes variables to the output in ``plain'' formats that ignore the
 819 details of variable formats.  Numeric values are written as plain
 820 decimal numbers with enough digits to indicate their exact values in
 821 machine representation.  Numeric values include @samp{e} followed by
 822 an exponent if the exponent value would be less than -4 or greater
 823 than 16.  Dates are written in MM/DD/YYYY format and times in HH:MM:SS
 824 format.  WKDAY and MONTH values are written as decimal numbers.
 825
 826 Numeric values use, by default, the decimal point character set with
 827 SET DECIMAL (@pxref{SET DECIMAL}).  Use DECIMAL=DOT or DECIMAL=COMMA
 828 to force a particular decimal point character.
 829
 830 @item CELLS=VALUES FORMAT=VARIABLE
 831 Writes variables using their print formats.  Leading and trailing
 832 spaces are removed from numeric values, and trailing spaces are
 833 removed from string values.
 834
 835 @item CELLS=LABELS FORMAT=PLAIN
 836 @itemx CELLS=LABELS FORMAT=VARIABLE
 837 Writes value labels where they exist, and otherwise writes the values
 838 themselves as described above.
 839 @end table
 840
 841 Regardless of CELLS and TEXTOPTIONS FORMAT, numeric system-missing
 842 values are output as a single space.
 843
 844 For TYPE=TAB, tab characters delimit values.  For TYPE=CSV, the
 845 TEXTOPTIONS DELIMITER and DECIMAL settings determine the character
 846 that separates values within a line.  If DELIMITER is specified, then
 847 the specified string separates values.  If DELIMITER is not specified,
 848 then the default is a comma with DECIMAL=DOT or a semicolon with
 849 DECIMAL=COMMA.  If DECIMAL is not given either, it is implied by the
 850 decimal point character set with SET DECIMAL (@pxref{SET DECIMAL}).
 851
 852 The TEXTOPTIONS QUALIFIER setting specifies a character that is output
 853 before and after a value that contains the delimiter character or the
 854 qualifier character.  The default is a double quote (@samp{"}).  A
 855 qualifier character that appears within a value is doubled.
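
The following sketch combines several of the settings described in
this section.  The file name @file{survey.csv} is hypothetical, and
each TEXTOPTIONS setting is given as its own subcommand, as in the
syntax summary above.

@example
SAVE TRANSLATE
        /OUTFILE='survey.csv'
        /TYPE=CSV
        /REPLACE
        /FIELDNAMES
        /CELLS=LABELS
        /TEXTOPTIONS DECIMAL=COMMA
        /TEXTOPTIONS FORMAT=VARIABLE.
@end example

Because DELIMITER is omitted and DECIMAL=COMMA is given, values are
separated by semicolons.  With the default qualifier, the quoting rule
above implies that a string value such as @samp{a;b} would be written
as @samp{"a;b"}, and @samp{say "hi"} as @samp{"say ""hi"""}.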
 856
 857 @node SYSFILE INFO
 858 @section SYSFILE INFO
 859 @vindex SYSFILE INFO
 860
 861 @display
 862 SYSFILE INFO FILE='file-name'.
 863 @end display
 864
 865 @cmd{SYSFILE INFO} reads the dictionary in a system file and
 866 displays the information that it contains.
 867
 868 Specify the system file to examine as a file name or file handle
 869 (@pxref{File Handles}).
 870
 871 @cmd{SYSFILE INFO} does not affect the current active dataset.
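
For example, assuming a system file named @file{survey.sav} exists
(the name is only an illustration), its dictionary could be displayed
with:

@example
SYSFILE INFO FILE='survey.sav'.
@end example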
 872
 873 @node XEXPORT
 874 @section XEXPORT
 875 @vindex XEXPORT
 876
 877 @display
 878 XEXPORT
 879         /OUTFILE='file-name'
 880         /DIGITS=n
 881         /DROP=var_list
 882         /KEEP=var_list
 883         /RENAME=(src_names=target_names)@dots{}
 884         /TYPE=@{COMM,TAPE@}
 885         /MAP
 886 @end display
 887
 888 The @cmd{XEXPORT} transformation writes the active dataset dictionary and
 889 data to a specified portable file.
 890
 891 This transformation is a PSPP extension.
 892
 893 It is similar to the @cmd{EXPORT} procedure, with two differences:
 894
 895 @itemize
 896 @item
 897 @cmd{XEXPORT} is a transformation, not a procedure.  It is executed when
 898 the data is read by a procedure or procedure-like command.
 899
 900 @item
 901 @cmd{XEXPORT} does not support the UNSELECTED subcommand.
 902 @end itemize
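
As a rough sketch of the first difference, consider the following
fragment.  The variables @code{income} and @code{income_k} and the file
name @file{survey.por} are hypothetical.

@example
COMPUTE income_k = income / 1000.
XEXPORT /OUTFILE='survey.por'.
EXECUTE.
@end example

Nothing is written when @cmd{XEXPORT} itself is encountered.  The
portable file is written only when @cmd{EXECUTE} causes the data to be
read, at which point each case has already passed through the preceding
@cmd{COMPUTE} transformation.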
 903
 904 @xref{EXPORT}, for more information.
 905
 906 @node XSAVE
 907 @section XSAVE
 908 @vindex XSAVE
 909
 910 @display
 911 XSAVE
 912         /OUTFILE='file-name'
 913         /@{COMPRESSED,UNCOMPRESSED@}
 914         /PERMISSIONS=@{WRITEABLE,READONLY@}
 915         /DROP=var_list
 916         /KEEP=var_list
 917         /VERSION=version
 918         /RENAME=(src_names=target_names)@dots{}
 919         /NAMES
 920         /MAP
 921 @end display
 922
 923 The @cmd{XSAVE} transformation writes the active dataset's dictionary and
 924 data to a system file or scratch file.  It is similar to the @cmd{SAVE}
 925 procedure, with two differences:
 926
 927 @itemize
 928 @item
 929 @cmd{XSAVE} is a transformation, not a procedure.  It is executed when
 930 the data is read by a procedure or procedure-like command.
 931
 932 @item
 933 @cmd{XSAVE} does not support the UNSELECTED subcommand.
 934 @end itemize
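
Because @cmd{XSAVE} is a transformation, it can be placed between other
transformations to capture the data at an intermediate point.  The
following sketch assumes a numeric variable @code{income} in the active
dataset; the file names are hypothetical.

@example
XSAVE /OUTFILE='before.sav'.
COMPUTE income = income * 1.05.
XSAVE /OUTFILE='after.sav'.
EXECUTE.
@end example

Both files are written in the single pass over the data that
@cmd{EXECUTE} triggers.  @file{before.sav} would hold the original
values of @code{income} and @file{after.sav} the adjusted ones.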
 935
 936 @xref{SAVE}, for more information.