   1 @node System and Portable File IO
   2 @chapter System and Portable File I/O
   3
   4 The commands in this chapter read, write, and examine system files and
   5 portable files.
   6
   7 @menu
   8 * APPLY DICTIONARY::            Apply system file dictionary to active dataset.
   9 * EXPORT::                      Write to a portable file.
  10 * GET::                         Read from a system file.
  11 * GET DATA::                    Read from foreign files.
  12 * IMPORT::                      Read from a portable file.
  13 * SAVE::                        Write to a system file.
  14 * SAVE TRANSLATE::              Write data in foreign file formats.
  15 * SYSFILE INFO::                Display system file dictionary.
  16 * XEXPORT::                     Write to a portable file, as a transformation.
  17 * XSAVE::                       Write to a system file, as a transformation.
  18 @end menu
  19
  20 @node APPLY DICTIONARY
  21 @section APPLY DICTIONARY
  22 @vindex APPLY DICTIONARY
  23
  24 @display
  25 APPLY DICTIONARY FROM=@{'file-name',file_handle@}.
  26 @end display
  27
  28 @cmd{APPLY DICTIONARY} applies the variable labels, value labels,
  29 and missing values taken from a file to corresponding
  30 variables in the active dataset.  In some cases it also updates the
  31 weighting variable.
  32
  33 Specify a system file or portable file's name, a data set name
  34 (@pxref{Datasets}), or a file handle name (@pxref{File Handles}).  The
  35 dictionary in the file will be read, but it will not replace the
  36 active dataset's dictionary.  The file's data will not be read.
  37
  38 Only variables with names that exist in both the active dataset and the
  39 system file are considered.  Variables with the same name but different
  40 types (numeric, string) will cause an error message.  Otherwise, the
  41 system file variables' attributes will replace those in their matching
  42 active dataset variables:
  43
  44 @itemize @bullet
  45 @item
  46 If a system file variable has a variable label, then it will replace
  47 the variable label of the active dataset variable.  If the system
  48 file variable does not have a variable label, then the active dataset
  49 variable's variable label, if any, will be retained.
  50
  51 @item
  52 If the system file variable has custom attributes (@pxref{VARIABLE
  53 ATTRIBUTE}), then those attributes replace the active dataset variable's
  54 custom attributes.  If the system file variable does not have custom
  55 attributes, then the active dataset variable's custom attributes, if any,
  56 will be retained.
  57
  58 @item
  59 If the active dataset variable is numeric or short string, then value
  60 labels and missing values, if any, will be copied to the active dataset
  61 variable.  If the system file variable does not have value labels or
  62 missing values, then those in the active dataset variable, if any, will not
  63 be disturbed.
  64 @end itemize
  65
  66 In addition to properties of variables, some properties of the active
  67 dataset's dictionary as a whole are updated:
  68
  69 @itemize @bullet
  70 @item
  71 If the system file has custom attributes (@pxref{DATAFILE ATTRIBUTE}),
  72 then those attributes replace the active dataset's custom
  73 attributes.
  74
  75 @item
  76 If the active dataset has a weighting variable (@pxref{WEIGHT}), and the
  77 system file does not, or if the weighting variable in the system file
  78 does not exist in the active dataset, then the active dataset weighting
  79 variable, if any, is retained.  Otherwise, the weighting variable in
  80 the system file becomes the active dataset weighting variable.
  81 @end itemize
  82
  83 @cmd{APPLY DICTIONARY} takes effect immediately.  It does not read the
  84 active dataset.  The system file is not modified.
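
As an illustrative sketch, the following syntax reads one system file
and then applies the dictionary of another to it.  The file names
@samp{wave1.sav} and @samp{wave2.sav} are hypothetical:

@example
GET /FILE='wave2.sav'.
APPLY DICTIONARY FROM='wave1.sav'.
@end example

Only variables present in both files are affected.  Neither file on
disk is modified.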
  85
  86 @node EXPORT
  87 @section EXPORT
  88 @vindex EXPORT
  89
  90 @display
  91 EXPORT
  92         /OUTFILE='file-name'
  93         /UNSELECTED=@{RETAIN,DELETE@}
  94         /DIGITS=n
  95         /DROP=var_list
  96         /KEEP=var_list
  97         /RENAME=(src_names=target_names)@dots{}
  98         /TYPE=@{COMM,TAPE@}
  99         /MAP
 100 @end display
 101
 102 The @cmd{EXPORT} procedure writes the active dataset's dictionary and
 103 data to a specified portable file.
 104
 105 By default, cases excluded with FILTER are written to the
 106 file.  These can be excluded by specifying DELETE on the UNSELECTED
 107 subcommand.  Specifying RETAIN makes the default explicit.
 108
 109 Portable files express real numbers in base 30.  Integers are always
 110 expressed to the maximum precision needed to make them exact.
 111 Non-integers are, by default, expressed to the machine's maximum
 112 natural precision (approximately 15 decimal digits on many machines).
 113 If many numbers require this many digits, the portable file may
 114 significantly increase in size.  As an alternative, the DIGITS
 115 subcommand may be used to specify the number of decimal digits of
 116 precision to write.  DIGITS applies only to non-integers.
 117
 118 The OUTFILE subcommand, which is the only required subcommand, specifies
 119 the portable file to be written as a file name string or
 120 a file handle (@pxref{File Handles}).
 121
 122 DROP, KEEP, and RENAME follow the same format as the SAVE procedure
 123 (@pxref{SAVE}).
 124
 125 The TYPE subcommand specifies the character set for use in the
 126 portable file.  Its value is currently not used.
 127
 128 The MAP subcommand is currently ignored.
 129
 130 @cmd{EXPORT} is a procedure.  It causes the active dataset to be read.
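
As a sketch of typical usage, the following syntax writes a portable
file that omits filtered-out cases and two scratch variables, and
limits non-integers to 6 decimal digits of precision.  The file name
and variable names are hypothetical:

@example
EXPORT /OUTFILE='survey.por'
        /UNSELECTED=DELETE
        /DIGITS=6
        /DROP=scratch1 scratch2.
@end example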
 131
 132 @node GET
 133 @section GET
 134 @vindex GET
 135
 136 @display
 137 GET
 138         /FILE=@{'file-name',file_handle@}
 139         /DROP=var_list
 140         /KEEP=var_list
 141         /RENAME=(src_names=target_names)@dots{}
 142 @end display
 143
 144 @cmd{GET} clears the current dictionary and active dataset and
 145 replaces them with the dictionary and data from a specified file.
 146
 147 The FILE subcommand is the only required subcommand.  Specify the system
 148 file or portable file to be read as a string file name or
 149 a file handle (@pxref{File Handles}).
 150
 151 By default, all the variables in a file are read.  The DROP
 152 subcommand can be used to specify a list of variables that are not to be
  153 read.  By contrast, the KEEP subcommand can be used to specify variables
  154 that are to be read, with all other variables not read.
 155
 156 Normally variables in a file retain the names that they were
 157 saved under.  Use the RENAME subcommand to change these names.  Specify,
 158 within parentheses, a list of variable names followed by an equals sign
 159 (@samp{=}) and the names that they should be renamed to.  Multiple
 160 parenthesized groups of variable names can be included on a single
 161 RENAME subcommand.  Variables' names may be swapped using a RENAME
 162 subcommand of the form @samp{/RENAME=(A B=B A)}.
 163
 164 Alternate syntax for the RENAME subcommand allows the parentheses to be
 165 eliminated.  When this is done, only a single variable may be renamed at
 166 once.  For instance, @samp{/RENAME=A=B}.  This alternate syntax is
 167 deprecated.
 168
 169 DROP, KEEP, and RENAME are executed in left-to-right order.
 170 Each may be present any number of times.  @cmd{GET} never modifies a
 171 file on disk.  Only the active dataset read from the file
 172 is affected by these subcommands.
 173
 174 @cmd{GET} does not cause the data to be read, only the dictionary.  The data
 175 is read later, when a procedure is executed.
 176
 177 Use of @cmd{GET} to read a portable file is a PSPP extension.
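
The following sketch reads three variables from a system file and
renames two of them.  The file name and variable names are
hypothetical.  Because DROP, KEEP, and RENAME are executed left to
right, the KEEP here refers to the original names, and the RENAME then
applies to the variables that were kept:

@example
GET /FILE='survey.sav'
    /KEEP=id age income
    /RENAME=(age income=age_years income_usd).
@end example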
 178
 179 @node GET DATA
 180 @section GET DATA
 181 @vindex GET DATA
 182
 183 @display
 184 GET DATA
 185         /TYPE=@{GNM,ODS,PSQL,TXT@}
 186         @dots{}additional subcommands depending on TYPE@dots{}
 187 @end display
 188
 189 The @cmd{GET DATA} command is used to read files and other data
 190 sources created by other applications.  When this command is executed,
 191 the current dictionary and active dataset are replaced with variables
 192 and data read from the specified source.
 193
 194 The TYPE subcommand is mandatory and must be the first subcommand
 195 specified.  It determines the type of the file or source to read.
 196 PSPP currently supports the following file types:
 197
 198 @table @asis
 199 @item GNM
 200 Spreadsheet files created by Gnumeric (@url{http://gnumeric.org}).
 201
 202 @item ODS
 203 Spreadsheet files in OpenDocument format.
 204
 205 @item PSQL
 206 Relations from PostgreSQL databases (@url{http://postgresql.org}).
 207
 208 @item TXT
 209 Textual data files in columnar and delimited formats.
 210 @end table
 211
 212 Each supported file type has additional subcommands, explained in
 213 separate sections below.
 214
 215 @menu
 216 * GET DATA /TYPE=GNM/ODS::     Spreadsheets
 217 * GET DATA /TYPE=PSQL::        Databases
 218 * GET DATA /TYPE=TXT::         Delimited Text Files
 219 @end menu
 220
 221 @node GET DATA /TYPE=GNM/ODS
 222 @subsection Spreadsheet Files
 223
 224 @display
 225 GET DATA /TYPE=@{GNM, ODS@}
 226         /FILE=@{'file-name'@}
 227         /SHEET=@{NAME 'sheet-name', INDEX n@}
 228         /CELLRANGE=@{RANGE 'range', FULL@}
 229         /READNAMES=@{ON, OFF@}
 230         /ASSUMEDVARWIDTH=n.
 231 @end display
 232
 233 @cindex Gnumeric
 234 @cindex OpenDocument
 235 @cindex spreadsheet files
 236
 237 Gnumeric spreadsheets (@url{http://gnumeric.org}), and spreadsheets
 238 in OpenDocument format
 239 (@url{http://libreplanet.org/wiki/Group:OpenDocument/Software})
 240 can be read using the GET DATA command.
 241 Use the TYPE subcommand to indicate the file's format.
 242 /TYPE=GNM indicates Gnumeric files,
 243 /TYPE=ODS indicates OpenDocument.
 244 The FILE subcommand is mandatory.
  245 Use it to specify the name of the file to be read.
 246 All other subcommands are optional.
 247
 248 The format of each variable is determined by the format of the spreadsheet
 249 cell containing the first datum for the variable.
 250 If this cell is of string (text) format, then the width of the variable is
 251 determined from the length of the string it contains, unless the
 252 ASSUMEDVARWIDTH subcommand is given.
 253
 254 The SHEET subcommand specifies the sheet within the spreadsheet file to read.
 255 There are two forms of the SHEET subcommand.
 256 In the first form,
 257 @samp{/SHEET=name @var{sheet-name}}, the string @var{sheet-name} is the
 258 name of the sheet to read.
  259 In the second form, @samp{/SHEET=index @var{idx}}, @var{idx} is an
  260 integer that gives the index of the sheet to read.
 261 The first sheet has the index 1.
 262 If the SHEET subcommand is omitted, then the command will read the
 263 first sheet in the file.
 264
 265 The CELLRANGE subcommand specifies the range of cells within the sheet to read.
 266 If the subcommand is given as @samp{/CELLRANGE=FULL}, then the entire
  267 sheet is read.
 268 To read only part of a sheet, use the form
 269 @samp{/CELLRANGE=range '@var{top-left-cell}:@var{bottom-right-cell}'}.
 270 For example, the subcommand @samp{/CELLRANGE=range 'C3:P19'} reads
 271 columns C--P, and rows 3--19 inclusive.
 272 If no CELLRANGE subcommand is given, then the entire sheet is read.
 273
 274 If @samp{/READNAMES=ON} is specified, then the contents of cells of
 275 the first row are used as the names of the variables in which to store
 276 the data from subsequent rows.  This is the default.
 277 If @samp{/READNAMES=OFF} is
  278 used, then the variables receive automatically assigned names.
  279
  280 The ASSUMEDVARWIDTH subcommand specifies the maximum width of string
  281 variables read from the file.
 282 If omitted, the default value is determined from the length of the
 283 string in the first spreadsheet cell for each variable.
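
As a sketch, the following syntax reads cells A1 through D50 of a
sheet named @samp{Stock} from an OpenDocument spreadsheet, taking
variable names from the first row read.  The file name, sheet name,
and cell range are hypothetical:

@example
GET DATA /TYPE=ODS
        /FILE='inventory.ods'
        /SHEET=NAME 'Stock'
        /CELLRANGE=RANGE 'A1:D50'
        /READNAMES=ON.
@end example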
 284
 285
 286 @node GET DATA /TYPE=PSQL
 287 @subsection Postgres Database Queries
 288
 289 @display
 290 GET DATA /TYPE=PSQL
 291          /CONNECT=@{connection info@}
 292          /SQL=@{query@}
 293          [/ASSUMEDVARWIDTH=n]
 294          [/UNENCRYPTED]
 295          [/BSIZE=n].
 296 @end display
 297
 298 @cindex postgres
 299 @cindex databases
 300
 301 The PSQL type is used to import data from a postgres database server.
 302 The server may be located locally or remotely.
 303 Variables are automatically created based on the table column names
 304 or the names specified in the SQL query.
  305 Postgres data types of high precision will lose precision when
  306 imported into PSPP.
  307 Not all postgres data types can be represented in PSPP.
  308 If a datum cannot be represented, a warning will be issued and that
  309 datum will be set to SYSMIS.
 310
 311 The CONNECT subcommand is mandatory.
 312 It is a string specifying the parameters of the database server from
 313 which the data should be fetched.
 314 The format of the string is given in the postgres manual
 315 @url{http://www.postgresql.org/docs/8.0/static/libpq.html#LIBPQ-CONNECT}.
 316
 317 The SQL subcommand is mandatory.
 318 It must be a valid SQL string to retrieve data from the database.
 319
 320 The ASSUMEDVARWIDTH subcommand specifies the maximum width of string
  321 variables read from the database.
 322 If omitted, the default value is determined from the length of the
 323 string in the first value read for each variable.
 324
 325 The UNENCRYPTED subcommand allows data to be retrieved over an insecure
 326 connection.
 327 If the connection is not encrypted, and the UNENCRYPTED subcommand is not
 328 given, then an error will occur.
 329 Whether or not the connection is
 330 encrypted depends upon the underlying psql library and the
 331 capabilities of the database server.
 332
 333 The BSIZE subcommand serves only to optimise the speed of data transfer.
  334 It specifies an upper limit on the
 335 number of cases to fetch from the database at once.
 336 The default value is 4096.
 337 If your SQL statement fetches a large number of cases but only a small number of
 338 variables, then the data transfer may be faster if you increase this value.
 339 Conversely, if the number of variables is large, or if the machine on which
 340 PSPP is running has only a
 341 small amount of memory, then a smaller value will be better.
 342
 343
 344 The following syntax is an example:
 345 @example
 346 GET DATA /TYPE=PSQL
  347      /CONNECT='host=example.com port=5432 dbname=product user=fred password=xxxx'
 348      /SQL='select * from manufacturer'.
 349 @end example
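
As a further sketch, the following syntax fetches two columns from a
local server, permits an unencrypted connection, and raises the number
of cases fetched at once.  The column names @samp{id} and @samp{name}
and the BSIZE value are hypothetical, chosen only for illustration:

@example
GET DATA /TYPE=PSQL
     /CONNECT='host=localhost dbname=product user=fred'
     /SQL='select id, name from manufacturer'
     /UNENCRYPTED
     /BSIZE=10000.
@end example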
 350
 351
 352 @node GET DATA /TYPE=TXT
 353 @subsection Textual Data Files
 354
 355 @display
 356 GET DATA /TYPE=TXT
 357         /FILE=@{'file-name',file_handle@}
 358         [/ARRANGEMENT=@{DELIMITED,FIXED@}]
 359         [/FIRSTCASE=@{first_case@}]
 360         [/IMPORTCASE=@{ALL,FIRST max_cases,PERCENT percent@}]
 361         @dots{}additional subcommands depending on ARRANGEMENT@dots{}
 362 @end display
 363
 364 @cindex text files
 365 @cindex data files
 366 When TYPE=TXT is specified, GET DATA reads data in a delimited or
 367 fixed columnar format, much like DATA LIST (@pxref{DATA LIST}).
 368
 369 The FILE subcommand is mandatory.  Specify the file to be read as
 370 a string file name or (for textual data
 371 only) a file handle (@pxref{File Handles}).
 372
 373 The ARRANGEMENT subcommand determines the file's basic format.
 374 DELIMITED, the default setting, specifies that fields in the input
 375 data are separated by spaces, tabs, or other user-specified
 376 delimiters.  FIXED specifies that fields in the input data appear at
 377 particular fixed column positions within records of a case.
 378
 379 By default, cases are read from the input file starting from the first
 380 line.  To skip lines at the beginning of an input file, set FIRSTCASE
 381 to the number of the first line to read: 2 to skip the first line, 3
 382 to skip the first two lines, and so on.
 383
 384 IMPORTCASE can be used to limit the number of cases read from the
 385 input file.  With the default setting, ALL, all cases in the file are
 386 read.  Specify FIRST @i{max_cases} to read at most @i{max_cases} cases
 387 from the file.  Use PERCENT @i{percent} to read only @i{percent}
 388 percent, approximately, of the cases contained in the file.  (The
 389 percentage is approximate, because there is no way to accurately count
 390 the number of cases in the file without reading the entire file.  The
 391 number of cases in some kinds of unusual files cannot be estimated;
 392 PSPP will read all cases in such files.)
 393
 394 FIRSTCASE and IMPORTCASE may be used with delimited and fixed-format
  395 data.  The remaining subcommands, which apply only to one of the two file
 396 arrangements, are described below.
 397
 398 @menu
 399 * GET DATA /TYPE=TXT /ARRANGEMENT=DELIMITED::
 400 * GET DATA /TYPE=TXT /ARRANGEMENT=FIXED::
 401 @end menu
 402
 403 @node GET DATA /TYPE=TXT /ARRANGEMENT=DELIMITED
 404 @subsubsection Reading Delimited Data
 405
 406 @display
 407 GET DATA /TYPE=TXT
 408         /FILE=@{'file-name',file_handle@}
 409         [/ARRANGEMENT=@{DELIMITED,FIXED@}]
 410         [/FIRSTCASE=@{first_case@}]
 411         [/IMPORTCASE=@{ALL,FIRST max_cases,PERCENT percent@}]
 412
 413         /DELIMITERS="delimiters"
 414         [/QUALIFIER="quotes" [/ESCAPE]]
 415         [/DELCASE=@{LINE,VARIABLES n_variables@}]
 416         /VARIABLES=del_var [del_var]@dots{}
 417 where each del_var takes the form:
 418         variable format
 419 @end display
 420
 421 The GET DATA command with TYPE=TXT and ARRANGEMENT=DELIMITED reads
 422 input data from text files in delimited format, where fields are
 423 separated by a set of user-specified delimiters.  Its capabilities are
 424 similar to those of DATA LIST FREE (@pxref{DATA LIST FREE}), with a
 425 few enhancements.
 426
 427 The required FILE subcommand and optional FIRSTCASE and IMPORTCASE
 428 subcommands are described above (@pxref{GET DATA /TYPE=TXT}).
 429
 430 DELIMITERS, which is required, specifies the set of characters that
 431 may separate fields.  Each character in the string specified on
 432 DELIMITERS separates one field from the next.  The end of a line also
 433 separates fields, regardless of DELIMITERS.  Two consecutive
 434 delimiters in the input yield an empty field, as does a delimiter at
 435 the end of a line.  A space character as a delimiter is an exception:
 436 consecutive spaces do not yield an empty field and neither does any
 437 number of spaces at the end of a line.
 438
 439 To use a tab as a delimiter, specify @samp{\t} at the beginning of the
 440 DELIMITERS string.  To use a backslash as a delimiter, specify
 441 @samp{\\} as the first delimiter or, if a tab should also be a
 442 delimiter, immediately following @samp{\t}.  To read a data file in
 443 which each field appears on a separate line, specify the empty string
 444 for DELIMITERS.
 445
 446 The optional QUALIFIER subcommand names one or more characters that
 447 can be used to quote values within fields in the input.  A field that
 448 begins with one of the specified quote characters ends at the next
 449 matching quote.  Intervening delimiters become part of the field,
 450 instead of terminating it.  The ability to specify more than one quote
 451 character is a PSPP extension.
 452
 453 By default, a character specified on QUALIFIER cannot itself be
 454 embedded within a field that it quotes, because the quote character
 455 always terminates the quoted field.  With ESCAPE, however, a doubled
 456 quote character within a quoted field inserts a single instance of the
 457 quote into the field.  For example, if @samp{'} is specified on
 458 QUALIFIER, then without ESCAPE @code{'a''b'} specifies a pair of
 459 fields that contain @samp{a} and @samp{b}, but with ESCAPE it
 460 specifies a single field that contains @samp{a'b}.  ESCAPE is a PSPP
 461 extension.
 462
 463 The DELCASE subcommand controls how data may be broken across lines in
 464 the data file.  With LINE, the default setting, each line must contain
 465 all the data for exactly one case.  For additional flexibility, to
 466 allow a single case to be split among lines or multiple cases to be
 467 contained on a single line, specify VARIABLES @i{n_variables}, where
 468 @i{n_variables} is the number of variables per case.
 469
 470 The VARIABLES subcommand is required and must be the last subcommand.
 471 Specify the name of each variable and its input format (@pxref{Input
 472 and Output Formats}) in the order they should be read from the input
 473 file.
 474
 475 @subsubheading Examples
 476
 477 @noindent
 478 On a Unix-like system, the @samp{/etc/passwd} file has a format
 479 similar to this:
 480
 481 @example
 482 root:$1$nyeSP5gD$pDq/:0:0:,,,:/root:/bin/bash
 483 blp:$1$BrP/pFg4$g7OG:1000:1000:Ben Pfaff,,,:/home/blp:/bin/bash
 484 john:$1$JBuq/Fioq$g4A:1001:1001:John Darrington,,,:/home/john:/bin/bash
 485 jhs:$1$D3li4hPL$88X1:1002:1002:Jason Stover,,,:/home/jhs:/bin/csh
 486 @end example
 487
 488 @noindent
 489 The following syntax reads a file in the format used by
 490 @samp{/etc/passwd}:
 491
 492 @c If you change this example, change the regression test in
 493 @c tests/language/data-io/get-data.at to match.
 494 @example
 495 GET DATA /TYPE=TXT /FILE='/etc/passwd' /DELIMITERS=':'
 496         /VARIABLES=username A20
 497                    password A40
 498                    uid F10
 499                    gid F10
 500                    gecos A40
 501                    home A40
 502                    shell A40.
 503 @end example
 504
 505 @noindent
 506 Consider the following data on used cars:
 507
 508 @example
 509 model   year    mileage price   type    age
 510 Civic   2002    29883   15900   Si      2
 511 Civic   2003    13415   15900   EX      1
 512 Civic   1992    107000  3800    n/a     12
 513 Accord  2002    26613   17900   EX      1
 514 @end example
 515
 516 @noindent
 517 The following syntax can be used to read the used car data:
 518
 519 @c If you change this example, change the regression test in
 520 @c tests/language/data-io/get-data.at to match.
 521 @example
 522 GET DATA /TYPE=TXT /FILE='cars.data' /DELIMITERS=' ' /FIRSTCASE=2
 523         /VARIABLES=model A8
 524                    year F4
 525                    mileage F6
 526                    price F5
 527                    type A4
 528                    age F2.
 529 @end example
 530
 531 @noindent
 532 Consider the following information on animals in a pet store:
 533
 534 @example
 535 'Pet''s Name', "Age", "Color", "Date Received", "Price", "Height", "Type"
 536 , (Years), , , (Dollars), ,
 537 "Rover", 4.5, Brown, "12 Feb 2004", 80, '1''4"', "Dog"
 538 "Charlie", , Gold, "5 Apr 2007", 12.3, "3""", "Fish"
 539 "Molly", 2, Black, "12 Dec 2006", 25, '5"', "Cat"
 540 "Gilly", , White, "10 Apr 2007", 10, "3""", "Guinea Pig"
 541 @end example
 542
 543 @noindent
 544 The following syntax can be used to read the pet store data:
 545
 546 @c If you change this example, change the regression test in
 547 @c tests/language/data-io/get-data.at to match.
 548 @example
 549 GET DATA /TYPE=TXT /FILE='pets.data' /DELIMITERS=', ' /QUALIFIER='''"' /ESCAPE
 550         /FIRSTCASE=3
 551         /VARIABLES=name A10
 552                    age F3.1
 553                    color A5
 554                    received EDATE10
 555                    price F5.2
 556                    height a5
 557                    type a10.
 558 @end example
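
@noindent
The examples above all use the default DELCASE=LINE.  As a sketch of
the alternative, the following syntax reads a tab- or space-delimited
file in which a case may be split across lines or several cases may
share a line, so that every three fields make up one case.  The file
name and variable names are hypothetical:

@example
GET DATA /TYPE=TXT /FILE='readings.txt' /DELIMITERS='\t '
        /DELCASE=VARIABLES 3
        /VARIABLES=station A8
                   temp F5.1
                   humidity F3.
@end example

@noindent
Note that @samp{\t} appears at the beginning of the DELIMITERS string,
as required for a tab delimiter.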
 559
 560 @node GET DATA /TYPE=TXT /ARRANGEMENT=FIXED
 561 @subsubsection Reading Fixed Columnar Data
 562
 563 @display
 564 GET DATA /TYPE=TXT
 565         /FILE=@{'file-name',file_handle@}
 566         [/ARRANGEMENT=@{DELIMITED,FIXED@}]
 567         [/FIRSTCASE=@{first_case@}]
 568         [/IMPORTCASE=@{ALL,FIRST max_cases,PERCENT percent@}]
 569
 570         [/FIXCASE=n]
 571         /VARIABLES fixed_var [fixed_var]@dots{}
 572             [/rec# fixed_var [fixed_var]@dots{}]@dots{}
 573 where each fixed_var takes the form:
 574         variable start-end format
 575 @end display
 576
 577 The GET DATA command with TYPE=TXT and ARRANGEMENT=FIXED reads input
 578 data from text files in fixed format, where each field is located in
 579 particular fixed column positions within records of a case.  Its
 580 capabilities are similar to those of DATA LIST FIXED (@pxref{DATA LIST
 581 FIXED}), with a few enhancements.
 582
 583 The required FILE subcommand and optional FIRSTCASE and IMPORTCASE
 584 subcommands are described above (@pxref{GET DATA /TYPE=TXT}).
 585
 586 The optional FIXCASE subcommand may be used to specify the positive
 587 integer number of input lines that make up each case.  The default
 588 value is 1.
 589
 590 The VARIABLES subcommand, which is required, specifies the positions
 591 at which each variable can be found.  For each variable, specify its
 592 name, followed by its start and end column separated by @samp{-}
 593 (e.g.@: @samp{0-9}), followed by an input format type (e.g.@:
 594 @samp{F}) or a full format specification (e.g.@: @samp{DOLLAR12.2}).
 595 For this command, columns are numbered starting from 0 at
 596 the left column.  Introduce the variables in the second and later
 597 lines of a case by a slash followed by the number of the line within
 598 the case, e.g.@: @samp{/2} for the second line.
 599
 600 @subsubheading Examples
 601
 602 @noindent
 603 Consider the following data on used cars:
 604
 605 @example
 606 model   year    mileage price   type    age
 607 Civic   2002    29883   15900   Si      2
 608 Civic   2003    13415   15900   EX      1
 609 Civic   1992    107000  3800    n/a     12
 610 Accord  2002    26613   17900   EX      1
 611 @end example
 612
 613 @noindent
 614 The following syntax can be used to read the used car data:
 615
 616 @c If you change this example, change the regression test in
 617 @c tests/language/data-io/get-data.at to match.
 618 @example
 619 GET DATA /TYPE=TXT /FILE='cars.data' /ARRANGEMENT=FIXED /FIRSTCASE=2
 620         /VARIABLES=model 0-7 A
 621                    year 8-15 F
 622                    mileage 16-23 F
 623                    price 24-31 F
  624                    type 32-39 A
 625                    age 40-47 F.
 626 @end example
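
@noindent
As a sketch of a multi-line case, suppose each case occupies two input
lines: the first holds an identifier and a name, and the second holds
two measurements.  The file name, variable names, and column positions
below are hypothetical:

@example
GET DATA /TYPE=TXT /FILE='patients.txt' /ARRANGEMENT=FIXED /FIXCASE=2
        /VARIABLES=id 0-3 F
                   name 4-23 A
                   /2 weight 0-4 F
                      height 5-9 F.
@end example

@noindent
Variables on the first line follow VARIABLES directly.  Those on the
second line are introduced by @samp{/2}.  Columns are numbered from 0.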
 627
 628 @node IMPORT
 629 @section IMPORT
 630 @vindex IMPORT
 631
 632 @display
 633 IMPORT
 634         /FILE='file-name'
 635         /TYPE=@{COMM,TAPE@}
 636         /DROP=var_list
 637         /KEEP=var_list
 638         /RENAME=(src_names=target_names)@dots{}
 639 @end display
 640
 641 The @cmd{IMPORT} transformation clears the active dataset dictionary and
 642 data and
 643 replaces them with a dictionary and data from a system file or
 644 portable file.
 645
 646 The FILE subcommand, which is the only required subcommand, specifies
 647 the portable file to be read as a file name string or a file handle
 648 (@pxref{File Handles}).
 649
 650 The TYPE subcommand is currently not used.
 651
 652 DROP, KEEP, and RENAME follow the syntax used by @cmd{GET} (@pxref{GET}).
 653
 654 @cmd{IMPORT} does not cause the data to be read, only the dictionary.  The
 655 data is read later, when a procedure is executed.
 656
 657 Use of @cmd{IMPORT} to read a system file is a PSPP extension.
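
As a sketch, the following syntax reads a portable file while leaving
out two scratch variables.  The file name and variable names are
hypothetical:

@example
IMPORT /FILE='survey.por'
        /DROP=scratch1 scratch2.
@end example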
 658
 659 @node SAVE
 660 @section SAVE
 661 @vindex SAVE
 662
 663 @display
 664 SAVE
 665         /OUTFILE=@{'file-name',file_handle@}
 666         /UNSELECTED=@{RETAIN,DELETE@}
 667         /@{COMPRESSED,UNCOMPRESSED@}
 668         /PERMISSIONS=@{WRITEABLE,READONLY@}
 669         /DROP=var_list
 670         /KEEP=var_list
 671         /VERSION=version
 672         /RENAME=(src_names=target_names)@dots{}
 673         /NAMES
 674         /MAP
 675 @end display
 676
 677 The @cmd{SAVE} procedure causes the dictionary and data in the active
 678 dataset to
 679 be written to a system file.
 680
 681 OUTFILE is the only required subcommand.  Specify the system file
 682 to be written as a string file name or a file handle
 683 (@pxref{File Handles}).
 684
 685 By default, cases excluded with FILTER are written to the system file.
 686 These can be excluded by specifying DELETE on the UNSELECTED
 687 subcommand.  Specifying RETAIN makes the default explicit.
 688
  689 The COMPRESSED and UNCOMPRESSED subcommands determine whether the saved
 690 system file is compressed.  By default, system files are compressed.
 691 This default can be changed with the SET command (@pxref{SET}).
 692
 693 The PERMISSIONS subcommand specifies permissions for the new system
 694 file.  WRITEABLE, the default, creates the file with read and write
 695 permission.  READONLY creates the file for read-only access.
 696
 697 By default, all the variables in the active dataset dictionary are written
 698 to the system file.  The DROP subcommand can be used to specify a list
 699 of variables not to be written.  In contrast, KEEP specifies variables
  700 to be written, with all other variables omitted.
 701
 702 Normally variables are saved to a system file under the same names they
 703 have in the active dataset.  Use the RENAME subcommand to change these names.
 704 Specify, within parentheses, a list of variable names followed by an
 705 equals sign (@samp{=}) and the names that they should be renamed to.
 706 Multiple parenthesized groups of variable names can be included on a
 707 single RENAME subcommand.  Variables' names may be swapped using a
 708 RENAME subcommand of the form @samp{/RENAME=(A B=B A)}.
 709
 710 Alternate syntax for the RENAME subcommand allows the parentheses to be
 711 eliminated.  When this is done, only a single variable may be renamed at
 712 once.  For instance, @samp{/RENAME=A=B}.  This alternate syntax is
 713 deprecated.
 714
 715 DROP, KEEP, and RENAME are performed in left-to-right order.  They
 716 each may be present any number of times.  @cmd{SAVE} never modifies
 717 the active dataset.  DROP, KEEP, and RENAME only affect the system file
 718 written to disk.
 719
 720 The VERSION subcommand specifies the version of the file format. Valid
 721 versions are 2 and 3.  The default version is 3.  In version 2 system
 722 files, variable names longer than 8 bytes will be truncated.  The two
 723 versions are otherwise identical.
 724
 725 The NAMES and MAP subcommands are currently ignored.
 726
 727 @cmd{SAVE} causes the data to be read.  It is a procedure.
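
As a sketch of typical usage, the following syntax writes a read-only
system file that omits filtered-out cases, drops two scratch
variables, and renames two others.  The file name and variable names
are hypothetical:

@example
SAVE /OUTFILE='survey.sav'
     /UNSELECTED=DELETE
     /PERMISSIONS=READONLY
     /DROP=scratch1 scratch2
     /RENAME=(q1 q2=satisfaction loyalty).
@end example

The active dataset itself keeps all of its variables under their
original names.  Only the file written to disk is affected.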
 728
 729 @node SAVE TRANSLATE
 730 @section SAVE TRANSLATE
 731 @vindex SAVE TRANSLATE
 732
 733 @display
 734 SAVE TRANSLATE
 735         /OUTFILE=@{'file-name',file_handle@}
 736         /TYPE=@{CSV,TAB@}
 737         [/REPLACE]
 738         [/MISSING=@{IGNORE,RECODE@}]
 739
 740         [/DROP=var_list]
 741         [/KEEP=var_list]
 742         [/RENAME=(src_names=target_names)@dots{}]
 743         [/UNSELECTED=@{RETAIN,DELETE@}]
 744         [/MAP]
 745
 746         @dots{}additional subcommands depending on TYPE@dots{}
 747 @end display
 748
 749 The @cmd{SAVE TRANSLATE} command is used to save data into various
 750 formats understood by other applications.
 751
 752 The OUTFILE and TYPE subcommands are mandatory.  OUTFILE specifies the
 753 file to be written, as a string file name or a file handle
  754 (@pxref{File Handles}).  TYPE determines the type of the file to
  755 write.  It must be one of the following:
 756
 757 @table @asis
 758 @item CSV
  759 Comma-separated value format.
 760
 761 @item TAB
 762 Tab-delimited format.
 763 @end table
 764
 765 By default, SAVE TRANSLATE will not overwrite an existing file.  Use
 766 REPLACE to force an existing file to be overwritten.
 767
 768 With MISSING=IGNORE, the default, SAVE TRANSLATE treats user-missing
 769 values as if they were not missing.  Specify MISSING=RECODE to output
 770 numeric user-missing values like system-missing values and string
 771 user-missing values as all spaces.
 772
 773 By default, all the variables in the active dataset dictionary are saved
 774 to the output file, but DROP or KEEP can select a subset of variables
 775 to save.  The RENAME subcommand can also be used to change the names
 776 under which variables are saved.  UNSELECTED determines whether cases
 777 filtered out by the FILTER command are written to the output file.
 778 These subcommands have the same syntax and meaning as on the
 779 @cmd{SAVE} command (@pxref{SAVE}).
 780
 781 Each supported file type has additional subcommands, explained in
 782 separate sections below.
 783
 784 @cmd{SAVE TRANSLATE} causes the data to be read.  It is a procedure.
 785
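The common subcommands described above can be combined as in the
following sketch.  The file name @file{survey.tsv} and the variable
@code{notes} are hypothetical and stand in for your own names:

@example
SAVE TRANSLATE
        /OUTFILE='survey.tsv'
        /TYPE=TAB
        /REPLACE
        /MISSING=RECODE
        /DROP=notes
        /UNSELECTED=DELETE.
@end example

This would overwrite any existing @file{survey.tsv}, write
user-missing values as if they were system-missing (or as spaces, for
strings), omit @code{notes}, and leave out cases excluded by FILTER.
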
 786 @menu
 787 * SAVE TRANSLATE /TYPE=CSV and TYPE=TAB::
 788 @end menu
 789
 790 @node SAVE TRANSLATE /TYPE=CSV and TYPE=TAB
 791 @subsection Writing Comma- and Tab-Separated Data Files
 792
 793 @display
 794 SAVE TRANSLATE
 795         /OUTFILE=@{'file-name',file_handle@}
 796         /TYPE=@{CSV,TAB@}
 797         [/REPLACE]
 798         [/MISSING=@{IGNORE,RECODE@}]
 799
 800         [/DROP=var_list]
 801         [/KEEP=var_list]
 802         [/RENAME=(src_names=target_names)@dots{}]
 803         [/UNSELECTED=@{RETAIN,DELETE@}]
 804
 805         [/FIELDNAMES]
 806         [/CELLS=@{VALUES,LABELS@}]
 807         [/TEXTOPTIONS DELIMITER='delimiter']
 808         [/TEXTOPTIONS QUALIFIER='qualifier']
 809         [/TEXTOPTIONS DECIMAL=@{DOT,COMMA@}]
 810         [/TEXTOPTIONS FORMAT=@{PLAIN,VARIABLE@}]
 811 @end display
 812
 813 The SAVE TRANSLATE command with TYPE=CSV or TYPE=TAB writes data in a
 814 comma- or tab-separated value format similar to that described by
 815 RFC@tie{}4180.  Each variable becomes one output column, and each case
 816 becomes one line of output.  If FIELDNAMES is specified, an additional
 817 line at the top of the output file lists variable names.
 818
 819 The CELLS and TEXTOPTIONS FORMAT settings determine how values are
 820 written to the output file:
 821
 822 @table @asis
 823 @item CELLS=VALUES FORMAT=PLAIN (the default settings)
 824 Writes variables to the output in ``plain'' formats that ignore the
 825 details of variable formats.  Numeric values are written as plain
 826 decimal numbers with enough digits to indicate their exact values in
 827 machine representation.  Numeric values include @samp{e} followed by
 828 an exponent if the exponent value would be less than -4 or greater
 829 than 16.  Dates are written in MM/DD/YYYY format and times in HH:MM:SS
 830 format.  WKDAY and MONTH values are written as decimal numbers.
 831
 832 Numeric values use, by default, the decimal point character set with
 833 SET DECIMAL (@pxref{SET DECIMAL}).  Use DECIMAL=DOT or DECIMAL=COMMA
 834 to force a particular decimal point character.
 835
 836 @item CELLS=VALUES FORMAT=VARIABLE
 837 Writes variables using their print formats.  Leading and trailing
 838 spaces are removed from numeric values, and trailing spaces are
 839 removed from string values.
 840
 841 @item CELLS=LABELS FORMAT=PLAIN
 842 @itemx CELLS=LABELS FORMAT=VARIABLE
 843 Writes value labels where they exist, and otherwise writes the values
 844 themselves as described above.
 845 @end table
 846
 847 Regardless of CELLS and TEXTOPTIONS FORMAT, numeric system-missing
 848 values are output as a single space.
 849
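As an illustration of how these settings combine, the following sketch
writes a header line of variable names, then writes value labels where
they exist and otherwise formats values with their print formats.  The
file name @file{survey.csv} is a hypothetical placeholder:

@example
SAVE TRANSLATE
        /OUTFILE='survey.csv'
        /TYPE=CSV
        /REPLACE
        /FIELDNAMES
        /CELLS=LABELS
        /TEXTOPTIONS FORMAT=VARIABLE.
@end example
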
 850 For TYPE=TAB, tab characters delimit values.  For TYPE=CSV, the
 851 TEXTOPTIONS DELIMITER and DECIMAL settings determine the character
 852 that separates values within a line.  If DELIMITER is specified, then
 853 the specified string separates values.  If DELIMITER is not specified,
 854 then the default is a comma with DECIMAL=DOT or a semicolon with
 855 DECIMAL=COMMA.  If DECIMAL is not given either, it is implied by the
 856 decimal point character set with SET DECIMAL (@pxref{SET DECIMAL}).
 857
 858 The TEXTOPTIONS QUALIFIER setting specifies a character that is output
 859 before and after a value that contains the delimiter character or the
 860 qualifier character.  The default is a double quote (@samp{"}).  A
 861 qualifier character that appears within a value is doubled.
 862
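The delimiter and qualifier rules above can be sketched with a
command that requests a decimal comma and leaves DELIMITER unspecified.
The file name and the variables @code{name} and @code{height} are
assumptions made for this example:

@example
SAVE TRANSLATE
        /OUTFILE='survey-eu.csv'
        /TYPE=CSV
        /REPLACE
        /FIELDNAMES
        /KEEP=name height
        /TEXTOPTIONS DECIMAL=COMMA.
@end example

Under those rules the delimiter would default to a semicolon, so a
string value that contains a semicolon would be enclosed in double
quotes.  The output would be expected to look roughly like the
following, which is an illustration rather than captured program
output:

@example
name;height
"Smith; John";1,75
@end example
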
 863 @node SYSFILE INFO
 864 @section SYSFILE INFO
 865 @vindex SYSFILE INFO
 866
 867 @display
 868 SYSFILE INFO FILE='file-name'.
 869 @end display
 870
 871 @cmd{SYSFILE INFO} reads the dictionary in a system file and
 872 displays the information that it contains.
 873
 874 Specify the system file to examine as a file name or a file handle
 875 (@pxref{File Handles}).
 876
 877 @cmd{SYSFILE INFO} does not affect the current active dataset.
 878
 879 @node XEXPORT
 880 @section XEXPORT
 881 @vindex XEXPORT
 882
 883 @display
 884 XEXPORT
 885         /OUTFILE='file-name'
 886         /DIGITS=n
 887         /DROP=var_list
 888         /KEEP=var_list
 889         /RENAME=(src_names=target_names)@dots{}
 890         /TYPE=@{COMM,TAPE@}
 891         /MAP
 892 @end display
 893
 894 The @cmd{XEXPORT} transformation writes the active dataset dictionary and
 895 data to a specified portable file.
 896
 897 This transformation is a PSPP extension.
 898
 899 It is similar to the @cmd{EXPORT} procedure, with two differences:
 900
 901 @itemize
 902 @item
 903 @cmd{XEXPORT} is a transformation, not a procedure.  It is executed when
 904 the data is read by a procedure or procedure-like command.
 905
 906 @item
 907 @cmd{XEXPORT} does not support the UNSELECTED subcommand.
 908 @end itemize
 909
 910 @xref{EXPORT}, for more information.
 911
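Because @cmd{XEXPORT} is a transformation, nothing is written until a
later command reads the data.  In the following sketch, which uses a
hypothetical file name and variable, the portable file would be
written while @cmd{DESCRIPTIVES} makes its pass over the data:

@example
XEXPORT /OUTFILE='survey.por'.
DESCRIPTIVES /VARIABLES=income.
@end example
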
 912 @node XSAVE
 913 @section XSAVE
 914 @vindex XSAVE
 915
 916 @display
 917 XSAVE
 918         /OUTFILE='file-name'
 919         /@{COMPRESSED,UNCOMPRESSED@}
 920         /PERMISSIONS=@{WRITEABLE,READONLY@}
 921         /DROP=var_list
 922         /KEEP=var_list
 923         /VERSION=version
 924         /RENAME=(src_names=target_names)@dots{}
 925         /NAMES
 926         /MAP
 927 @end display
 928
 929 The @cmd{XSAVE} transformation writes the active dataset's dictionary and
 930 data to a system file.  It is similar to the @cmd{SAVE}
 931 procedure, with two differences:
 932
 933 @itemize
 934 @item
 935 @cmd{XSAVE} is a transformation, not a procedure.  It is executed when
 936 the data is read by a procedure or procedure-like command.
 937
 938 @item
 939 @cmd{XSAVE} does not support the UNSELECTED subcommand.
 940 @end itemize
 941
 942 @xref{SAVE}, for more information.
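
As a sketch of why a transformation can be useful here, the following
commands use @cmd{XSAVE} twice so that a single pass over the data
writes two system files.  The file names and the variables
@code{income} and @code{income_k} are hypothetical:

@example
GET FILE='survey.sav'.
XSAVE /OUTFILE='before.sav'.
COMPUTE income_k = income / 1000.
XSAVE /OUTFILE='after.sav'.
EXECUTE.
@end example

Both files would be written when @cmd{EXECUTE} reads the data.  Since
transformations run in the order given, @file{before.sav} should
reflect the data as it stood before the @cmd{COMPUTE}, and
@file{after.sav} the data after it.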