@node System and Portable File IO
@chapter System and Portable File I/O

The commands in this chapter read, write, and examine system files and
portable files.

@menu
* APPLY DICTIONARY::            Apply system file dictionary to active dataset.
* EXPORT::                      Write to a portable file.
* GET::                         Read from a system file.
* GET DATA::                    Read from foreign files.
* IMPORT::                      Read from a portable file.
* SAVE::                        Write to a system file.
* SAVE TRANSLATE::              Write data in foreign file formats.
* SYSFILE INFO::                Display system file dictionary.
* XEXPORT::                     Write to a portable file, as a transformation.
* XSAVE::                       Write to a system file, as a transformation.
@end menu

@node APPLY DICTIONARY
@section APPLY DICTIONARY
@vindex APPLY DICTIONARY

@display
APPLY DICTIONARY FROM=@{'@var{file_name}',@var{file_handle}@}.
@end display

@cmd{APPLY DICTIONARY} applies the variable labels, value labels,
and missing values taken from a file to corresponding
variables in the active dataset.  In some cases it also updates the
weighting variable.

Specify a system file or portable file's name, a data set name
(@pxref{Datasets}), or a file handle name (@pxref{File Handles}).  The
dictionary in the file will be read, but it will not replace the
active dataset's dictionary.  The file's data will not be read.

Only variables with names that exist in both the active dataset and the
system file are considered.  Variables with the same name but different
types (numeric, string) will cause an error message.  Otherwise, the
system file variables' attributes will replace those in their matching
active dataset variables:

@itemize @bullet
@item
If a system file variable has a variable label, then it will replace
the variable label of the active dataset variable.  If the system
file variable does not have a variable label, then the active dataset
variable's variable label, if any, will be retained.

@item
If the system file variable has custom attributes (@pxref{VARIABLE
ATTRIBUTE}), then those attributes replace the active dataset variable's
custom attributes.  If the system file variable does not have custom
attributes, then the active dataset variable's custom attributes, if any,
will be retained.

@item
If the active dataset variable is numeric or short string, then value
labels and missing values, if any, will be copied to the active dataset
variable.  If the system file variable does not have value labels or
missing values, then those in the active dataset variable, if any, will not
be disturbed.
@end itemize

In addition to properties of variables, some properties of the active
dataset's dictionary as a whole are updated:

@itemize @bullet
@item
If the system file has custom attributes (@pxref{DATAFILE ATTRIBUTE}),
then those attributes replace the active dataset's custom attributes.

@item
If the active dataset has a weighting variable (@pxref{WEIGHT}), and the
system file does not, or if the weighting variable in the system file
does not exist in the active dataset, then the active dataset weighting
variable, if any, is retained.  Otherwise, the weighting variable in
the system file becomes the active dataset weighting variable.
@end itemize

@cmd{APPLY DICTIONARY} takes effect immediately.  It does not read the
active dataset.  The system file is not modified.
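
The following sketch shows a typical use, assuming two hypothetical
system files: @samp{raw.sav}, whose variables lack labels, and
@samp{labels.sav}, which holds a labeled dictionary for some of the
same variables.  Only variables whose names appear in both files are
affected, and @samp{labels.sav} is left unchanged:

@example
* Hypothetical file names.
GET /FILE='raw.sav'.
* Copy labels, missing values, and attributes for matching variables.
APPLY DICTIONARY FROM='labels.sav'.
@end example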

@node EXPORT
@section EXPORT
@vindex EXPORT

@display
EXPORT
        /OUTFILE='@var{file_name}'
        /UNSELECTED=@{RETAIN,DELETE@}
        /DIGITS=@var{n}
        /DROP=@var{var_list}
        /KEEP=@var{var_list}
        /RENAME=(@var{src_names}=@var{target_names})@dots{}
        /TYPE=@{COMM,TAPE@}
        /MAP
@end display

The @cmd{EXPORT} procedure writes the active dataset's dictionary and
data to a specified portable file.

By default, cases excluded with @cmd{FILTER} are written to the
file.  These can be excluded by specifying @subcmd{DELETE} on the @subcmd{UNSELECTED}
subcommand.  Specifying @subcmd{RETAIN} makes the default explicit.

Portable files express real numbers in base 30.  Integers are always
expressed to the maximum precision needed to make them exact.
Non-integers are, by default, expressed to the machine's maximum
natural precision (approximately 15 decimal digits on many machines).
If many numbers require this many digits, the portable file can become
significantly larger.  As an alternative, the @subcmd{DIGITS}
subcommand may be used to specify the number of decimal digits of
precision to write.  @subcmd{DIGITS} applies only to non-integers.

The @subcmd{OUTFILE} subcommand, which is the only required subcommand, specifies
the portable file to be written as a file name string or
a file handle (@pxref{File Handles}).

@subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} follow the same format as in the
@cmd{SAVE} procedure (@pxref{SAVE}).

The @subcmd{TYPE} subcommand specifies the character set for use in the
portable file.  Its value is currently not used.

The @subcmd{MAP} subcommand is currently ignored.

@cmd{EXPORT} is a procedure.  It causes the active dataset to be read.
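
As an illustration, the following sketch writes the active dataset to a
hypothetical portable file @samp{survey.por}.  It leaves out cases
excluded by @cmd{FILTER}, omits a hypothetical variable @code{comments},
and limits non-integers to 8 decimal digits of precision to keep the
file smaller:

@example
* File and variable names are hypothetical.
EXPORT /OUTFILE='survey.por'
        /UNSELECTED=DELETE
        /DIGITS=8
        /DROP=comments.
@end example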

@node GET
@section GET
@vindex GET

@display
GET
        /FILE=@{'@var{file_name}',@var{file_handle}@}
        /DROP=@var{var_list}
        /KEEP=@var{var_list}
        /RENAME=(@var{src_names}=@var{target_names})@dots{}
        /ENCODING='@var{encoding}'
@end display

@cmd{GET} clears the current dictionary and active dataset and
replaces them with the dictionary and data from a specified file.

The @subcmd{FILE} subcommand is the only required subcommand.  Specify
the system file or portable file to be read as a string file name or
a file handle (@pxref{File Handles}).

By default, all the variables in a file are read.  The @subcmd{DROP}
subcommand can be used to specify a list of variables that are not to be
read.  By contrast, the @subcmd{KEEP} subcommand can be used to specify
variables that are to be read; all other variables are not read.

Normally variables in a file retain the names that they were
saved under.  Use the @subcmd{RENAME} subcommand to change these names.
Specify, within parentheses, a list of variable names followed by an
equals sign (@samp{=}) and the names that they should be renamed to.
Multiple parenthesized groups of variable names can be included on a
single @subcmd{RENAME} subcommand.  Variables' names may be swapped
using a @subcmd{RENAME} subcommand of the form
@subcmd{/RENAME=(@var{A} @var{B}=@var{B} @var{A})}.

Alternate syntax for the @subcmd{RENAME} subcommand allows the parentheses to be
eliminated.  When this is done, only a single variable may be renamed at
once.  For instance, @subcmd{/RENAME=@var{A}=@var{B}}.  This alternate syntax is
deprecated.

@subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} are executed in left-to-right order.
Each may be present any number of times.  @cmd{GET} never modifies a
file on disk.  Only the active dataset read from the file
is affected by these subcommands.
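
For example, assuming a hypothetical system file @samp{survey.sav} that
contains variables @code{id}, @code{a}, @code{b}, and @code{notes}, the
following sketch reads only @code{id}, @code{a}, and @code{b}, and
swaps the names of @code{a} and @code{b}.  Because the subcommands are
applied left to right, @subcmd{RENAME} sees only the variables kept by
@subcmd{KEEP}:

@example
* Hypothetical file and variable names.
GET /FILE='survey.sav'
        /KEEP=id a b
        /RENAME=(a b=b a).
@end example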

@pspp{} tries to automatically detect the encoding of string data in the
file.  Sometimes, however, this detection does not work well,
especially for files written by old versions of SPSS or @pspp{}.  Specify
the @subcmd{ENCODING} subcommand with an @acronym{IANA} character set name as its string
argument to override the default.  The @subcmd{ENCODING} subcommand is a @pspp{}
extension.
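
For instance, if the strings in a hypothetical file @samp{legacy.sav}
are known to be in the Windows Latin-1 code page, a sketch like the
following names that encoding explicitly instead of relying on
detection:

@example
* Hypothetical file name; windows-1252 is an IANA character set name.
GET /FILE='legacy.sav' /ENCODING='windows-1252'.
@end example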

@cmd{GET} does not cause the data to be read, only the dictionary.  The data
is read later, when a procedure is executed.

Use of @cmd{GET} to read a portable file is a @pspp{} extension.

@node GET DATA
@section GET DATA
@vindex GET DATA

@display
GET DATA
        /TYPE=@{GNM,ODS,PSQL,TXT@}
        @dots{}additional subcommands depending on TYPE@dots{}
@end display

The @cmd{GET DATA} command is used to read files and other data
sources created by other applications.  When this command is executed,
the current dictionary and active dataset are replaced with variables
and data read from the specified source.

The @subcmd{TYPE} subcommand is mandatory and must be the first subcommand
specified.  It determines the type of the file or source to read.
@pspp{} currently supports the following file types:

@table @asis
@item GNM
Spreadsheet files created by Gnumeric (@url{http://gnumeric.org}).

@item ODS
Spreadsheet files in OpenDocument format.

@item PSQL
Relations from PostgreSQL databases (@url{http://postgresql.org}).

@item TXT
Textual data files in columnar and delimited formats.
@end table

Each supported file type has additional subcommands, explained in
separate sections below.

@menu
* GET DATA /TYPE=GNM/ODS::     Spreadsheets
* GET DATA /TYPE=PSQL::        Databases
* GET DATA /TYPE=TXT::         Delimited Text Files
@end menu

@node GET DATA /TYPE=GNM/ODS
@subsection Spreadsheet Files

@display
GET DATA /TYPE=@{GNM, ODS@}
        /FILE=@{'@var{file_name}'@}
        /SHEET=@{NAME '@var{sheet_name}', INDEX @var{n}@}
        /CELLRANGE=@{RANGE '@var{range}', FULL@}
        /READNAMES=@{ON, OFF@}
        /ASSUMEDVARWIDTH=@var{n}.
@end display

@cindex Gnumeric
@cindex OpenDocument
@cindex spreadsheet files

Gnumeric spreadsheets (@url{http://gnumeric.org}) and spreadsheets
in OpenDocument format
(@url{http://libreplanet.org/wiki/Group:OpenDocument/Software})
can be read using the @cmd{GET DATA} command.
Use the @subcmd{TYPE} subcommand to indicate the file's format:
@subcmd{/TYPE=GNM} indicates Gnumeric files, and
@subcmd{/TYPE=ODS} indicates OpenDocument.
The @subcmd{FILE} subcommand is mandatory.
Use it to specify the name of the file to be read.
All other subcommands are optional.

The format of each variable is determined by the format of the spreadsheet
cell containing the first datum for the variable.
If this cell is of string (text) format, then the width of the variable is
determined from the length of the string it contains, unless the
@subcmd{ASSUMEDVARWIDTH} subcommand is given.

The @subcmd{SHEET} subcommand specifies the sheet within the spreadsheet file to read.
There are two forms of the @subcmd{SHEET} subcommand.
In the first form,
@subcmd{/SHEET=name @var{sheet_name}}, the string @var{sheet_name} is the
name of the sheet to read.
In the second form, @subcmd{/SHEET=index @var{idx}}, @var{idx} is an
integer which is the index of the sheet to read.
The first sheet has the index 1.
If the @subcmd{SHEET} subcommand is omitted, then the command will read the
first sheet in the file.

The @subcmd{CELLRANGE} subcommand specifies the range of cells within the sheet to read.
If the subcommand is given as @subcmd{/CELLRANGE=FULL}, then the entire
sheet is read.
To read only part of a sheet, use the form
@subcmd{/CELLRANGE=range '@var{top_left_cell}:@var{bottom_right_cell}'}.
For example, the subcommand @subcmd{/CELLRANGE=range 'C3:P19'} reads
columns C--P and rows 3--19 inclusive.
If no @subcmd{CELLRANGE} subcommand is given, then the entire sheet is read.

If @subcmd{/READNAMES=ON} is specified, then the contents of cells of
the first row are used as the names of the variables in which to store
the data from subsequent rows.  This is the default.
If @subcmd{/READNAMES=OFF} is
used, then the variables receive automatically assigned names.

The @subcmd{ASSUMEDVARWIDTH} subcommand specifies the maximum width of string
variables read from the file.
If omitted, the default value is determined from the length of the
string in the first spreadsheet cell for each variable.
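
The following sketch, assuming a hypothetical OpenDocument file
@samp{inventory.ods}, reads cells A1 through D50 of its second sheet,
takes variable names from the first row, and caps string variables at
a width of 20:

@example
* Hypothetical file name and cell range.
GET DATA /TYPE=ODS
        /FILE='inventory.ods'
        /SHEET=INDEX 2
        /CELLRANGE=RANGE 'A1:D50'
        /READNAMES=ON
        /ASSUMEDVARWIDTH=20.
@end example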

@node GET DATA /TYPE=PSQL
@subsection Postgres Database Queries

@display
GET DATA /TYPE=PSQL
         /CONNECT=@{@var{connection info}@}
         /SQL=@{@var{query}@}
         [/ASSUMEDVARWIDTH=@var{w}]
         [/UNENCRYPTED]
         [/BSIZE=@var{n}].
@end display

@cindex postgres
@cindex databases

The PSQL type is used to import data from a PostgreSQL database server.
The server may be located locally or remotely.
Variables are automatically created based on the table column names
or the names specified in the SQL query.
PostgreSQL data types of high precision will lose precision when
imported into @pspp{}.
Not all PostgreSQL data types can be represented in @pspp{}.
If a datum cannot be represented, a warning is issued and that
datum is set to SYSMIS.

The @subcmd{CONNECT} subcommand is mandatory.
It is a string specifying the parameters of the database server from
which the data should be fetched.
The format of the string is given in the PostgreSQL manual
@url{http://www.postgresql.org/docs/8.0/static/libpq.html#LIBPQ-CONNECT}.

The @subcmd{SQL} subcommand is mandatory.
It must be a valid SQL string to retrieve data from the database.

The @subcmd{ASSUMEDVARWIDTH} subcommand specifies the maximum width of string
variables read from the database.
If omitted, the default value is determined from the length of the
string in the first value read for each variable.

The @subcmd{UNENCRYPTED} subcommand allows data to be retrieved over an insecure
connection.
If the connection is not encrypted, and the @subcmd{UNENCRYPTED} subcommand is
not given, then an error will occur.
Whether or not the connection is
encrypted depends upon the underlying PostgreSQL client library (libpq)
and the capabilities of the database server.

The @subcmd{BSIZE} subcommand serves only to optimise the speed of data transfer.
It specifies an upper limit on the
number of cases to fetch from the database at once.
The default value is 4096.
If your SQL statement fetches a large number of cases but only a small number of
variables, then the data transfer may be faster if you increase this value.
Conversely, if the number of variables is large, or if the machine on which
@pspp{} is running has only a
small amount of memory, then a smaller value will be better.

The following syntax is an example:
@example
GET DATA /TYPE=PSQL
     /CONNECT='host=example.com port=5432 dbname=product user=fred password=xxxx'
     /SQL='select * from manufacturer'.
@end example
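
A second sketch shows the optional subcommands together.  It assumes a
hypothetical server on the local machine that does not offer encrypted
connections, so @subcmd{UNENCRYPTED} is needed, and it lowers
@subcmd{BSIZE} to reduce memory use:

@example
* Hypothetical connection parameters and query.
GET DATA /TYPE=PSQL
     /CONNECT='host=localhost dbname=product user=fred'
     /SQL='select name, country from manufacturer'
     /ASSUMEDVARWIDTH=32
     /UNENCRYPTED
     /BSIZE=512.
@end example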

@node GET DATA /TYPE=TXT
@subsection Textual Data Files

@display
GET DATA /TYPE=TXT
        /FILE=@{'@var{file_name}',@var{file_handle}@}
        [/ENCODING='@var{encoding}']
        [/ARRANGEMENT=@{DELIMITED,FIXED@}]
        [/FIRSTCASE=@{@var{first_case}@}]
        [/IMPORTCASE=@{ALL,FIRST @var{max_cases},PERCENT @var{percent}@}]
        @dots{}additional subcommands depending on ARRANGEMENT@dots{}
@end display

@cindex text files
@cindex data files
When TYPE=TXT is specified, @cmd{GET DATA} reads data in a delimited or
fixed columnar format, much like @cmd{DATA LIST} (@pxref{DATA LIST}).

The @subcmd{FILE} subcommand is mandatory.  Specify the file to be read as
a string file name or (for textual data only) a
file handle (@pxref{File Handles}).

The @subcmd{ENCODING} subcommand specifies the character encoding of
the file to be read.  @xref{INSERT}, for information on supported
encodings.

The @subcmd{ARRANGEMENT} subcommand determines the file's basic format.
DELIMITED, the default setting, specifies that fields in the input
data are separated by spaces, tabs, or other user-specified
delimiters.  FIXED specifies that fields in the input data appear at
particular fixed column positions within records of a case.

By default, cases are read from the input file starting from the first
line.  To skip lines at the beginning of an input file, set @subcmd{FIRSTCASE}
to the number of the first line to read: 2 to skip the first line, 3
to skip the first two lines, and so on.

@subcmd{IMPORTCASE} can be used to limit the number of cases read from the
input file.  With the default setting, ALL, all cases in the file are
read.  Specify FIRST @var{max_cases} to read at most @var{max_cases} cases
from the file.  Use @subcmd{PERCENT @var{percent}} to read only @var{percent}
percent, approximately, of the cases contained in the file.  (The
percentage is approximate, because there is no way to accurately count
the number of cases in the file without reading the entire file.  The
number of cases in some kinds of unusual files cannot be estimated;
@pspp{} will read all cases in such files.)

@subcmd{FIRSTCASE} and @subcmd{IMPORTCASE} may be used with delimited and fixed-format
data.  The remaining subcommands, which apply only to one of the two file
arrangements, are described below.
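
For example, the following sketch reads at most the first 100 cases
from a hypothetical UTF-8 file @samp{survey.txt} of comma-separated
values, skipping a header line.  The @subcmd{DELIMITERS} and
@subcmd{VARIABLES} subcommands are specific to delimited data
(@pxref{GET DATA /TYPE=TXT /ARRANGEMENT=DELIMITED}):

@example
* Hypothetical file and variable names.
GET DATA /TYPE=TXT /FILE='survey.txt' /ENCODING='UTF-8'
        /ARRANGEMENT=DELIMITED
        /FIRSTCASE=2
        /IMPORTCASE=FIRST 100
        /DELIMITERS=','
        /VARIABLES=id F8
                   score F5.1.
@end example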

@menu
* GET DATA /TYPE=TXT /ARRANGEMENT=DELIMITED::
* GET DATA /TYPE=TXT /ARRANGEMENT=FIXED::
@end menu

@node GET DATA /TYPE=TXT /ARRANGEMENT=DELIMITED
@subsubsection Reading Delimited Data

@display
GET DATA /TYPE=TXT
        /FILE=@{'@var{file_name}',@var{file_handle}@}
        [/ARRANGEMENT=@{DELIMITED,FIXED@}]
        [/FIRSTCASE=@{@var{first_case}@}]
        [/IMPORTCASE=@{ALL,FIRST @var{max_cases},PERCENT @var{percent}@}]

        /DELIMITERS="@var{delimiters}"
        [/QUALIFIER="@var{quotes}" [/ESCAPE]]
        [/DELCASE=@{LINE,VARIABLES @var{n_variables}@}]
        /VARIABLES=@var{del_var1} [@var{del_var2}]@dots{}
where each @var{del_var} takes the form:
        @var{variable} @var{format}
@end display

The @cmd{GET DATA} command with TYPE=TXT and ARRANGEMENT=DELIMITED reads
input data from text files in delimited format, where fields are
separated by a set of user-specified delimiters.  Its capabilities are
similar to those of @cmd{DATA LIST FREE} (@pxref{DATA LIST FREE}), with a
few enhancements.

The required @subcmd{FILE} subcommand and optional @subcmd{FIRSTCASE} and @subcmd{IMPORTCASE}
subcommands are described above (@pxref{GET DATA /TYPE=TXT}).

@subcmd{DELIMITERS}, which is required, specifies the set of characters that
may separate fields.  Each character in the string specified on
@subcmd{DELIMITERS} separates one field from the next.  The end of a line also
separates fields, regardless of @subcmd{DELIMITERS}.  Two consecutive
delimiters in the input yield an empty field, as does a delimiter at
the end of a line.  A space character as a delimiter is an exception:
consecutive spaces do not yield an empty field and neither does any
number of spaces at the end of a line.

To use a tab as a delimiter, specify @samp{\t} at the beginning of the
@subcmd{DELIMITERS} string.  To use a backslash as a delimiter, specify
@samp{\\} as the first delimiter or, if a tab should also be a
delimiter, immediately following @samp{\t}.  To read a data file in
which each field appears on a separate line, specify the empty string
for @subcmd{DELIMITERS}.
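
A minimal sketch of reading tab-delimited data, assuming a hypothetical
file @samp{scores.tsv} in which each line holds a name and a score
separated by a tab:

@example
* Hypothetical file and variable names.
GET DATA /TYPE=TXT /FILE='scores.tsv' /DELIMITERS="\t"
        /VARIABLES=name A20
                   score F5.1.
@end example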

The optional @subcmd{QUALIFIER} subcommand names one or more characters that
can be used to quote values within fields in the input.  A field that
begins with one of the specified quote characters ends at the next
matching quote.  Intervening delimiters become part of the field,
instead of terminating it.  The ability to specify more than one quote
character is a @pspp{} extension.

By default, a character specified on @subcmd{QUALIFIER} cannot itself be
embedded within a field that it quotes, because the quote character
always terminates the quoted field.  With @subcmd{ESCAPE}, however, a doubled
quote character within a quoted field inserts a single instance of the
quote into the field.  For example, if @samp{'} is specified on
@subcmd{QUALIFIER}, then without @subcmd{ESCAPE} @code{'a''b'} specifies a pair of
fields that contain @samp{a} and @samp{b}, but with @subcmd{ESCAPE} it
specifies a single field that contains @samp{a'b}.  @subcmd{ESCAPE} is a @pspp{}
extension.

The @subcmd{DELCASE} subcommand controls how data may be broken across lines in
the data file.  With LINE, the default setting, each line must contain
all the data for exactly one case.  For additional flexibility, to
allow a single case to be split among lines or multiple cases to be
contained on a single line, specify VARIABLES @var{n_variables}, where
@var{n_variables} is the number of variables per case.
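
As a sketch of @subcmd{/DELCASE=VARIABLES}, suppose a hypothetical file
@samp{points.txt} contains the line @samp{1 2 3 4} followed by the line
@samp{5 6}.  Reading three variables per case should yield two cases,
(1, 2, 3) and (4, 5, 6), regardless of where the line breaks fall:

@example
* Hypothetical file and variable names.
GET DATA /TYPE=TXT /FILE='points.txt' /DELIMITERS=' '
        /DELCASE=VARIABLES 3
        /VARIABLES=x F8
                   y F8
                   z F8.
@end example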

The @subcmd{VARIABLES} subcommand is required and must be the last subcommand.
Specify the name of each variable and its input format (@pxref{Input
and Output Formats}) in the order they should be read from the input
file.

@subsubheading Examples

@noindent
On a Unix-like system, the @samp{/etc/passwd} file has a format
similar to this:

@example
root:$1$nyeSP5gD$pDq/:0:0:,,,:/root:/bin/bash
blp:$1$BrP/pFg4$g7OG:1000:1000:Ben Pfaff,,,:/home/blp:/bin/bash
john:$1$JBuq/Fioq$g4A:1001:1001:John Darrington,,,:/home/john:/bin/bash
jhs:$1$D3li4hPL$88X1:1002:1002:Jason Stover,,,:/home/jhs:/bin/csh
@end example

@noindent
The following syntax reads a file in the format used by
@samp{/etc/passwd}:

@c If you change this example, change the regression test in
@c tests/language/data-io/get-data.at to match.
@example
GET DATA /TYPE=TXT /FILE='/etc/passwd' /DELIMITERS=':'
        /VARIABLES=username A20
                   password A40
                   uid F10
                   gid F10
                   gecos A40
                   home A40
                   shell A40.
@end example

@noindent
Consider the following data on used cars:

@example
model   year    mileage price   type    age
Civic   2002    29883   15900   Si      2
Civic   2003    13415   15900   EX      1
Civic   1992    107000  3800    n/a     12
Accord  2002    26613   17900   EX      1
@end example

@noindent
The following syntax can be used to read the used car data:

@c If you change this example, change the regression test in
@c tests/language/data-io/get-data.at to match.
@example
GET DATA /TYPE=TXT /FILE='cars.data' /DELIMITERS=' ' /FIRSTCASE=2
        /VARIABLES=model A8
                   year F4
                   mileage F6
                   price F5
                   type A4
                   age F2.
@end example

@noindent
Consider the following information on animals in a pet store:

@example
'Pet''s Name', "Age", "Color", "Date Received", "Price", "Height", "Type"
, (Years), , , (Dollars), ,
"Rover", 4.5, Brown, "12 Feb 2004", 80, '1''4"', "Dog"
"Charlie", , Gold, "5 Apr 2007", 12.3, "3""", "Fish"
"Molly", 2, Black, "12 Dec 2006", 25, '5"', "Cat"
"Gilly", , White, "10 Apr 2007", 10, "3""", "Guinea Pig"
@end example

@noindent
The following syntax can be used to read the pet store data:

@c If you change this example, change the regression test in
@c tests/language/data-io/get-data.at to match.
@example
GET DATA /TYPE=TXT /FILE='pets.data' /DELIMITERS=', ' /QUALIFIER='''"' /ESCAPE
        /FIRSTCASE=3
        /VARIABLES=name A10
                   age F3.1
                   color A5
                   received EDATE10
                   price F5.2
                   height a5
                   type a10.
@end example

@node GET DATA /TYPE=TXT /ARRANGEMENT=FIXED
@subsubsection Reading Fixed Columnar Data

@c (modify-syntax-entry ?_ "w")
@c (modify-syntax-entry ?' "'")
@c (modify-syntax-entry ?@ "'")

@display
GET DATA /TYPE=TXT
        /FILE=@{'@var{file_name}',@var{file_handle}@}
        [/ARRANGEMENT=@{DELIMITED,FIXED@}]
        [/FIRSTCASE=@{@var{first_case}@}]
        [/IMPORTCASE=@{ALL,FIRST @var{max_cases},PERCENT @var{percent}@}]

        [/FIXCASE=@var{n}]
        /VARIABLES=@var{fixed_var} [@var{fixed_var}]@dots{}
            [/rec# @var{fixed_var} [@var{fixed_var}]@dots{}]@dots{}
where each @var{fixed_var} takes the form:
        @var{variable} @var{start}-@var{end} @var{format}
@end display

The @cmd{GET DATA} command with TYPE=TXT and ARRANGEMENT=FIXED reads input
data from text files in fixed format, where each field is located in
particular fixed column positions within records of a case.  Its
capabilities are similar to those of @cmd{DATA LIST FIXED} (@pxref{DATA LIST
FIXED}), with a few enhancements.

The required @subcmd{FILE} subcommand and optional @subcmd{FIRSTCASE} and @subcmd{IMPORTCASE}
subcommands are described above (@pxref{GET DATA /TYPE=TXT}).

The optional @subcmd{FIXCASE} subcommand may be used to specify the positive
integer number of input lines that make up each case.  The default
value is 1.

The @subcmd{VARIABLES} subcommand, which is required, specifies the positions
at which each variable can be found.  For each variable, specify its
name, followed by its start and end column separated by @samp{-}
(e.g.@: @samp{0-9}), followed by an input format type (e.g.@:
@samp{F}) or a full format specification (e.g.@: @samp{DOLLAR12.2}).
For this command, columns are numbered starting from 0 at
the left column.  Introduce the variables in the second and later
lines of a case by a slash followed by the number of the line within
the case, e.g.@: @samp{/2} for the second line.
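
The following sketch reads cases that span two lines each, assuming a
hypothetical file @samp{people.data} in which columns 0--19 of a case's
first line hold a name, and columns 0--2 and 3--22 of its second line
hold an age and a city:

@example
* Hypothetical file name and column layout.
GET DATA /TYPE=TXT /FILE='people.data' /ARRANGEMENT=FIXED
        /FIXCASE=2
        /VARIABLES=name 0-19 A
                   /2 age 0-2 F
                      city 3-22 A.
@end example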

@subsubheading Examples

@noindent
Consider the following data on used cars:

@example
model   year    mileage price   type    age
Civic   2002    29883   15900   Si      2
Civic   2003    13415   15900   EX      1
Civic   1992    107000  3800    n/a     12
Accord  2002    26613   17900   EX      1
@end example

@noindent
The following syntax can be used to read the used car data:

@c If you change this example, change the regression test in
@c tests/language/data-io/get-data.at to match.
@example
GET DATA /TYPE=TXT /FILE='cars.data' /ARRANGEMENT=FIXED /FIRSTCASE=2
        /VARIABLES=model 0-7 A
                   year 8-15 F
                   mileage 16-23 F
                   price 24-31 F
                   type 32-39 A
                   age 40-47 F.
@end example

@node IMPORT
@section IMPORT
@vindex IMPORT

@display
IMPORT
        /FILE='@var{file_name}'
        /TYPE=@{COMM,TAPE@}
        /DROP=@var{var_list}
        /KEEP=@var{var_list}
        /RENAME=(@var{src_names}=@var{target_names})@dots{}
@end display

The @cmd{IMPORT} transformation clears the active dataset dictionary and
data and replaces them with a dictionary and data from a system file or
portable file.

The @subcmd{FILE} subcommand, which is the only required subcommand, specifies
the portable file to be read as a file name string or a file handle
(@pxref{File Handles}).

The @subcmd{TYPE} subcommand is currently not used.

@subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} follow the syntax used by @cmd{GET} (@pxref{GET}).

@cmd{IMPORT} does not cause the data to be read, only the dictionary.  The
data is read later, when a procedure is executed.

Use of @cmd{IMPORT} to read a system file is a @pspp{} extension.
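
For example, assuming a hypothetical portable file @samp{survey.por}
with variables @code{id}, @code{age}, and @code{notes}, the following
sketch reads every variable except @code{notes} and renames @code{age}
to @code{age_years}:

@example
* Hypothetical file and variable names.
IMPORT /FILE='survey.por'
        /DROP=notes
        /RENAME=(age=age_years).
@end example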
 678
 679 @node SAVE
 680 @section SAVE
 681 @vindex SAVE
 682
 683 @display
 684 SAVE
 685         /OUTFILE=@{'@var{file_name}',@var{file_handle}@}
 686         /UNSELECTED=@{RETAIN,DELETE@}
 687         /@{COMPRESSED,UNCOMPRESSED@}
 688         /PERMISSIONS=@{WRITEABLE,READONLY@}
 689         /DROP=@var{var_list}
 690         /KEEP=@var{var_list}
 691         /VERSION=@var{version}
 692         /RENAME=(@var{src_names}=@var{target_names})@dots{}
 693         /NAMES
 694         /MAP
 695 @end display
 696
 697 The @cmd{SAVE} procedure causes the dictionary and data in the active
 698 dataset to
 699 be written to a system file.
 700
 701 OUTFILE is the only required subcommand.  Specify the system file
 702 to be written as a string file name or a file handle
 703 (@pxref{File Handles}).
 704
 705 By default, cases excluded with FILTER are written to the system file.
 706 These can be excluded by specifying @subcmd{DELETE} on the @subcmd{UNSELECTED}
 707 subcommand.  Specifying @subcmd{RETAIN} makes the default explicit.
 708
 709 The @subcmd{COMPRESS} and @subcmd{UNCOMPRESS} subcommand determine whether
 710 the saved system file is compressed.  By default, system files are compressed.
 711 This default can be changed with the SET command (@pxref{SET}).
 712
 713 The @subcmd{PERMISSIONS} subcommand specifies permissions for the new system
 714 file.  WRITEABLE, the default, creates the file with read and write
 715 permission.  READONLY creates the file for read-only access.
 716
 717 By default, all the variables in the active dataset dictionary are written
 718 to the system file.  The @subcmd{DROP} subcommand can be used to specify a list
 719 of variables not to be written.  In contrast, KEEP specifies variables
 720 to be written, with all variables not specified not written.
 721
 722 Normally variables are saved to a system file under the same names they
 723 have in the active dataset.  Use the @subcmd{RENAME} subcommand to change these names.
 724 Specify, within parentheses, a list of variable names followed by an
 725 equals sign (@samp{=}) and the names that they should be renamed to.
 726 Multiple parenthesized groups of variable names can be included on a
 727 single @subcmd{RENAME} subcommand.  Variables' names may be swapped using a
 728 @subcmd{RENAME} subcommand of the
 729 form @subcmd{/RENAME=(@var{A} @var{B}=@var{B} @var{A})}.

An alternate syntax for the @subcmd{RENAME} subcommand allows the
parentheses to be omitted.  In that case, only a single variable may be
renamed at a time, for instance, @subcmd{/RENAME=@var{A}=@var{B}}.  This
alternate syntax is deprecated.

@subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} are performed in
left-to-right order.  Each may be present any number of times.
@cmd{SAVE} never modifies the active dataset: @subcmd{DROP},
@subcmd{KEEP}, and @subcmd{RENAME} only affect the system file written
to disk.
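
A minimal Python model of this left-to-right processing follows.  It is
a hypothetical sketch.  @code{output_names} and its (kind, argument)
input format are invented for illustration.  Names are compared
case-sensitively, and the sketch does not model any reordering that
@subcmd{KEEP} may perform.

@example
def output_names(names, subcommands):
    # names: the active dataset's variable names, in dictionary order.
    # subcommands: (kind, argument) pairs in the order they were written:
    #   ("DROP", names), ("KEEP", names), or ("RENAME", (old, new)).
    # Only a copy is edited, so the caller's list stays unchanged, just
    # as SAVE leaves the active dataset alone.
    out = list(names)
    for kind, arg in subcommands:
        if kind == "DROP":
            out = [n for n in out if n not in arg]
        elif kind == "KEEP":
            out = [n for n in out if n in arg]
        elif kind == "RENAME":
            old, new = arg
            mapping = dict(zip(old, new))
            out = [mapping.get(n, n) for n in out]
        else:
            raise ValueError("unknown subcommand: " + kind)
    return out

active = ["a", "b", "c", "d"]
print(output_names(active, [("DROP", ["d"]),
                            ("RENAME", (["a"], ["x"])),
                            ("KEEP", ["x", "c"])]))
# prints ['x', 'c']
print(active)
# prints ['a', 'b', 'c', 'd']
@end example

Because the subcommands run in order, a @subcmd{KEEP} that follows a
@subcmd{RENAME} must list the new names.  Listing @code{a} instead of
@code{x} in the example would drop that variable.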

The @subcmd{VERSION} subcommand specifies the version of the file format. Valid
versions are 2 and 3.  The default version is 3.  In version 2 system
files, variable names longer than 8 bytes will be truncated.  The two
versions are otherwise identical.

The @subcmd{NAMES} and @subcmd{MAP} subcommands are currently ignored.

@cmd{SAVE} causes the data to be read.  It is a procedure.

@node SAVE TRANSLATE
@section SAVE TRANSLATE
@vindex SAVE TRANSLATE

@display
SAVE TRANSLATE
        /OUTFILE=@{'@var{file_name}',@var{file_handle}@}
        /TYPE=@{CSV,TAB@}
        [/REPLACE]
        [/MISSING=@{IGNORE,RECODE@}]

        [/DROP=@var{var_list}]
        [/KEEP=@var{var_list}]
        [/RENAME=(@var{src_names}=@var{target_names})@dots{}]
        [/UNSELECTED=@{RETAIN,DELETE@}]
        [/MAP]

        @dots{}additional subcommands depending on TYPE@dots{}
@end display

The @cmd{SAVE TRANSLATE} command is used to save data into various
formats understood by other applications.

The @subcmd{OUTFILE} and @subcmd{TYPE} subcommands are mandatory.
@subcmd{OUTFILE} specifies the file to be written, as a string file name
or a file handle (@pxref{File Handles}).  @subcmd{TYPE} determines the
type of the file to write.  It must be one of the following:

@table @asis
@item CSV
Comma-separated value format.

@item TAB
Tab-delimited format.
@end table

By default, @cmd{SAVE TRANSLATE} will not overwrite an existing file.  Use
@subcmd{REPLACE} to force an existing file to be overwritten.

With @subcmd{MISSING=IGNORE}, the default, @cmd{SAVE TRANSLATE} treats
user-missing values as if they were not missing.  Specify
@subcmd{MISSING=RECODE} to output numeric user-missing values like
system-missing values and string user-missing values as all spaces.
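
The following Python sketch shows what each @subcmd{MISSING} setting
does to a single value.  The function @code{output_value} and the use of
@code{None} for the system-missing value are assumptions made for this
illustration and do not reflect PSPP's internals.  The sketch also
compares string values exactly, ignoring trailing padding.

@example
SYSMIS = None  # illustrative stand-in for the system-missing value

def output_value(value, user_missing, width=0, missing="IGNORE"):
    # value: a float for a numeric variable, a str for a string variable.
    # user_missing: the variable's user-missing values.
    # width: the string variable's width (unused for numeric values).
    if missing not in ("IGNORE", "RECODE"):
        raise ValueError("MISSING must be IGNORE or RECODE")
    if missing == "IGNORE" or value not in user_missing:
        return value              # written as an ordinary value
    if isinstance(value, str):
        return " " * width        # string user-missing: all spaces
    return SYSMIS                 # numeric user-missing: system-missing
@end example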

By default, all the variables in the active dataset dictionary are saved
to the output file, but @subcmd{DROP} or @subcmd{KEEP} can select a
subset of variables to save.  The @subcmd{RENAME} subcommand can also be
used to change the names under which variables are saved.
@subcmd{UNSELECTED} determines whether cases filtered out by the
@cmd{FILTER} command are written to the output file.  These subcommands
have the same syntax and meaning as on the @cmd{SAVE} command
(@pxref{SAVE}).

Each supported file type has additional subcommands, explained in
separate sections below.

@cmd{SAVE TRANSLATE} causes the data to be read.  It is a procedure.

@menu
* SAVE TRANSLATE /TYPE=CSV and TYPE=TAB::
@end menu

@node SAVE TRANSLATE /TYPE=CSV and TYPE=TAB
@subsection Writing Comma- and Tab-Separated Data Files

@display
SAVE TRANSLATE
        /OUTFILE=@{'@var{file_name}',@var{file_handle}@}
        /TYPE=@{CSV,TAB@}
        [/REPLACE]
        [/MISSING=@{IGNORE,RECODE@}]

        [/DROP=@var{var_list}]
        [/KEEP=@var{var_list}]
        [/RENAME=(@var{src_names}=@var{target_names})@dots{}]
        [/UNSELECTED=@{RETAIN,DELETE@}]

        [/FIELDNAMES]
        [/CELLS=@{VALUES,LABELS@}]
        [/TEXTOPTIONS DELIMITER='@var{delimiter}']
        [/TEXTOPTIONS QUALIFIER='@var{qualifier}']
        [/TEXTOPTIONS DECIMAL=@{DOT,COMMA@}]
        [/TEXTOPTIONS FORMAT=@{PLAIN,VARIABLE@}]
@end display

The @cmd{SAVE TRANSLATE} command with TYPE=CSV or TYPE=TAB writes data in a
comma- or tab-separated value format similar to that described by
RFC@tie{}4180.  Each variable becomes one output column, and each case
becomes one line of output.  If FIELDNAMES is specified, an additional
line at the top of the output file lists variable names.

The CELLS and TEXTOPTIONS FORMAT settings determine how values are
written to the output file:

@table @asis
@item CELLS=VALUES FORMAT=PLAIN (the default settings)
Writes variables to the output in ``plain'' formats that ignore the
details of variable formats.  Numeric values are written as plain
decimal numbers with enough digits to indicate their exact values in
machine representation.  Numeric values include @samp{e} followed by
an exponent if the exponent value would be less than -4 or greater
than 16.  Dates are written in MM/DD/YYYY format and times in HH:MM:SS
format.  WKDAY and MONTH values are written as decimal numbers.

By default, numeric values use the decimal point character set with
SET DECIMAL (@pxref{SET DECIMAL}).  Use DECIMAL=DOT or DECIMAL=COMMA
to force a particular decimal point character.

@item CELLS=VALUES FORMAT=VARIABLE
Writes variables using their print formats.  Leading and trailing
spaces are removed from numeric values, and trailing spaces are
removed from string values.

@item CELLS=LABELS FORMAT=PLAIN
@itemx CELLS=LABELS FORMAT=VARIABLE
Writes value labels where they exist, and otherwise writes the values
themselves as described above.
@end table

Regardless of CELLS and TEXTOPTIONS FORMAT, numeric system-missing
values are output as a single space.
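
The plain numeric, date, and time rules above can be approximated in
Python.  This is a sketch under stated assumptions, not PSPP's code.  It
takes ``enough digits'' to mean the shortest digit string that converts
back to the same double, which is what Python's @code{repr} produces.
The exact spelling of the exponent (here in the style @samp{1e-5}) is a
guess.  The sketch relies on PSPP's representation of dates as the
number of seconds since midnight, 14 October 1582.  The function names
are hypothetical.

@example
import datetime
from decimal import Decimal

# PSPP date values count seconds from midnight, 14 October 1582.
EPOCH = datetime.datetime(1582, 10, 14)

def plain_number(x, decimal_point="."):
    if x is None:                    # system-missing: a single space
        return " "
    d = Decimal(repr(float(x)))      # shortest digits that round-trip
    if d == 0:
        return "0"
    exponent = d.adjusted()          # power of ten of the leading digit
    if exponent < -4 or exponent > 16:
        text = format(d, "e")        # e.g. 1e-5 or 1.5e+17
    else:
        text = format(d.normalize(), "f")
    return text.replace(".", decimal_point)

def plain_date(seconds):
    moment = EPOCH + datetime.timedelta(seconds=seconds)
    return moment.strftime("%m/%d/%Y")

def plain_time(seconds):
    s = int(seconds)
    return "%02d:%02d:%02d" % (s // 3600, s % 3600 // 60, s % 60)

print(plain_number(1e16), plain_number(0.5), plain_number(2.5, ","))
# prints 10000000000000000 0.5 2,5
@end example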

For TYPE=TAB, tab characters delimit values.  For TYPE=CSV, the
TEXTOPTIONS DELIMITER and DECIMAL settings determine the character
that separates values within a line.  If DELIMITER is specified, then
the specified string separates values.  If DELIMITER is not specified,
then the default is a comma with DECIMAL=DOT or a semicolon with
DECIMAL=COMMA.  If DECIMAL is not given either, it is implied by the
decimal point character set with SET DECIMAL (@pxref{SET DECIMAL}).
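
The rules for choosing the separator can be written as a small Python
function.  The function name, its parameters, and the representation of
each setting as a string or @code{None} are assumptions made for this
sketch.

@example
def field_separator(file_type, delimiter=None, decimal=None,
                    set_decimal="."):
    # file_type: "CSV" or "TAB".
    # delimiter: the TEXTOPTIONS DELIMITER string, or None if not given.
    # decimal: "DOT" or "COMMA" from TEXTOPTIONS DECIMAL, or None.
    # set_decimal: the decimal point character chosen with SET DECIMAL.
    if file_type == "TAB":
        return "\t"               # TYPE=TAB always uses tabs
    if delimiter is not None:
        return delimiter          # an explicit DELIMITER wins
    if decimal is None:
        decimal = "COMMA" if set_decimal == "," else "DOT"
    return ";" if decimal == "COMMA" else ","
@end example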

The TEXTOPTIONS QUALIFIER setting specifies a character that is output
before and after a value that contains the delimiter character or the
qualifier character.  The default is a double quote (@samp{"}).  A
qualifier character that appears within a value is doubled.
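
The following Python sketch implements this quoting rule.
@code{qualify} and @code{format_line} are hypothetical names.  The
sketch applies only the rule stated here and does not try to reproduce
any other quoting PSPP might do, for example for values that contain
line breaks.

@example
def qualify(value, delimiter=",", qualifier='"'):
    # Wrap the value in qualifiers only if it contains the delimiter
    # or the qualifier; double any qualifier inside the value.
    if delimiter in value or qualifier in value:
        return qualifier + value.replace(qualifier, qualifier * 2) + qualifier
    return value

def format_line(values, delimiter=",", qualifier='"'):
    # One case becomes one line of delimited, qualified values.
    return delimiter.join(qualify(v, delimiter, qualifier) for v in values)

print(format_line(["plain", "a,b", 'say "hi"']))
# prints plain,"a,b","say ""hi"""
@end example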

@node SYSFILE INFO
@section SYSFILE INFO
@vindex SYSFILE INFO

@display
SYSFILE INFO FILE='@var{file_name}'.
@end display

@cmd{SYSFILE INFO} reads the dictionary in a system file and displays
the information it contains.

Specify the system file as a file name or a file handle
(@pxref{File Handles}).

@cmd{SYSFILE INFO} does not affect the current active dataset.

@node XEXPORT
@section XEXPORT
@vindex XEXPORT

@display
XEXPORT
        /OUTFILE='@var{file_name}'
        /DIGITS=@var{n}
        /DROP=@var{var_list}
        /KEEP=@var{var_list}
        /RENAME=(@var{src_names}=@var{target_names})@dots{}
        /TYPE=@{COMM,TAPE@}
        /MAP
@end display

The @cmd{XEXPORT} transformation writes the active dataset dictionary and
data to a specified portable file.

This transformation is a @pspp{} extension.

It is similar to the @cmd{EXPORT} procedure, with two differences:

@itemize
@item
@cmd{XEXPORT} is a transformation, not a procedure.  It is executed when
the data is read by a procedure or procedure-like command.

@item
@cmd{XEXPORT} does not support the @subcmd{UNSELECTED} subcommand.
@end itemize
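
The difference between a transformation and a procedure can be pictured
with a toy Python model.  The @code{Dataset} class and its methods are
hypothetical and greatly simplified.  The model only shows when a queued
transformation such as @cmd{XEXPORT} actually runs.  It assumes that the
transformation runs once, during the next procedure.

@example
class Dataset:
    # Toy model of an active dataset, not PSPP's implementation.

    def __init__(self, cases):
        self.cases = list(cases)
        self.pending = []             # queued transformations, in order

    def add_transformation(self, step):
        # Like XEXPORT or XSAVE: the step is queued; nothing is written yet.
        self.pending.append(step)

    def run_procedure(self, step):
        # Like EXPORT or SAVE: the data is read now, and each case passes
        # through every queued transformation before the procedure sees it.
        for case in self.cases:
            for transformation in self.pending:
                transformation(case)
            step(case)
        self.pending = []             # queued transformations have run

exported = []
ds = Dataset([1, 2, 3])
ds.add_transformation(exported.append)
print(exported)                       # prints []
ds.run_procedure(lambda case: None)
print(exported)                       # prints [1, 2, 3]
@end example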

@xref{EXPORT}, for more information.

@node XSAVE
@section XSAVE
@vindex XSAVE

@display
XSAVE
        /OUTFILE='@var{file_name}'
        /@{COMPRESSED,UNCOMPRESSED@}
        /PERMISSIONS=@{WRITEABLE,READONLY@}
        /DROP=@var{var_list}
        /KEEP=@var{var_list}
        /VERSION=@var{version}
        /RENAME=(@var{src_names}=@var{target_names})@dots{}
        /NAMES
        /MAP
@end display

The @cmd{XSAVE} transformation writes the active dataset's dictionary and
data to a system file.  It is similar to the @cmd{SAVE}
procedure, with two differences:

@itemize
@item
@cmd{XSAVE} is a transformation, not a procedure.  It is executed when
the data is read by a procedure or procedure-like command.

@item
@cmd{XSAVE} does not support the @subcmd{UNSELECTED} subcommand.
@end itemize

@xref{SAVE}, for more information.