@c PSPP - a program for statistical analysis.
@c Copyright (C) 2017, 2020 Free Software Foundation, Inc.
@c Permission is granted to copy, distribute and/or modify this document
@c under the terms of the GNU Free Documentation License, Version 1.3
@c or any later version published by the Free Software Foundation;
@c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
@c A copy of the license is included in the section entitled "GNU
@c Free Documentation License".
@c
@node System and Portable File IO
@chapter System and Portable File I/O

The commands in this chapter read, write, and examine system files and
portable files.

@menu
* APPLY DICTIONARY::            Apply system file dictionary to active dataset.
* EXPORT::                      Write to a portable file.
* GET::                         Read from a system file.
* GET DATA::                    Read from foreign files.
* IMPORT::                      Read from a portable file.
* SAVE::                        Write to a system file.
* SAVE DATA COLLECTION::        Write to a system file and metadata file.
* SAVE TRANSLATE::              Write data in foreign file formats.
* SYSFILE INFO::                Display system file dictionary.
* XEXPORT::                     Write to a portable file, as a transformation.
* XSAVE::                       Write to a system file, as a transformation.
@end menu

@node APPLY DICTIONARY
@section APPLY DICTIONARY
@vindex APPLY DICTIONARY

@display
APPLY DICTIONARY FROM=@{'@var{file_name}',@var{file_handle}@}.
@end display

@cmd{APPLY DICTIONARY} applies the variable labels, value labels,
and missing values taken from a file to corresponding
variables in the active dataset.  In some cases it also updates the
weighting variable.

The @subcmd{FROM} clause is mandatory.  Use it to specify a system
file or portable file's name in single quotes, a data set name
(@pxref{Datasets}), or a file handle name (@pxref{File Handles}).
The dictionary in the file is read, but it does not replace the active
dataset's dictionary.  The file's data is not read.

Only variables with names that exist in both the active dataset and the
system file are considered.  Variables with the same name but different
types (numeric, string) cause an error message.  Otherwise, the
system file variables' attributes replace those in their matching
active dataset variables:

@itemize @bullet
@item
If a system file variable has a variable label, then it replaces
the variable label of the active dataset variable.  If the system
file variable does not have a variable label, then the active dataset
variable's variable label, if any, is retained.

@item
If the system file variable has custom attributes (@pxref{VARIABLE
ATTRIBUTE}), then those attributes replace the active dataset variable's
custom attributes.  If the system file variable does not have custom
attributes, then the active dataset variable's custom attributes, if any,
are retained.

@item
If the active dataset variable is numeric or short string, then value
labels and missing values, if any, are copied to the active dataset
variable.  If the system file variable does not have value labels or
missing values, then those in the active dataset variable, if any, are not
disturbed.
@end itemize
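
To make these rules concrete, here is a hypothetical Python sketch that
models how one variable from the file updates its active dataset
counterpart.  It is not @pspp{}'s implementation: variables are
represented as plain Python dictionaries with illustrative keys
(@code{name}, @code{type}, @code{width}, @code{label},
@code{attributes}, @code{value_labels}, @code{missing}), and ``short
string'' is assumed to mean a width of 8 bytes or less.

@example
def apply_variable(active, source):
    """Hypothetical model of APPLY DICTIONARY for one matching variable.

    active and source are dicts describing variables with the same name.
    Returns the updated active variable; neither argument is modified.
    """
    if active["type"] != source["type"]:
        # Same name but different type (numeric vs. string) is an error.
        raise ValueError("type mismatch for variable " + active["name"])
    result = dict(active)
    # A label or custom attributes in the file replace the active ones;
    # if the file has none, the active dataset's values are retained.
    for key in ("label", "attributes"):
        if source.get(key):
            result[key] = source[key]
    # Value labels and missing values are copied only for numeric and
    # short string variables (assumed here: width of 8 or less).
    if active["type"] == "numeric" or active["width"] <= 8:
        for key in ("value_labels", "missing"):
            if source.get(key):
                result[key] = source[key]
    return result
@end example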

In addition to properties of variables, some properties of the active
dataset's dictionary as a whole are updated:

@itemize @bullet
@item
If the system file has custom attributes (@pxref{DATAFILE ATTRIBUTE}),
then those attributes replace the active dataset's custom
attributes.

@item
If the active dataset has a weighting variable (@pxref{WEIGHT}), and the
system file does not, or if the weighting variable in the system file
does not exist in the active dataset, then the active dataset weighting
variable, if any, is retained.  Otherwise, the weighting variable in
the system file becomes the active dataset weighting variable.
@end itemize
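
The weighting rule reduces to a small decision.  The following Python
function is a hypothetical illustration of it; @code{new_weight} and its
arguments are made-up names, and @code{None} stands for ``no weighting
variable''.

@example
def new_weight(active_weight, file_weight, active_names):
    """Return the active dataset's weighting variable after
    APPLY DICTIONARY, following the rule described above."""
    if file_weight is None or file_weight not in active_names:
        # The file has no weight, or its weight variable does not exist
        # in the active dataset: keep the current weight, if any.
        return active_weight
    # Otherwise the file's weighting variable takes over.
    return file_weight
@end example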

@cmd{APPLY DICTIONARY} takes effect immediately.  It does not read the
active dataset.  The system file is not modified.

@node EXPORT
@section EXPORT
@vindex EXPORT

@display
EXPORT
        /OUTFILE='@var{file_name}'
        /UNSELECTED=@{RETAIN,DELETE@}
        /DIGITS=@var{n}
        /DROP=@var{var_list}
        /KEEP=@var{var_list}
        /RENAME=(@var{src_names}=@var{target_names})@dots{}
        /TYPE=@{COMM,TAPE@}
        /MAP
@end display

The @cmd{EXPORT} procedure writes the active dataset's dictionary and
data to a specified portable file.

By default, cases excluded with @cmd{FILTER} are written to the
file.  These can be excluded by specifying @subcmd{DELETE} on the @subcmd{UNSELECTED}
subcommand.  Specifying @subcmd{RETAIN} makes the default explicit.

Portable files express real numbers in base 30.  Integers are always
expressed to the maximum precision needed to make them exact.
Non-integers are, by default, expressed to the machine's maximum
natural precision (approximately 15 decimal digits on many machines).
If many numbers require this many digits, the portable file may
significantly increase in size.  As an alternative, the @subcmd{DIGITS}
subcommand may be used to specify the number of decimal digits of
precision to write.  @subcmd{DIGITS} applies only to non-integers.
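
As an illustration of base-30 notation, the following Python sketch
converts a non-negative integer, and the fractional part of a number,
into base-30 digits, using @samp{0} through @samp{9} and then @samp{A}
through @samp{T}.  It is a simplified, hypothetical model rather than
@pspp{}'s writer: it omits the sign, exponent, and terminator
characters that a real portable file also uses.  Each base-30 digit
carries about 1.48 decimal digits, so 15 decimal digits need roughly 11
base-30 digits.

@example
BASE30_CHARS = "0123456789ABCDEFGHIJKLMNOPQRST"   # the 30 digit characters

def base30_int(n):
    """Exact base-30 representation of a non-negative integer."""
    if n == 0:
        return "0"
    out = ""
    while n > 0:
        n, d = divmod(n, 30)
        out = BASE30_CHARS[d] + out
    return out

def base30_frac(x, ndigits):
    """First ndigits base-30 digits of a fraction 0 <= x < 1."""
    out = ""
    for _ in range(ndigits):
        x *= 30
        d = int(x)
        out += BASE30_CHARS[d]
        x -= d
    return out
@end example

For example, @code{base30_int(900)} yields @samp{100}, and
@code{base30_frac(0.25, 2)} yields @samp{7F}.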

The @subcmd{OUTFILE} subcommand, which is the only required subcommand, specifies
the portable file to be written as a file name string or
a file handle (@pxref{File Handles}).

@subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} follow the same format as in the
@cmd{SAVE} procedure (@pxref{SAVE}).

The @subcmd{TYPE} subcommand specifies the character set for use in the
portable file.  Its value is currently not used.

The @subcmd{MAP} subcommand is currently ignored.

@cmd{EXPORT} is a procedure.  It causes the active dataset to be read.

@node GET
@section GET
@vindex GET

@display
GET
        /FILE=@{'@var{file_name}',@var{file_handle}@}
        /DROP=@var{var_list}
        /KEEP=@var{var_list}
        /RENAME=(@var{src_names}=@var{target_names})@dots{}
        /ENCODING='@var{encoding}'
@end display

@cmd{GET} clears the current dictionary and active dataset and
replaces them with the dictionary and data from a specified file.

The @subcmd{FILE} subcommand is the only required subcommand.  Specify
the SPSS system file, SPSS/PC+ system file, or SPSS portable file to
be read as a string file name or a file handle (@pxref{File Handles}).

By default, all the variables in a file are read.  The @subcmd{DROP}
subcommand can be used to specify a list of variables that are not to be
read.  By contrast, the @subcmd{KEEP} subcommand can be used to specify
variables that are to be read, with all other variables not read.

Normally variables in a file retain the names that they were
saved under.  Use the @subcmd{RENAME} subcommand to change these names.
Specify, within parentheses, a list of variable names followed by an
equals sign (@samp{=}) and the names that they should be renamed to.
Multiple parenthesized groups of variable names can be included on a
single @subcmd{RENAME} subcommand.  Variables' names may be swapped
using a @subcmd{RENAME} subcommand of the form
@subcmd{/RENAME=(@var{A} @var{B}=@var{B} @var{A})}.

Alternate syntax for the @subcmd{RENAME} subcommand allows the parentheses to be
eliminated.  When this is done, only a single variable may be renamed at
once.  For instance, @subcmd{/RENAME=@var{A}=@var{B}}.  This alternate syntax is
deprecated.

@subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} are executed in left-to-right order.
Each may be present any number of times.  @cmd{GET} never modifies a
file on disk.  Only the active dataset read from the file
is affected by these subcommands.
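
The left-to-right rule can be modeled as a fold over the subcommands.
The Python sketch below is hypothetical (the function name and the tuple
encoding of subcommands are illustrative), tracks only variable names,
and does not model any reordering that @subcmd{KEEP} may perform.  Each
@subcmd{RENAME} builds its whole mapping before applying it, which is
what lets the swap form @subcmd{/RENAME=(@var{A} @var{B}=@var{B}
@var{A})} work.

@example
def apply_subcommands(names, subcommands):
    """Apply DROP, KEEP, and RENAME, in order, to a list of names.

    Each subcommand is a tuple such as ("DROP", ["x"]), ("KEEP", ["y"]),
    or ("RENAME", (["a", "b"], ["b", "a"])).
    """
    names = list(names)
    for kind, arg in subcommands:
        if kind == "DROP":
            names = [n for n in names if n not in arg]
        elif kind == "KEEP":
            names = [n for n in names if n in arg]
        elif kind == "RENAME":
            old, new = arg
            # Build the whole mapping first, so A B = B A swaps names.
            mapping = dict(zip(old, new))
            names = [mapping.get(n, n) for n in names]
        else:
            raise ValueError("unknown subcommand: " + kind)
    return names
@end example

For instance, @code{apply_subcommands(["a", "b", "c"], [("RENAME",
(["a", "b"], ["b", "a"])), ("DROP", ["c"])])} returns @code{["b", "a"]}.
Reversing the order of a @subcmd{RENAME} and a @subcmd{KEEP} that
refers to the new name changes the result, because @subcmd{KEEP} would
then look for a name that does not exist yet.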

@pspp{} automatically detects the encoding of string data in the file,
when possible.  The character encoding of old SPSS system files cannot
always be guessed correctly, and SPSS/PC+ system files do not include
any indication of their encoding.  Specify the @subcmd{ENCODING}
subcommand with an @acronym{IANA} character set name as its string
argument to override the default.  Use @cmd{SYSFILE INFO} to analyze
the encodings that might be valid for a system file.  The
@subcmd{ENCODING} subcommand is a @pspp{} extension.

@cmd{GET} does not cause the data to be read, only the dictionary.  The data
is read later, when a procedure is executed.

Use of @cmd{GET} to read a portable file is a @pspp{} extension.

@node GET DATA
@section GET DATA
@vindex GET DATA

@display
GET DATA
        /TYPE=@{GNM,ODS,PSQL,TXT@}
        @dots{}additional subcommands depending on TYPE@dots{}
@end display

The @cmd{GET DATA} command is used to read files and other data
sources created by other applications.  When this command is executed,
the current dictionary and active dataset are replaced with variables
and data read from the specified source.

The @subcmd{TYPE} subcommand is mandatory and must be the first subcommand
specified.  It determines the type of the file or source to read.
@pspp{} currently supports the following file types:

@table @asis
@item GNM
Spreadsheet files created by Gnumeric (@url{http://gnumeric.org}).

@item ODS
Spreadsheet files in OpenDocument format (@url{http://opendocumentformat.org}).

@item PSQL
Relations from PostgreSQL databases (@url{http://postgresql.org}).

@item TXT
Textual data files in columnar and delimited formats.
@end table

Each supported file type has additional subcommands, explained in
separate sections below.

@menu
* GET DATA /TYPE=GNM/ODS::     Spreadsheets
* GET DATA /TYPE=PSQL::        Databases
* GET DATA /TYPE=TXT::         Delimited Text Files
@end menu

@node GET DATA /TYPE=GNM/ODS
@subsection Spreadsheet Files

@display
GET DATA /TYPE=@{GNM, ODS@}
        /FILE=@{'@var{file_name}'@}
        /SHEET=@{NAME '@var{sheet_name}', INDEX @var{n}@}
        /CELLRANGE=@{RANGE '@var{range}', FULL@}
        /READNAMES=@{ON, OFF@}
        /ASSUMEDSTRWIDTH=@var{n}.
@end display

@cindex Gnumeric
@cindex OpenDocument
@cindex spreadsheet files

Gnumeric spreadsheets (@url{http://gnumeric.org}) and spreadsheets
in OpenDocument format
(@url{http://libreplanet.org/wiki/Group:OpenDocument/Software})
can be read using the @cmd{GET DATA} command.
Use the @subcmd{TYPE} subcommand to indicate the file's format:
@subcmd{/TYPE=GNM} indicates Gnumeric files and
@subcmd{/TYPE=ODS} indicates OpenDocument.
The @subcmd{FILE} subcommand is mandatory.
Use it to specify the name of the file to be read.
All other subcommands are optional.

The format of each variable is determined by the format of the spreadsheet
cell containing the first datum for the variable.
If this cell is of string (text) format, then the width of the variable is
determined from the length of the string it contains, unless the
@subcmd{ASSUMEDSTRWIDTH} subcommand is given.

The @subcmd{SHEET} subcommand specifies the sheet within the spreadsheet file to read.
There are two forms of the @subcmd{SHEET} subcommand.
In the first form,
@subcmd{/SHEET=name @var{sheet_name}}, the string @var{sheet_name} is the
name of the sheet to read.
In the second form, @subcmd{/SHEET=index @var{idx}}, @var{idx} is an
integer which is the index of the sheet to read.
The first sheet has the index 1.
If the @subcmd{SHEET} subcommand is omitted, then the command reads the
first sheet in the file.

The @subcmd{CELLRANGE} subcommand specifies the range of cells within the sheet to read.
If the subcommand is given as @subcmd{/CELLRANGE=FULL}, then the entire
sheet is read.
To read only part of a sheet, use the form
@subcmd{/CELLRANGE=range '@var{top_left_cell}:@var{bottom_right_cell}'}.
For example, the subcommand @subcmd{/CELLRANGE=range 'C3:P19'} reads
columns C--P and rows 3--19, inclusive.
If no @subcmd{CELLRANGE} subcommand is given, then the entire sheet is read.
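
Cell references follow the usual spreadsheet convention of column
letters followed by a 1-based row number.  The hypothetical Python
helpers below (illustrative names, with error checking omitted) show how
a range such as @samp{C3:P19} maps to column and row numbers:

@example
def column_number(letters):
    """Spreadsheet column letters to a 1-based number: A=1, Z=26, AA=27."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n

def parse_cell(cell):
    """Split a reference such as "C3" into (column, row), both 1-based."""
    i = 0
    while i < len(cell) and cell[i].isalpha():
        i += 1
    return column_number(cell[:i]), int(cell[i:])

def parse_range(text):
    """Parse "C3:P19" into (first_col, first_row, last_col, last_row)."""
    top_left, bottom_right = text.split(":")
    first_col, first_row = parse_cell(top_left)
    last_col, last_row = parse_cell(bottom_right)
    return first_col, first_row, last_col, last_row
@end example

Here @code{parse_range("C3:P19")} returns @code{(3, 3, 16, 19)}, a block
of 14 columns by 17 rows.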

If @subcmd{/READNAMES=ON} is specified, then the contents of cells of
the first row are used as the names of the variables in which to store
the data from subsequent rows.  This is the default.
If @subcmd{/READNAMES=OFF} is
used, then the variables receive automatically assigned names.

The @subcmd{ASSUMEDSTRWIDTH} subcommand specifies the maximum width of string
variables read from the file.
If omitted, the default value is determined from the length of the
string in the first spreadsheet cell for each variable.


@node GET DATA /TYPE=PSQL
@subsection Postgres Database Queries

@display
GET DATA /TYPE=PSQL
         /CONNECT=@{@var{connection info}@}
         /SQL=@{@var{query}@}
         [/ASSUMEDSTRWIDTH=@var{w}]
         [/UNENCRYPTED]
         [/BSIZE=@var{n}].
@end display

@cindex postgres
@cindex databases

The PSQL type is used to import data from a PostgreSQL database server.
The server may be located locally or remotely.
Variables are automatically created based on the table column names
or the names specified in the SQL query.
High-precision PostgreSQL data types lose precision when they are
imported into @pspp{}.
Not all PostgreSQL data types can be represented in @pspp{}.
If a datum cannot be represented, then @cmd{GET DATA} issues a warning
and that datum is set to SYSMIS.

The @subcmd{CONNECT} subcommand is mandatory.
It is a string specifying the parameters of the database server from
which the data should be fetched.
The format of the string is given in the PostgreSQL manual
@url{http://www.postgresql.org/docs/8.0/static/libpq.html#LIBPQ-CONNECT}.

The @subcmd{SQL} subcommand is mandatory.
It must be a valid SQL string to retrieve data from the database.

The @subcmd{ASSUMEDSTRWIDTH} subcommand specifies the maximum width of string
variables read from the database.
If omitted, the default value is determined from the length of the
string in the first value read for each variable.

The @subcmd{UNENCRYPTED} subcommand allows data to be retrieved over an insecure
connection.
If the connection is not encrypted, and the @subcmd{UNENCRYPTED} subcommand is
not given, then an error occurs.
Whether or not the connection is
encrypted depends upon the underlying PostgreSQL client library (libpq) and the
capabilities of the database server.

The @subcmd{BSIZE} subcommand serves only to optimise the speed of data transfer.
It specifies an upper limit on the
number of cases to fetch from the database at once.
The default value is 4096.
If your SQL statement fetches a large number of cases but only a small number of
variables, then the data transfer may be faster if you increase this value.
Conversely, if the number of variables is large, or if the machine on which
@pspp{} is running has only a
small amount of memory, then a smaller value is probably better.
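
The trade-off behind @subcmd{BSIZE} can be seen with a simple model.
The Python generator below is a hypothetical illustration of fetching a
result set in batches of at most @var{bsize} cases; it does not use
@pspp{}'s or libpq's actual interface.  Fetching @var{n} cases takes
@var{n}/@var{bsize} batches, rounded up, while the amount of data held
at once grows with @var{bsize} times the number of variables.

@example
def fetch_in_batches(rows, bsize=4096):
    """Yield successive batches of at most bsize rows."""
    if bsize < 1:
        raise ValueError("bsize must be positive")
    for start in range(0, len(rows), bsize):
        yield rows[start:start + bsize]

# 10,000 cases with the default BSIZE of 4096 need 3 batches:
# two full batches of 4096 and a final one of 1808.
batches = list(fetch_in_batches(list(range(10000))))
@end example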


The following syntax is an example:
@example
GET DATA /TYPE=PSQL
     /CONNECT='host=example.com port=5432 dbname=product user=fred password=xxxx'
     /SQL='select * from manufacturer'.
@end example


@node GET DATA /TYPE=TXT
@subsection Textual Data Files

@display
GET DATA /TYPE=TXT
        /FILE=@{'@var{file_name}',@var{file_handle}@}
        [/ENCODING='@var{encoding}']
        [/ARRANGEMENT=@{DELIMITED,FIXED@}]
        [/FIRSTCASE=@{@var{first_case}@}]
        [/IMPORTCASES=...]
        @dots{}additional subcommands depending on ARRANGEMENT@dots{}
@end display

@cindex text files
@cindex data files
When TYPE=TXT is specified, @cmd{GET DATA} reads data in a delimited or
fixed columnar format, much like @cmd{DATA LIST} (@pxref{DATA LIST}).

The @subcmd{FILE} subcommand is mandatory.  Specify the file to be read as
a string file name or (for textual data only) a
file handle (@pxref{File Handles}).

The @subcmd{ENCODING} subcommand specifies the character encoding of
the file to be read.  @xref{INSERT}, for information on supported
encodings.

The @subcmd{ARRANGEMENT} subcommand determines the file's basic format.
DELIMITED, the default setting, specifies that fields in the input
data are separated by spaces, tabs, or other user-specified
delimiters.  FIXED specifies that fields in the input data appear at
particular fixed column positions within records of a case.

By default, cases are read from the input file starting from the first
line.  To skip lines at the beginning of an input file, set @subcmd{FIRSTCASE}
to the number of the first line to read: 2 to skip the first line, 3
to skip the first two lines, and so on.

@subcmd{IMPORTCASES} is ignored, for compatibility.  Use @cmd{N OF
CASES} to limit the number of cases read from a file (@pxref{N OF
CASES}), or @cmd{SAMPLE} to obtain a random sample of cases
(@pxref{SAMPLE}).

The remaining subcommands apply only to one of the two file
arrangements, described below.

@menu
* GET DATA /TYPE=TXT /ARRANGEMENT=DELIMITED::
* GET DATA /TYPE=TXT /ARRANGEMENT=FIXED::
@end menu

@node GET DATA /TYPE=TXT /ARRANGEMENT=DELIMITED
@subsubsection Reading Delimited Data

@display
GET DATA /TYPE=TXT
        /FILE=@{'@var{file_name}',@var{file_handle}@}
        [/ARRANGEMENT=@{DELIMITED,FIXED@}]
        [/FIRSTCASE=@{@var{first_case}@}]
        [/IMPORTCASE=@{ALL,FIRST @var{max_cases},PERCENT @var{percent}@}]

        /DELIMITERS="@var{delimiters}"
        [/QUALIFIER="@var{quotes}"]
        [/DELCASE=@{LINE,VARIABLES @var{n_variables}@}]
        /VARIABLES=@var{del_var1} [@var{del_var2}]@dots{}
where each @var{del_var} takes the form:
        @var{variable} @var{format}
@end display

The @cmd{GET DATA} command with TYPE=TXT and ARRANGEMENT=DELIMITED reads
input data from text files in delimited format, where fields are
separated by a set of user-specified delimiters.  Its capabilities are
similar to those of @cmd{DATA LIST FREE} (@pxref{DATA LIST FREE}), with a
few enhancements.

The required @subcmd{FILE} subcommand and optional @subcmd{FIRSTCASE} and @subcmd{IMPORTCASE}
subcommands are described above (@pxref{GET DATA /TYPE=TXT}).

@subcmd{DELIMITERS}, which is required, specifies the set of characters that
may separate fields.  Each character in the string specified on
@subcmd{DELIMITERS} separates one field from the next.  The end of a line also
separates fields, regardless of @subcmd{DELIMITERS}.  Two consecutive
delimiters in the input yield an empty field, as does a delimiter at
the end of a line.  A space character as a delimiter is an exception:
consecutive spaces do not yield an empty field and neither does any
number of spaces at the end of a line.

To use a tab as a delimiter, specify @samp{\t} at the beginning of the
@subcmd{DELIMITERS} string.  To use a backslash as a delimiter, specify
@samp{\\} as the first delimiter or, if a tab should also be a
delimiter, immediately following @samp{\t}.  To read a data file in
which each field appears on a separate line, specify the empty string
for @subcmd{DELIMITERS}.
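
The following Python function is a simplified, hypothetical model of
the delimiter rules above, applied to a single line.  It handles
ordinary delimiters and the special treatment of spaces, but not
qualifiers, leading spaces, a space combined with another delimiter
(such as @samp{, }), or the @samp{\t} and @samp{\\} escapes; a literal
tab character is passed instead.

@example
def split_fields(line, delimiters):
    """Split one input line into fields according to DELIMITERS."""
    space_is_delim = " " in delimiters
    if space_is_delim:
        # Any number of spaces at the end of a line yields no field.
        line = line.rstrip(" ")
    fields = []
    current = ""
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == " " and space_is_delim:
            fields.append(current)
            current = ""
            # A run of spaces counts as a single separator.
            while i + 1 < len(line) and line[i + 1] == " ":
                i += 1
        elif ch in delimiters:
            # Every other delimiter ends a field, even an empty one.
            fields.append(current)
            current = ""
        else:
            current += ch
        i += 1
    fields.append(current)   # the end of the line also ends a field
    return fields
@end example

With @samp{:} as the delimiter, @code{split_fields("a::b:", ":")}
yields four fields: @samp{a}, an empty field, @samp{b}, and another
empty field.  With a space as the delimiter, @code{split_fields("a   b  ",
" ")} yields just @samp{a} and @samp{b}.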

The optional @subcmd{QUALIFIER} subcommand names one or more characters that
can be used to quote values within fields in the input.  A field that
begins with one of the specified quote characters ends at the next
matching quote.  Intervening delimiters become part of the field,
instead of terminating it.  The ability to specify more than one quote
character is a @pspp{} extension.

The character specified on @subcmd{QUALIFIER} can be embedded within a
field that it quotes by doubling the qualifier.  For example, if
@samp{'} is specified on @subcmd{QUALIFIER}, then @code{'a''b'}
specifies a field that contains @samp{a'b}.
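
A qualified field can be read with a small scanner.  The hypothetical
Python function below (an illustration, not @pspp{}'s parser) reads one
field that begins with a qualifier character, keeps any delimiters
inside the quotes, turns a doubled qualifier into a single literal one,
and returns the field's value together with the rest of the line.  The
handling of an unterminated field is an assumption of the sketch.

@example
def read_quoted(text, qualifiers):
    """Read a field that starts with a qualifier; return (value, rest)."""
    q = text[0]
    if q not in qualifiers:
        raise ValueError("field does not start with a qualifier")
    out = []
    i = 1
    while i < len(text):
        ch = text[i]
        if ch == q:
            if i + 1 < len(text) and text[i + 1] == q:
                out.append(q)        # doubled qualifier: one literal quote
                i += 2
                continue
            return "".join(out), text[i + 1:]   # closing qualifier
        out.append(ch)               # delimiters inside quotes are kept
        i += 1
    # Assumption: an unterminated field runs to the end of the line.
    return "".join(out), ""
@end example

For example, @code{read_quoted("'a''b'", "'")} returns the value
@samp{a'b}, and reading @code{"3"""} with both @samp{'} and @samp{"} as
qualifiers returns @samp{3"}, as in the @samp{Height} column of the pet
store example.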

The @subcmd{DELCASE} subcommand controls how data may be broken across lines in
the data file.  With LINE, the default setting, each line must contain
all the data for exactly one case.  For additional flexibility, to
allow a single case to be split among lines or multiple cases to be
contained on a single line, specify VARIABLES @var{n_variables}, where
@var{n_variables} is the number of variables per case.
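
The difference between the two settings can be modeled in a few lines
of Python.  In this hypothetical sketch, whitespace stands in for the
configured delimiters and qualifiers are ignored; it shows how
@code{DELCASE=VARIABLES} discards line boundaries and cuts the stream of
fields into groups of @var{n_variables}:

@example
def cases_from_lines(lines, n_variables):
    """Model of DELCASE=VARIABLES n_variables."""
    fields = []
    for line in lines:
        fields.extend(line.split())   # simplified: whitespace delimiters
    cases = []
    for start in range(0, len(fields), n_variables):
        # In this sketch, leftover fields form a short final case.
        cases.append(fields[start:start + n_variables])
    return cases
@end example

With the three lines @code{"1 2 3"}, @code{"4"}, and @code{"5 6"} as
input and @var{n_variables} of 2, the result is three cases of two
fields each (@samp{1 2}, @samp{3 4}, @samp{5 6}), whereas
@code{DELCASE=LINE} would treat each of the three lines as one case.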

The @subcmd{VARIABLES} subcommand is required and must be the last subcommand.
Specify the name of each variable and its input format (@pxref{Input
and Output Formats}) in the order they should be read from the input
file.

@subsubheading Examples

@noindent
On a Unix-like system, the @samp{/etc/passwd} file has a format
similar to this:

@example
root:$1$nyeSP5gD$pDq/:0:0:,,,:/root:/bin/bash
blp:$1$BrP/pFg4$g7OG:1000:1000:Ben Pfaff,,,:/home/blp:/bin/bash
john:$1$JBuq/Fioq$g4A:1001:1001:John Darrington,,,:/home/john:/bin/bash
jhs:$1$D3li4hPL$88X1:1002:1002:Jason Stover,,,:/home/jhs:/bin/csh
@end example

@noindent
The following syntax reads a file in the format used by
@samp{/etc/passwd}:

@c If you change this example, change the regression test in
@c tests/language/data-io/get-data.at to match.
@example
GET DATA /TYPE=TXT /FILE='/etc/passwd' /DELIMITERS=':'
        /VARIABLES=username A20
                   password A40
                   uid F10
                   gid F10
                   gecos A40
                   home A40
                   shell A40.
@end example

@noindent
Consider the following data on used cars:

@example
model   year    mileage price   type    age
Civic   2002    29883   15900   Si      2
Civic   2003    13415   15900   EX      1
Civic   1992    107000  3800    n/a     12
Accord  2002    26613   17900   EX      1
@end example

@noindent
The following syntax can be used to read the used car data:

@c If you change this example, change the regression test in
@c tests/language/data-io/get-data.at to match.
@example
GET DATA /TYPE=TXT /FILE='cars.data' /DELIMITERS=' ' /FIRSTCASE=2
        /VARIABLES=model A8
                   year F4
                   mileage F6
                   price F5
                   type A4
                   age F2.
@end example

@noindent
Consider the following information on animals in a pet store:

@example
'Pet''s Name', "Age", "Color", "Date Received", "Price", "Height", "Type"
, (Years), , , (Dollars), ,
"Rover", 4.5, Brown, "12 Feb 2004", 80, '1''4"', "Dog"
"Charlie", , Gold, "5 Apr 2007", 12.3, "3""", "Fish"
"Molly", 2, Black, "12 Dec 2006", 25, '5"', "Cat"
"Gilly", , White, "10 Apr 2007", 10, "3""", "Guinea Pig"
@end example

@noindent
The following syntax can be used to read the pet store data:

@c If you change this example, change the regression test in
@c tests/language/data-io/get-data.at to match.
@example
GET DATA /TYPE=TXT /FILE='pets.data' /DELIMITERS=', ' /QUALIFIER='''"' /ESCAPE
        /FIRSTCASE=3
        /VARIABLES=name A10
                   age F3.1
                   color A5
                   received EDATE10
                   price F5.2
                   height a5
                   type a10.
@end example

@node GET DATA /TYPE=TXT /ARRANGEMENT=FIXED
@subsubsection Reading Fixed Columnar Data

@c (modify-syntax-entry ?_ "w")
@c (modify-syntax-entry ?' "'")
@c (modify-syntax-entry ?@ "'")

@display
GET DATA /TYPE=TXT
        /FILE=@{'@var{file_name}',@var{file_handle}@}
        [/ARRANGEMENT=@{DELIMITED,FIXED@}]
        [/FIRSTCASE=@{@var{first_case}@}]
        [/IMPORTCASE=@{ALL,FIRST @var{max_cases},PERCENT @var{percent}@}]

        [/FIXCASE=@var{n}]
        /VARIABLES @var{fixed_var} [@var{fixed_var}]@dots{}
            [/rec# @var{fixed_var} [@var{fixed_var}]@dots{}]@dots{}
where each @var{fixed_var} takes the form:
        @var{variable} @var{start}-@var{end} @var{format}
@end display

The @cmd{GET DATA} command with TYPE=TXT and ARRANGEMENT=FIXED reads input
data from text files in fixed format, where each field is located in
particular fixed column positions within records of a case.  Its
capabilities are similar to those of @cmd{DATA LIST FIXED} (@pxref{DATA LIST
FIXED}), with a few enhancements.

The required @subcmd{FILE} subcommand and optional @subcmd{FIRSTCASE} and @subcmd{IMPORTCASE}
subcommands are described above (@pxref{GET DATA /TYPE=TXT}).

The optional @subcmd{FIXCASE} subcommand may be used to specify the positive
integer number of input lines that make up each case.  The default
value is 1.

The @subcmd{VARIABLES} subcommand, which is required, specifies the positions
at which each variable can be found.  For each variable, specify its
name, followed by its start and end column separated by @samp{-}
(@i{e.g.}@: @samp{0-9}), followed by an input format type (@i{e.g.}@:
@samp{F}) or a full format specification (@i{e.g.}@: @samp{DOLLAR12.2}).
For this command, columns are numbered starting from 0 at
the left column.  Introduce the variables in the second and later
lines of a case by a slash followed by the number of the line within
the case, @i{e.g.}@: @samp{/2} for the second line.
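
The following hypothetical Python sketch models how fixed columns are
located.  Column numbers start at 0 and both ends of a
@var{start}-@var{end} range are inclusive, so a field occupies
@code{line[start:end + 1]}; with @subcmd{FIXCASE} greater than 1, each
case consumes that many lines and the record number selects one of
them.  The function name and the @code{layout} tuples are illustrative,
and values are returned as stripped strings rather than being converted
with their input formats.

@example
def read_fixed(lines, fixcase, layout):
    """Read cases from fixed-column text.

    layout is a list of (record, name, start, end) tuples, where record
    is the 1-based line within a case and start and end are 0-based,
    inclusive column numbers.
    """
    cases = []
    for first in range(0, len(lines), fixcase):
        records = lines[first:first + fixcase]
        case = dict()
        for record, name, start, end in layout:
            case[name] = records[record - 1][start:end + 1].strip()
        cases.append(case)
    return cases
@end example

Applied to the first data line of the used car example with the layout
@code{[(1, "model", 0, 7), (1, "year", 8, 15)]}, this yields
@samp{Civic} for @code{model} and @samp{2002} for @code{year}.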

@subsubheading Examples

@noindent
Consider the following data on used cars:

@example
model   year    mileage price   type    age
Civic   2002    29883   15900   Si      2
Civic   2003    13415   15900   EX      1
Civic   1992    107000  3800    n/a     12
Accord  2002    26613   17900   EX      1
@end example

@noindent
The following syntax can be used to read the used car data:

@c If you change this example, change the regression test in
@c tests/language/data-io/get-data.at to match.
@example
GET DATA /TYPE=TXT /FILE='cars.data' /ARRANGEMENT=FIXED /FIRSTCASE=2
        /VARIABLES=model 0-7 A
                   year 8-15 F
                   mileage 16-23 F
                   price 24-31 F
                   type 32-40 A
                   age 40-47 F.
@end example

@node IMPORT
@section IMPORT
@vindex IMPORT

@display
IMPORT
        /FILE='@var{file_name}'
        /TYPE=@{COMM,TAPE@}
        /DROP=@var{var_list}
        /KEEP=@var{var_list}
        /RENAME=(@var{src_names}=@var{target_names})@dots{}
@end display

The @cmd{IMPORT} transformation clears the active dataset dictionary and
data and replaces them with a dictionary and data from a system file or
portable file.

The @subcmd{FILE} subcommand, which is the only required subcommand, specifies
the portable file to be read as a file name string or a file handle
(@pxref{File Handles}).

The @subcmd{TYPE} subcommand is currently not used.

@subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} follow the syntax used by @cmd{GET} (@pxref{GET}).

@cmd{IMPORT} does not cause the data to be read, only the dictionary.  The
data is read later, when a procedure is executed.

Use of @cmd{IMPORT} to read a system file is a @pspp{} extension.

@node SAVE
@section SAVE
@vindex SAVE

@display
SAVE
        /OUTFILE=@{'@var{file_name}',@var{file_handle}@}
        /UNSELECTED=@{RETAIN,DELETE@}
        /@{UNCOMPRESSED,COMPRESSED,ZCOMPRESSED@}
        /PERMISSIONS=@{WRITEABLE,READONLY@}
        /DROP=@var{var_list}
        /KEEP=@var{var_list}
        /VERSION=@var{version}
        /RENAME=(@var{src_names}=@var{target_names})@dots{}
        /NAMES
        /MAP
@end display

The @cmd{SAVE} procedure causes the dictionary and data in the active
dataset to be written to a system file.

@subcmd{OUTFILE} is the only required subcommand.  Specify the system file
to be written as a string file name or a file handle
(@pxref{File Handles}).

By default, cases excluded with @cmd{FILTER} are written to the system file.
These can be excluded by specifying @subcmd{DELETE} on the @subcmd{UNSELECTED}
subcommand.  Specifying @subcmd{RETAIN} makes the default explicit.

The @subcmd{UNCOMPRESSED}, @subcmd{COMPRESSED}, and
@subcmd{ZCOMPRESSED} subcommands determine the system file's
compression level:

@table @code
@item UNCOMPRESSED
Data is not compressed.  Each numeric value uses 8 bytes of disk
space.  Each string value uses one byte per column width, rounded up
to a multiple of 8 bytes.

@item COMPRESSED
Data is compressed with a simple algorithm.  Each integer numeric
value between @minus{}99 and 151, inclusive, or system missing value
uses one byte of disk space.  Each 8-byte segment of a string that
consists only of spaces uses 1 byte.  Any other numeric value or
8-byte string segment uses 9 bytes of disk space.

@item ZCOMPRESSED
Data is compressed with the ``deflate'' compression algorithm
specified in RFC@tie{}1951 (the same algorithm used by
@command{gzip}).  Files written with this compression level cannot be
read by PSPP 0.8.1 or earlier or by SPSS 20 or earlier.
@end table
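
The per-value costs in the table can be tallied with a short Python
model.  This sketch is hypothetical: system-missing is represented as
@code{None}, each string is assumed to be already padded to its
variable's width, and the way the real format groups the one-byte codes
into 8-byte blocks is ignored.

@example
def uncompressed_size(values):
    """Bytes for one case with UNCOMPRESSED."""
    total = 0
    for v in values:
        if isinstance(v, str):
            total += 8 * ((len(v) + 7) // 8)   # width rounded up to 8
        else:
            total += 8                         # every numeric value
    return total

def compressed_size(values):
    """Bytes for one case with COMPRESSED, per the rules above."""
    total = 0
    for v in values:
        if isinstance(v, str):
            padded = v.ljust(8 * ((len(v) + 7) // 8))
            for k in range(0, len(padded), 8):
                segment = padded[k:k + 8]
                # All-space segments cost 1 byte, others 9.
                total += 1 if segment.strip(" ") == "" else 9
        elif v is None or (float(v).is_integer() and -99 <= v <= 151):
            total += 1                         # sysmis or small integer
        else:
            total += 9                         # any other number
    return total
@end example

For a case holding @code{42}, @code{3.5}, a system-missing value, and
the string @samp{Civic} padded to width 8, @code{uncompressed_size}
gives 32 bytes and @code{compressed_size} gives 20.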
 733
 734 @subcmd{COMPRESSED} is the default compression level.  The SET command
 735 (@pxref{SET}) can change this default.
 736
 737 The @subcmd{PERMISSIONS} subcommand specifies permissions for the new system
 738 file.  WRITEABLE, the default, creates the file with read and write
 739 permission.  READONLY creates the file for read-only access.
 740
 741 By default, all the variables in the active dataset dictionary are written
 742 to the system file.  The @subcmd{DROP} subcommand can be used to specify a list
 743 of variables not to be written.  In contrast, KEEP specifies variables
 744 to be written, with all variables not specified not written.

Normally, variables are saved to a system file under the same names
they have in the active dataset.  Use the @subcmd{RENAME} subcommand
to change these names.  Specify, within parentheses, a list of
variable names followed by an equals sign (@samp{=}) and the names
that they should be renamed to.  Multiple parenthesized groups of
variable names can be included on a single @subcmd{RENAME}
subcommand.  Variables' names may be swapped using a @subcmd{RENAME}
subcommand of the form @subcmd{/RENAME=(@var{A} @var{B}=@var{B} @var{A})}.
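
The following sketch uses two parenthesized groups on one
@subcmd{RENAME} subcommand.  The first group swaps the names of
@code{a} and @code{b`, and the second renames @code{score} to
@code{total}.  The variable and file names are hypothetical.

@example
* Hypothetical names; a and b trade names, score becomes total.
SAVE OUTFILE='renamed.sav'
     /RENAME=(a b=b a) (score=total).
@end example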

An alternate syntax for the @subcmd{RENAME} subcommand allows the
parentheses to be omitted.  In that form, only a single variable may
be renamed at a time, for instance @subcmd{/RENAME=@var{A}=@var{B}}.
This alternate syntax is deprecated.

@subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} are performed in
left-to-right order, and each may be present any number of times.
@cmd{SAVE} never modifies the active dataset: @subcmd{DROP},
@subcmd{KEEP}, and @subcmd{RENAME} only affect the system file written
to disk.
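
Because these subcommands are applied from left to right, a
@subcmd{KEEP} that follows a @subcmd{RENAME} refers to variables by
their new names.  In this sketch, which uses hypothetical variable and
file names, @code{id} is renamed to @code{caseid} before @subcmd{KEEP}
selects the variables to write:

@example
* Hypothetical names; KEEP uses the new name caseid.
SAVE OUTFILE='subset.sav'
     /RENAME=(id=caseid)
     /KEEP=caseid score.
@end example

The active dataset still contains @code{id} afterward, because
@cmd{SAVE} does not modify it.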

The @subcmd{VERSION} subcommand specifies the version of the file
format.  Valid versions are 2 and 3.  The default version is 3.  In
version 2 system files, variable names longer than 8 bytes are
truncated.  The two versions are otherwise identical.

The @subcmd{NAMES} and @subcmd{MAP} subcommands are currently ignored.

@cmd{SAVE} causes the data to be read.  It is a procedure.

@node SAVE DATA COLLECTION
@section SAVE DATA COLLECTION
@vindex SAVE DATA COLLECTION

@display
SAVE DATA COLLECTION
        /OUTFILE=@{'@var{file_name}',@var{file_handle}@}
        /METADATA=@{'@var{file_name}',@var{file_handle}@}
        /@{UNCOMPRESSED,COMPRESSED,ZCOMPRESSED@}
        /PERMISSIONS=@{WRITEABLE,READONLY@}
        /DROP=@var{var_list}
        /KEEP=@var{var_list}
        /VERSION=@var{version}
        /RENAME=(@var{src_names}=@var{target_names})@dots{}
        /NAMES
        /MAP
@end display

Like @cmd{SAVE}, @cmd{SAVE DATA COLLECTION} writes the dictionary and
data in the active dataset to a system file.  In addition, it writes
metadata to a separate XML metadata file.

@subcmd{OUTFILE} is required.  Specify the system file to be written
as a string file name or a file handle (@pxref{File Handles}).

@subcmd{METADATA} is also required.  Specify the metadata file to be
written as a string file name or a file handle.  Metadata files
customarily use a @file{.mdd} extension.
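
A minimal invocation supplies only the two required subcommands.  The
file names in this sketch are hypothetical:

@example
* Hypothetical file names.
SAVE DATA COLLECTION
     /OUTFILE='survey.sav'
     /METADATA='survey.mdd'.
@end example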

The current implementation of this command is experimental.  It only
outputs an approximation of the metadata file format.  Please report
bugs.

Other subcommands are optional.  They have the same meanings as in the
@cmd{SAVE} command.

@cmd{SAVE DATA COLLECTION} causes the data to be read.  It is a
procedure.

@node SAVE TRANSLATE
@section SAVE TRANSLATE
@vindex SAVE TRANSLATE

@display
SAVE TRANSLATE
        /OUTFILE=@{'@var{file_name}',@var{file_handle}@}
        /TYPE=@{CSV,TAB@}
        [/REPLACE]
        [/MISSING=@{IGNORE,RECODE@}]

        [/DROP=@var{var_list}]
        [/KEEP=@var{var_list}]
        [/RENAME=(@var{src_names}=@var{target_names})@dots{}]
        [/UNSELECTED=@{RETAIN,DELETE@}]
        [/MAP]

        @dots{}additional subcommands depending on TYPE@dots{}
@end display

The @cmd{SAVE TRANSLATE} command saves data in formats that other
applications can read.

The @subcmd{OUTFILE} and @subcmd{TYPE} subcommands are mandatory.
@subcmd{OUTFILE} specifies the file to be written, as a string file
name or a file handle (@pxref{File Handles}).  @subcmd{TYPE}
determines the format of the file to be written.  It must be one of
the following:

@table @asis
@item CSV
Comma-separated value format.

@item TAB
Tab-delimited format.
@end table

By default, @cmd{SAVE TRANSLATE} does not overwrite an existing file.
Use @subcmd{REPLACE} to force an existing file to be overwritten.

With @subcmd{MISSING=IGNORE}, the default, @cmd{SAVE TRANSLATE} treats
user-missing values as if they were not missing.  Specify
@subcmd{MISSING=RECODE} to output numeric user-missing values like
system-missing values and string user-missing values as all spaces.
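
For instance, the following sketch overwrites any existing file of the
same name.  It writes numeric user-missing values the same way as
system-missing values and writes string user-missing values as spaces.
The output file name is hypothetical.

@example
* Hypothetical output file name.
SAVE TRANSLATE
     /OUTFILE='export.csv'
     /TYPE=CSV
     /REPLACE
     /MISSING=RECODE.
@end example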

By default, all the variables in the active dataset's dictionary are
saved to the output file, but @subcmd{DROP} or @subcmd{KEEP} can
select a subset of variables to save.  The @subcmd{RENAME} subcommand
can also be used to change the names under which variables are saved;
because they are used only in the output, these names do not have to
conform to the usual PSPP variable naming rules.  @subcmd{UNSELECTED}
determines whether cases filtered out by the @cmd{FILTER} command are
written to the output file.  These subcommands have the same syntax
and meaning as on the @cmd{SAVE} command (@pxref{SAVE}).
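
The following sketch writes a tab-delimited file that contains only
two variables and omits cases excluded by @cmd{FILTER}.  The variable
and file names are hypothetical.

@example
* Hypothetical variable and file names.
SAVE TRANSLATE
     /OUTFILE='scores.txt'
     /TYPE=TAB
     /KEEP=id score
     /UNSELECTED=DELETE.
@end example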

Each supported file type has additional subcommands, explained
below.

@cmd{SAVE TRANSLATE} causes the data to be read.  It is a procedure.

@menu
* SAVE TRANSLATE /TYPE=CSV and TYPE=TAB::
@end menu

@node SAVE TRANSLATE /TYPE=CSV and TYPE=TAB
@subsection Writing Comma- and Tab-Separated Data Files

@display
SAVE TRANSLATE
        /OUTFILE=@{'@var{file_name}',@var{file_handle}@}
        /TYPE=CSV
        [/REPLACE]
        [/MISSING=@{IGNORE,RECODE@}]

        [/DROP=@var{var_list}]
        [/KEEP=@var{var_list}]
        [/RENAME=(@var{src_names}=@var{target_names})@dots{}]
        [/UNSELECTED=@{RETAIN,DELETE@}]

        [/FIELDNAMES]
        [/CELLS=@{VALUES,LABELS@}]
        [/TEXTOPTIONS DELIMITER='@var{delimiter}']
        [/TEXTOPTIONS QUALIFIER='@var{qualifier}']
        [/TEXTOPTIONS DECIMAL=@{DOT,COMMA@}]
        [/TEXTOPTIONS FORMAT=@{PLAIN,VARIABLE@}]
@end display

The SAVE TRANSLATE command with TYPE=CSV or TYPE=TAB writes data in a
comma- or tab-separated value format similar to that described by
RFC@tie{}4180.  Each variable becomes one output column, and each case
becomes one line of output.  If FIELDNAMES is specified, an additional
line at the top of the output file lists variable names.
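
The following self-contained sketch defines two cases with
hypothetical data and writes them to a hypothetical file
@file{scores.csv}, preceded by a line of variable names.  Because it
uses @cmd{DATA LIST}, it replaces the active dataset.

@example
* Hypothetical data and file name.
DATA LIST LIST /id score.
BEGIN DATA
1 3.5
2 4
END DATA.
SAVE TRANSLATE
     /OUTFILE='scores.csv'
     /TYPE=CSV
     /FIELDNAMES
     /REPLACE.
@end example

The first line of the output names the columns @code{id} and
@code{score}, and each of the next two lines holds one case.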

The CELLS and TEXTOPTIONS FORMAT settings determine how values are
written to the output file:

@table @asis
@item CELLS=VALUES FORMAT=PLAIN (the default settings)
Writes variables to the output in ``plain'' formats that ignore the
details of variable formats.  Numeric values are written as plain
decimal numbers with enough digits to indicate their exact values in
machine representation.  Numeric values include @samp{e} followed by
an exponent if the exponent value would be less than @minus{}4 or
greater than 16.  Dates are written in MM/DD/YYYY format and times in
HH:MM:SS format.  WKDAY and MONTH values are written as decimal
numbers.

Numeric values use, by default, the decimal point character set with
SET DECIMAL (@pxref{SET DECIMAL}).  Use DECIMAL=DOT or DECIMAL=COMMA
to force a particular decimal point character.

@item CELLS=VALUES FORMAT=VARIABLE
Writes variables using their print formats.  Leading and trailing
spaces are removed from numeric values, and trailing spaces are
removed from string values.

@item CELLS=LABELS FORMAT=PLAIN
@itemx CELLS=LABELS FORMAT=VARIABLE
Writes value labels where they exist, and otherwise writes the values
themselves as described above.
@end table
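
For example, the following sketch writes value labels where they
exist, and otherwise writes values in their print formats.  It assumes
a numeric variable @code{sex} coded 1 and 2.  That variable, its
labels, and the file name are hypothetical.

@example
* Hypothetical variable, labels, and file name.
VALUE LABELS sex 1 'Male' 2 'Female'.
SAVE TRANSLATE
     /OUTFILE='labeled.csv'
     /TYPE=CSV
     /CELLS=LABELS
     /TEXTOPTIONS FORMAT=VARIABLE.
@end example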

Regardless of CELLS and TEXTOPTIONS FORMAT, numeric system-missing
values are output as a single space.

For TYPE=TAB, tab characters delimit values.  For TYPE=CSV, the
TEXTOPTIONS DELIMITER and DECIMAL settings determine the character
that separates values within a line.  If DELIMITER is specified, then
the specified delimiter separates values.  If DELIMITER is not
specified, then the default is a comma with DECIMAL=DOT or a
semicolon with DECIMAL=COMMA.  If DECIMAL is not given either, it is
implied by the decimal point character set with SET DECIMAL
(@pxref{SET DECIMAL}).
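
In the following sketch, which uses a hypothetical file name,
@code{DECIMAL=COMMA} makes the comma the decimal point character.
Because no @code{DELIMITER} is given, semicolons separate the values:

@example
* Hypothetical output file name.
SAVE TRANSLATE
     /OUTFILE='decimal-comma.csv'
     /TYPE=CSV
     /TEXTOPTIONS DECIMAL=COMMA.
@end example

Adding @code{/TEXTOPTIONS DELIMITER='|'} would instead separate values
with vertical bars.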

The TEXTOPTIONS QUALIFIER setting specifies a character that is output
before and after a value that contains the delimiter character or the
qualifier character.  The default is a double quote (@samp{"}).  A
qualifier character that appears within a value is doubled.
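
As a sketch, the following uses a single quote as the qualifier.  A
value that contains the comma delimiter, such as @samp{Smith, J}, is
written between single quotes, and a single quote inside a value is
written twice.  The file name is hypothetical.

@example
* Hypothetical output file name.
SAVE TRANSLATE
     /OUTFILE='names.csv'
     /TYPE=CSV
     /TEXTOPTIONS QUALIFIER="'".
@end example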

@node SYSFILE INFO
@section SYSFILE INFO
@vindex SYSFILE INFO

@display
SYSFILE INFO FILE='@var{file_name}' [ENCODING='@var{encoding}'].
@end display

@cmd{SYSFILE INFO} reads the dictionary in an SPSS system file,
SPSS/PC+ system file, or SPSS portable file, and displays the
information in that dictionary.

Specify the file to read on the @subcmd{FILE} subcommand, as a string
file name or a file handle (@pxref{File Handles}).

@pspp{} automatically detects the encoding of string data in the file,
when possible.  The character encoding of old SPSS system files cannot
always be guessed correctly, and SPSS/PC+ system files do not include
any indication of their encoding.  Specify the @subcmd{ENCODING}
subcommand with an @acronym{IANA} character set name as its string
argument to override the default, or specify @code{ENCODING='DETECT'}
to analyze and report possibly valid encodings for the system file.
The @subcmd{ENCODING} subcommand is a @pspp{} extension.
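
For example, to examine an old system file whose encoding is unknown,
you can first ask @pspp{} to report candidate encodings and then read
the dictionary with a chosen one.  The file name is hypothetical, and
@code{windows-1252} is only one example of an IANA character set name:

@example
* Hypothetical file name.
SYSFILE INFO FILE='legacy.sav' ENCODING='DETECT'.
SYSFILE INFO FILE='legacy.sav' ENCODING='windows-1252'.
@end example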

@cmd{SYSFILE INFO} does not affect the current active dataset.

@node XEXPORT
@section XEXPORT
@vindex XEXPORT

@display
XEXPORT
        /OUTFILE='@var{file_name}'
        /DIGITS=@var{n}
        /DROP=@var{var_list}
        /KEEP=@var{var_list}
        /RENAME=(@var{src_names}=@var{target_names})@dots{}
        /TYPE=@{COMM,TAPE@}
        /MAP
@end display

The @cmd{XEXPORT} transformation writes the active dataset's
dictionary and data to a portable file.

This transformation is a @pspp{} extension.

It is similar to the @cmd{EXPORT} procedure, with two differences:

@itemize
@item
@cmd{XEXPORT} is a transformation, not a procedure.  It is executed when
the data is read by a procedure or procedure-like command.

@item
@cmd{XEXPORT} does not support the @subcmd{UNSELECTED} subcommand.
@end itemize

@xref{EXPORT}, for more information.
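
Because @cmd{XEXPORT} is a transformation, nothing is written until a
procedure reads the data.  In this sketch, which uses a hypothetical
file name, @cmd{EXECUTE} forces that data pass:

@example
* Hypothetical file name.
XEXPORT /OUTFILE='data.por'.
EXECUTE.
@end example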

@node XSAVE
@section XSAVE
@vindex XSAVE

@display
XSAVE
        /OUTFILE='@var{file_name}'
        /@{UNCOMPRESSED,COMPRESSED,ZCOMPRESSED@}
        /PERMISSIONS=@{WRITEABLE,READONLY@}
        /DROP=@var{var_list}
        /KEEP=@var{var_list}
        /VERSION=@var{version}
        /RENAME=(@var{src_names}=@var{target_names})@dots{}
        /NAMES
        /MAP
@end display

The @cmd{XSAVE} transformation writes the active dataset's dictionary and
data to a system file.  It is similar to the @cmd{SAVE}
procedure, with two differences:

@itemize
@item
@cmd{XSAVE} is a transformation, not a procedure.  It is executed when
the data is read by a procedure or procedure-like command.

@item
@cmd{XSAVE} does not support the @subcmd{UNSELECTED} subcommand.
@end itemize

@xref{SAVE}, for more information.
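
Since @cmd{XSAVE} runs as part of the next data pass, it can write a
system file during a procedure that reads the data anyway.  This
avoids the separate data pass that @cmd{SAVE} would make.  In this
sketch, the file and variable names are hypothetical:

@example
* Hypothetical file and variable names.
XSAVE /OUTFILE='snapshot.sav'.
FREQUENCIES /VARIABLES=score.
@end example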