1 @c PSPP - a program for statistical analysis.
2 @c Copyright (C) 2017, 2020 Free Software Foundation, Inc.
3 @c Permission is granted to copy, distribute and/or modify this document
4 @c under the terms of the GNU Free Documentation License, Version 1.3
5 @c or any later version published by the Free Software Foundation;
6 @c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
7 @c A copy of the license is included in the section entitled "GNU
8 @c Free Documentation License".
10 @node System and Portable File IO
11 @chapter System and Portable File I/O
The commands in this chapter read, write, and examine system files and
portable files.

@menu
* APPLY DICTIONARY::            Apply system file dictionary to active dataset.
* EXPORT::                      Write to a portable file.
* GET::                         Read from a system file.
* GET DATA::                    Read from foreign files.
* IMPORT::                      Read from a portable file.
* SAVE::                        Write to a system file.
* SAVE DATA COLLECTION::        Write to a system file and metadata file.
* SAVE TRANSLATE::              Write data in foreign file formats.
* SYSFILE INFO::                Display system file dictionary.
* XEXPORT::                     Write to a portable file, as a transformation.
* XSAVE::                       Write to a system file, as a transformation.
@end menu
30 @node APPLY DICTIONARY
31 @section APPLY DICTIONARY
32 @vindex APPLY DICTIONARY
@display
APPLY DICTIONARY FROM=@{'@var{file_name}',@var{file_handle}@}.
@end display
@cmd{APPLY DICTIONARY} applies the variable labels, value labels,
and missing values taken from a file to corresponding
variables in the active dataset. In some cases it also updates the
weighting variable.
43 The @subcmd{FROM} clause is mandatory. Use it to specify a system
44 file or portable file's name in single quotes, a data set name
45 (@pxref{Datasets}), or a file handle name (@pxref{File Handles}).
The dictionary in the file is read, but it does not replace the active
dataset's dictionary. The file's data is not read.
49 Only variables with names that exist in both the active dataset and the
50 system file are considered. Variables with the same name but different
51 types (numeric, string) cause an error message. Otherwise, the
52 system file variables' attributes replace those in their matching
53 active dataset variables:
@itemize @bullet
@item
If a system file variable has a variable label, then it replaces
the variable label of the active dataset variable. If the system
file variable does not have a variable label, then the active dataset
variable's variable label, if any, is retained.

@item
If the system file variable has custom attributes (@pxref{VARIABLE
ATTRIBUTE}), then those attributes replace the active dataset variable's
custom attributes. If the system file variable does not have custom
attributes, then the active dataset variable's custom attributes, if any,
are retained.

@item
If the active dataset variable is numeric or short string, then value
labels and missing values, if any, are copied to the active dataset
variable. If the system file variable does not have value labels or
missing values, then those in the active dataset variable, if any, are not
disturbed.
@end itemize
In addition to properties of variables, some properties of the active
dataset's dictionary as a whole are updated:

@itemize @bullet
@item
If the system file has custom attributes (@pxref{DATAFILE ATTRIBUTE}),
then those attributes replace the active dataset's custom
attributes.

@item
If the active dataset has a weighting variable (@pxref{WEIGHT}), and the
system file does not, or if the weighting variable in the system file
does not exist in the active dataset, then the active dataset weighting
variable, if any, is retained. Otherwise, the weighting variable in
the system file becomes the active dataset weighting variable.
@end itemize
94 @cmd{APPLY DICTIONARY} takes effect immediately. It does not read the
95 active dataset. The system file is not modified.
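
As a minimal sketch, assuming a hypothetical system file named
@file{survey_labels.sav} whose variable names overlap with those of the
active dataset, the following syntax copies its labels and missing
values onto the matching variables:

@example
* The file name below is hypothetical.
APPLY DICTIONARY FROM='survey_labels.sav'.
@end example

Variables that exist only in the active dataset or only in the file are
left alone, as described above.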
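@node EXPORT
@section EXPORT
@vindex EXPORT

@display
EXPORT
        /OUTFILE='@var{file_name}'
        /UNSELECTED=@{RETAIN,DELETE@}
        /DIGITS=@var{n}
        /DROP=@var{var_list}
        /KEEP=@var{var_list}
        /RENAME=(@var{src_names}=@var{target_names})@dots{}
@end display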
113 The @cmd{EXPORT} procedure writes the active dataset's dictionary and
114 data to a specified portable file.
116 By default, cases excluded with FILTER are written to the
117 file. These can be excluded by specifying DELETE on the @subcmd{UNSELECTED}
118 subcommand. Specifying RETAIN makes the default explicit.
120 Portable files express real numbers in base 30. Integers are always
121 expressed to the maximum precision needed to make them exact.
122 Non-integers are, by default, expressed to the machine's maximum
123 natural precision (approximately 15 decimal digits on many machines).
124 If many numbers require this many digits, the portable file may
125 significantly increase in size. As an alternative, the @subcmd{DIGITS}
126 subcommand may be used to specify the number of decimal digits of
127 precision to write. @subcmd{DIGITS} applies only to non-integers.
129 The @subcmd{OUTFILE} subcommand, which is the only required subcommand, specifies
130 the portable file to be written as a file name string or
131 a file handle (@pxref{File Handles}).
133 @subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} follow the same format as the
134 @subcmd{SAVE} procedure (@pxref{SAVE}).
136 The @subcmd{TYPE} subcommand specifies the character set for use in the
137 portable file. Its value is currently not used.
139 The @subcmd{MAP} subcommand is currently ignored.
141 @cmd{EXPORT} is a procedure. It causes the active dataset to be read.
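
The following sketch shows how these subcommands might be combined.
The file name and variable names are hypothetical, and the choice of 6
digits is only an illustration:

@example
* Write a portable file without filtered-out cases, keeping
  6 decimal digits of precision for non-integers.
EXPORT /OUTFILE='survey.por'
       /UNSELECTED=DELETE
       /DIGITS=6
       /DROP=scratch1 scratch2.
@end example

Lowering @subcmd{DIGITS} trades precision for a smaller file, so it is
worth checking that the chosen value is adequate for the data.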
@node GET
@section GET
@vindex GET

@display
GET
        /FILE=@{'@var{file_name}',@var{file_handle}@}
        /DROP=@var{var_list}
        /KEEP=@var{var_list}
        /RENAME=(@var{src_names}=@var{target_names})@dots{}
        /ENCODING='@var{encoding}'
@end display
156 @cmd{GET} clears the current dictionary and active dataset and
157 replaces them with the dictionary and data from a specified file.
159 The @subcmd{FILE} subcommand is the only required subcommand. Specify
160 the SPSS system file, SPSS/PC+ system file, or SPSS portable file to
161 be read as a string file name or a file handle (@pxref{File Handles}).
By default, all the variables in a file are read. The @subcmd{DROP}
subcommand can be used to specify a list of variables that are not to be
read. By contrast, the @subcmd{KEEP} subcommand can be used to specify
the variables that are to be read, with all other variables not read.
168 Normally variables in a file retain the names that they were
169 saved under. Use the @subcmd{RENAME} subcommand to change these names.
Specify, within parentheses, a list of variable names followed by an equals sign
172 (@samp{=}) and the names that they should be renamed to. Multiple
173 parenthesized groups of variable names can be included on a single
174 @subcmd{RENAME} subcommand.
175 Variables' names may be swapped using a @subcmd{RENAME}
176 subcommand of the form @subcmd{/RENAME=(@var{A} @var{B}=@var{B} @var{A})}.
Alternate syntax for the @subcmd{RENAME} subcommand allows the parentheses to be
eliminated. When this is done, only a single variable may be renamed at
once. For instance, @subcmd{/RENAME=@var{A}=@var{B}}. This alternate syntax is
deprecated.
183 @subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} are executed in left-to-right order.
184 Each may be present any number of times. @cmd{GET} never modifies a
185 file on disk. Only the active dataset read from the file
186 is affected by these subcommands.
188 @pspp{} automatically detects the encoding of string data in the file,
189 when possible. The character encoding of old SPSS system files cannot
190 always be guessed correctly, and SPSS/PC+ system files do not include
191 any indication of their encoding. Specify the @subcmd{ENCODING}
192 subcommand with an @acronym{IANA} character set name as its string
193 argument to override the default. Use @cmd{SYSFILE INFO} to analyze
194 the encodings that might be valid for a system file. The
195 @subcmd{ENCODING} subcommand is a @pspp{} extension.
197 @cmd{GET} does not cause the data to be read, only the dictionary. The data
198 is read later, when a procedure is executed.
200 Use of @cmd{GET} to read a portable file is a @pspp{} extension.
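
As an illustrative sketch (the file name, variable names, and encoding
are hypothetical), the following syntax reads three variables from a
system file, renames one of them, and overrides the detected encoding:

@example
* KEEP runs first, then RENAME, in left-to-right order.
GET /FILE='survey.sav'
    /KEEP=id age income
    /RENAME=(income=inc)
    /ENCODING='windows-1252'.
@end example

Because the subcommands are executed from left to right, a
@subcmd{RENAME} that follows @subcmd{KEEP} can refer only to variables
that @subcmd{KEEP} retained.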
@node GET DATA
@section GET DATA
@vindex GET DATA

@display
GET DATA
        /TYPE=@{GNM,ODS,PSQL,TXT@}
        @dots{}additional subcommands depending on TYPE@dots{}
@end display
212 The @cmd{GET DATA} command is used to read files and other data
213 sources created by other applications. When this command is executed,
214 the current dictionary and active dataset are replaced with variables
215 and data read from the specified source.
217 The @subcmd{TYPE} subcommand is mandatory and must be the first subcommand
218 specified. It determines the type of the file or source to read.
219 @pspp{} currently supports the following file types:
@table @asis
@item GNM
Spreadsheet files created by Gnumeric (@url{http://gnumeric.org}).

@item ODS
Spreadsheet files in OpenDocument format (@url{http://opendocumentformat.org}).

@item PSQL
Relations from PostgreSQL databases (@url{http://postgresql.org}).

@item TXT
Textual data files in columnar and delimited formats.
@end table
235 Each supported file type has additional subcommands, explained in
236 separate sections below.
@menu
* GET DATA /TYPE=GNM/ODS::      Spreadsheets
* GET DATA /TYPE=PSQL::         Databases
* GET DATA /TYPE=TXT::          Delimited Text Files
@end menu
244 @node GET DATA /TYPE=GNM/ODS
245 @subsection Spreadsheet Files
@display
GET DATA /TYPE=@{GNM, ODS@}
        /FILE=@{'@var{file_name}'@}
        /SHEET=@{NAME '@var{sheet_name}', INDEX @var{n}@}
        /CELLRANGE=@{RANGE '@var{range}', FULL@}
        /READNAMES=@{ON, OFF@}
        /ASSUMEDSTRWIDTH=@var{n}.
@end display
258 @cindex spreadsheet files
260 Gnumeric spreadsheets (@url{http://gnumeric.org}), and spreadsheets
261 in OpenDocument format
262 (@url{http://libreplanet.org/wiki/Group:OpenDocument/Software})
263 can be read using the @cmd{GET DATA} command.
264 Use the @subcmd{TYPE} subcommand to indicate the file's format.
265 /TYPE=GNM indicates Gnumeric files,
266 /TYPE=ODS indicates OpenDocument.
267 The @subcmd{FILE} subcommand is mandatory.
Use it to specify the name of the file to be read.
269 All other subcommands are optional.
271 The format of each variable is determined by the format of the spreadsheet
272 cell containing the first datum for the variable.
273 If this cell is of string (text) format, then the width of the variable is
274 determined from the length of the string it contains, unless the
275 @subcmd{ASSUMEDSTRWIDTH} subcommand is given.
277 The @subcmd{SHEET} subcommand specifies the sheet within the spreadsheet file to read.
278 There are two forms of the @subcmd{SHEET} subcommand.
In the first form,
@subcmd{/SHEET=name @var{sheet_name}}, the string @var{sheet_name} is the
name of the sheet to read.
In the second form, @subcmd{/SHEET=index @var{idx}}, @var{idx} is an
integer which is the index of the sheet to read.
284 The first sheet has the index 1.
285 If the @subcmd{SHEET} subcommand is omitted, then the command reads the
286 first sheet in the file.
288 The @subcmd{CELLRANGE} subcommand specifies the range of cells within the sheet to read.
If the subcommand is given as @subcmd{/CELLRANGE=FULL}, then the entire
sheet is read.
291 To read only part of a sheet, use the form
292 @subcmd{/CELLRANGE=range '@var{top_left_cell}:@var{bottom_right_cell}'}.
293 For example, the subcommand @subcmd{/CELLRANGE=range 'C3:P19'} reads
294 columns C--P, and rows 3--19 inclusive.
295 If no @subcmd{CELLRANGE} subcommand is given, then the entire sheet is read.
297 If @subcmd{/READNAMES=ON} is specified, then the contents of cells of
298 the first row are used as the names of the variables in which to store
299 the data from subsequent rows. This is the default.
300 If @subcmd{/READNAMES=OFF} is
301 used, then the variables receive automatically assigned names.
303 The @subcmd{ASSUMEDSTRWIDTH} subcommand specifies the maximum width of string
304 variables read from the file.
305 If omitted, the default value is determined from the length of the
306 string in the first spreadsheet cell for each variable.
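
The subcommands above can be combined as in the following sketch. The
file name is hypothetical, and the width of 32 is an arbitrary choice
for illustration:

@example
* Read cells C3 through P19 of the second sheet, taking
  variable names from the first row of that range.
GET DATA /TYPE=ODS
         /FILE='sales.ods'
         /SHEET=INDEX 2
         /CELLRANGE=RANGE 'C3:P19'
         /READNAMES=ON
         /ASSUMEDSTRWIDTH=32.
@end example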
309 @node GET DATA /TYPE=PSQL
310 @subsection Postgres Database Queries
@display
GET DATA /TYPE=PSQL
        /CONNECT=@{@var{connection info}@}
        [/ASSUMEDSTRWIDTH=@var{w}]
@end display
324 The PSQL type is used to import data from a postgres database server.
325 The server may be located locally or remotely.
326 Variables are automatically created based on the table column names
327 or the names specified in the SQL query.
Postgres data types of high precision lose precision when
imported into @pspp{}.
Not all Postgres data types can be represented in @pspp{}.
331 If a datum cannot be represented then @cmd{GET DATA} issues a warning
332 and that datum is set to SYSMIS.
334 The @subcmd{CONNECT} subcommand is mandatory.
335 It is a string specifying the parameters of the database server from
336 which the data should be fetched.
337 The format of the string is given in the postgres manual
338 @url{http://www.postgresql.org/docs/8.0/static/libpq.html#LIBPQ-CONNECT}.
340 The @subcmd{SQL} subcommand is mandatory.
341 It must be a valid SQL string to retrieve data from the database.
343 The @subcmd{ASSUMEDSTRWIDTH} subcommand specifies the maximum width of string
344 variables read from the database.
345 If omitted, the default value is determined from the length of the
346 string in the first value read for each variable.
The @subcmd{UNENCRYPTED} subcommand allows data to be retrieved over an insecure
connection.
350 If the connection is not encrypted, and the @subcmd{UNENCRYPTED} subcommand is
351 not given, then an error occurs.
352 Whether or not the connection is
353 encrypted depends upon the underlying psql library and the
354 capabilities of the database server.
356 The @subcmd{BSIZE} subcommand serves only to optimise the speed of data transfer.
It specifies an upper limit on the
number of cases to fetch from the database at once.
359 The default value is 4096.
360 If your SQL statement fetches a large number of cases but only a small number of
361 variables, then the data transfer may be faster if you increase this value.
362 Conversely, if the number of variables is large, or if the machine on which
363 @pspp{} is running has only a
364 small amount of memory, then a smaller value is probably better.
367 The following syntax is an example:
@example
GET DATA /TYPE=PSQL
     /CONNECT='host=example.com port=5432 dbname=product user=fred password=xxxx'
     /SQL='select * from manufacturer'.
@end example
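
A further sketch, using the optional subcommands described above: the
connection parameters and column names are hypothetical, and the
@subcmd{BSIZE} value is only an example of raising the batch size for a
query that returns many cases but few variables.

@example
* Permit an unencrypted connection and fetch up to 10000
  cases per round trip.
GET DATA /TYPE=PSQL
     /CONNECT='host=localhost dbname=product'
     /SQL='select id, name from manufacturer'
     /UNENCRYPTED
     /BSIZE=10000.
@end example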
375 @node GET DATA /TYPE=TXT
376 @subsection Textual Data Files
@display
GET DATA /TYPE=TXT
        /FILE=@{'@var{file_name}',@var{file_handle}@}
        [/ENCODING='@var{encoding}']
        [/ARRANGEMENT=@{DELIMITED,FIXED@}]
        [/FIRSTCASE=@{@var{first_case}@}]
        [/IMPORTCASE=@{ALL,FIRST @var{max_cases},PERCENT @var{percent}@}]
        @dots{}additional subcommands depending on ARRANGEMENT@dots{}
@end display
390 When TYPE=TXT is specified, GET DATA reads data in a delimited or
391 fixed columnar format, much like DATA LIST (@pxref{DATA LIST}).
393 The @subcmd{FILE} subcommand is mandatory. Specify the file to be read as
394 a string file name or (for textual data only) a
395 file handle (@pxref{File Handles}).
The @subcmd{ENCODING} subcommand specifies the character encoding of
the file to be read. @xref{INSERT}, for information on supported
encodings.
401 The @subcmd{ARRANGEMENT} subcommand determines the file's basic format.
402 DELIMITED, the default setting, specifies that fields in the input
403 data are separated by spaces, tabs, or other user-specified
404 delimiters. FIXED specifies that fields in the input data appear at
405 particular fixed column positions within records of a case.
407 By default, cases are read from the input file starting from the first
408 line. To skip lines at the beginning of an input file, set @subcmd{FIRSTCASE}
409 to the number of the first line to read: 2 to skip the first line, 3
410 to skip the first two lines, and so on.
@subcmd{IMPORTCASE} is ignored, for compatibility. Use @cmd{N OF
CASES} to limit the number of cases read from a file (@pxref{N OF
CASES}), or @cmd{SAMPLE} to obtain a random sample of cases
(@pxref{SAMPLE}).
417 The remaining subcommands apply only to one of the two file
418 arrangements, described below.
@menu
* GET DATA /TYPE=TXT /ARRANGEMENT=DELIMITED::
* GET DATA /TYPE=TXT /ARRANGEMENT=FIXED::
@end menu
425 @node GET DATA /TYPE=TXT /ARRANGEMENT=DELIMITED
426 @subsubsection Reading Delimited Data
@display
GET DATA /TYPE=TXT
        /FILE=@{'@var{file_name}',@var{file_handle}@}
        [/ARRANGEMENT=@{DELIMITED,FIXED@}]
        [/FIRSTCASE=@{@var{first_case}@}]
        [/IMPORTCASE=@{ALL,FIRST @var{max_cases},PERCENT @var{percent}@}]

        /DELIMITERS="@var{delimiters}"
        [/QUALIFIER="@var{quotes}"]
        [/DELCASE=@{LINE,VARIABLES @var{n_variables}@}]
        /VARIABLES=@var{del_var1} [@var{del_var2}]@dots{}
where each @var{del_var} takes the form:
        @var{variable} @var{format}
@end display
443 The GET DATA command with TYPE=TXT and ARRANGEMENT=DELIMITED reads
444 input data from text files in delimited format, where fields are
445 separated by a set of user-specified delimiters. Its capabilities are
similar to those of DATA LIST FREE (@pxref{DATA LIST FREE}), with a
few enhancements.
449 The required @subcmd{FILE} subcommand and optional @subcmd{FIRSTCASE} and @subcmd{IMPORTCASE}
450 subcommands are described above (@pxref{GET DATA /TYPE=TXT}).
452 @subcmd{DELIMITERS}, which is required, specifies the set of characters that
453 may separate fields. Each character in the string specified on
454 @subcmd{DELIMITERS} separates one field from the next. The end of a line also
455 separates fields, regardless of @subcmd{DELIMITERS}. Two consecutive
456 delimiters in the input yield an empty field, as does a delimiter at
457 the end of a line. A space character as a delimiter is an exception:
458 consecutive spaces do not yield an empty field and neither does any
459 number of spaces at the end of a line.
461 To use a tab as a delimiter, specify @samp{\t} at the beginning of the
462 @subcmd{DELIMITERS} string. To use a backslash as a delimiter, specify
463 @samp{\\} as the first delimiter or, if a tab should also be a
464 delimiter, immediately following @samp{\t}. To read a data file in
465 which each field appears on a separate line, specify the empty string
466 for @subcmd{DELIMITERS}.
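
For instance, a tab- or comma-separated file with one header line might
be read as sketched below. The file name, variable names, and formats
are hypothetical:

@example
* Tab must come first in the DELIMITERS string.
GET DATA /TYPE=TXT /FILE='scores.tsv'
         /DELIMITERS='\t,'
         /FIRSTCASE=2
         /VARIABLES=id F10
                    name A20
                    score F10.
@end example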
468 The optional @subcmd{QUALIFIER} subcommand names one or more characters that
469 can be used to quote values within fields in the input. A field that
470 begins with one of the specified quote characters ends at the next
471 matching quote. Intervening delimiters become part of the field,
472 instead of terminating it. The ability to specify more than one quote
473 character is a @pspp{} extension.
475 The character specified on @subcmd{QUALIFIER} can be embedded within a
476 field that it quotes by doubling the qualifier. For example, if
477 @samp{'} is specified on @subcmd{QUALIFIER}, then @code{'a''b'}
478 specifies a field that contains @samp{a'b}.
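
To illustrate quoting, suppose a hypothetical file @file{names.csv}
contains the following line:

@example
"Smith, John","He said ""hi""",42
@end example

With the syntax sketched below (variable names and widths are
assumptions), the rules above imply that the first field keeps its
embedded comma and the second field contains a pair of literal double
quotes around @samp{hi}:

@example
GET DATA /TYPE=TXT /FILE='names.csv'
         /DELIMITERS=','
         /QUALIFIER='"'
         /VARIABLES=name A20
                    remark A20
                    age F10.
@end example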
480 The @subcmd{DELCASE} subcommand controls how data may be broken across lines in
481 the data file. With LINE, the default setting, each line must contain
482 all the data for exactly one case. For additional flexibility, to
483 allow a single case to be split among lines or multiple cases to be
484 contained on a single line, specify VARIABLES @i{n_variables}, where
485 @i{n_variables} is the number of variables per case.
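
As a sketch of the second setting, the following syntax reads a
hypothetical file in which values are separated by spaces and line
breaks carry no meaning, so that every three values form one case:

@example
* Every 3 fields make up a case, wherever lines break.
GET DATA /TYPE=TXT /FILE='stream.txt'
         /DELIMITERS=' '
         /DELCASE=VARIABLES 3
         /VARIABLES=x F10
                    y F10
                    z F10.
@end example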
The @subcmd{VARIABLES} subcommand is required and must be the last subcommand.
Specify the name of each variable and its input format (@pxref{Input
and Output Formats}) in the order they should be read from the input
file.
492 @subsubheading Examples
On a Unix-like system, the @samp{/etc/passwd} file has a format
similar to this:

@example
root:$1$nyeSP5gD$pDq/:0:0:,,,:/root:/bin/bash
blp:$1$BrP/pFg4$g7OG:1000:1000:Ben Pfaff,,,:/home/blp:/bin/bash
john:$1$JBuq/Fioq$g4A:1001:1001:John Darrington,,,:/home/john:/bin/bash
jhs:$1$D3li4hPL$88X1:1002:1002:Jason Stover,,,:/home/jhs:/bin/csh
@end example
The following syntax reads a file in the format used by
@samp{/etc/passwd}:
509 @c If you change this example, change the regression test in
510 @c tests/language/data-io/get-data.at to match.
512 GET DATA /TYPE=TXT /FILE='/etc/passwd' /DELIMITERS=':'
513 /VARIABLES=username A20
523 Consider the following data on used cars:
@example
model year mileage price type age
Civic 2002 29883 15900 Si 2
Civic 2003 13415 15900 EX 1
Civic 1992 107000 3800 n/a 12
Accord 2002 26613 17900 EX 1
@end example
534 The following syntax can be used to read the used car data:
536 @c If you change this example, change the regression test in
537 @c tests/language/data-io/get-data.at to match.
539 GET DATA /TYPE=TXT /FILE='cars.data' /DELIMITERS=' ' /FIRSTCASE=2
549 Consider the following information on animals in a pet store:
@example
'Pet''s Name', "Age", "Color", "Date Received", "Price", "Height", "Type"
, (Years), , , (Dollars), ,
"Rover", 4.5, Brown, "12 Feb 2004", 80, '1''4"', "Dog"
"Charlie", , Gold, "5 Apr 2007", 12.3, "3""", "Fish"
"Molly", 2, Black, "12 Dec 2006", 25, '5"', "Cat"
"Gilly", , White, "10 Apr 2007", 10, "3""", "Guinea Pig"
@end example
561 The following syntax can be used to read the pet store data:
563 @c If you change this example, change the regression test in
564 @c tests/language/data-io/get-data.at to match.
566 GET DATA /TYPE=TXT /FILE='pets.data' /DELIMITERS=', ' /QUALIFIER='''"' /ESCAPE
577 @node GET DATA /TYPE=TXT /ARRANGEMENT=FIXED
578 @subsubsection Reading Fixed Columnar Data
580 @c (modify-syntax-entry ?_ "w")
581 @c (modify-syntax-entry ?' "'")
582 @c (modify-syntax-entry ?@ "'")
@display
GET DATA /TYPE=TXT
        /FILE=@{'@var{file_name}',@var{file_handle}@}
        [/ARRANGEMENT=@{DELIMITED,FIXED@}]
        [/FIRSTCASE=@{@var{first_case}@}]
        [/IMPORTCASE=@{ALL,FIRST @var{max_cases},PERCENT @var{percent}@}]

        [/FIXCASE=@var{n}]
        /VARIABLES @var{fixed_var} [@var{fixed_var}]@dots{}
            [/rec# @var{fixed_var} [@var{fixed_var}]@dots{}]@dots{}
where each @var{fixed_var} takes the form:
        @var{variable} @var{start}-@var{end} @var{format}
@end display
598 The @cmd{GET DATA} command with TYPE=TXT and ARRANGEMENT=FIXED reads input
599 data from text files in fixed format, where each field is located in
600 particular fixed column positions within records of a case. Its
601 capabilities are similar to those of DATA LIST FIXED (@pxref{DATA LIST
602 FIXED}), with a few enhancements.
604 The required @subcmd{FILE} subcommand and optional @subcmd{FIRSTCASE} and @subcmd{IMPORTCASE}
605 subcommands are described above (@pxref{GET DATA /TYPE=TXT}).
The optional @subcmd{FIXCASE} subcommand may be used to specify the positive
integer number of input lines that make up each case. The default
value is 1.
611 The @subcmd{VARIABLES} subcommand, which is required, specifies the positions
612 at which each variable can be found. For each variable, specify its
613 name, followed by its start and end column separated by @samp{-}
614 (@i{e.g.}@: @samp{0-9}), followed by an input format type (@i{e.g.}@:
615 @samp{F}) or a full format specification (@i{e.g.}@: @samp{DOLLAR12.2}).
616 For this command, columns are numbered starting from 0 at
617 the left column. Introduce the variables in the second and later
618 lines of a case by a slash followed by the number of the line within
619 the case, @i{e.g.}@: @samp{/2} for the second line.
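
The following sketch reads cases that each span two input lines. The
file name, variable names, and column positions are hypothetical; note
that columns are counted from 0 and that @samp{/2} introduces the
variables found on the second line of each case:

@example
* Each case occupies two lines of the input file.
GET DATA /TYPE=TXT /FILE='people.txt'
         /ARRANGEMENT=FIXED
         /FIXCASE=2
         /VARIABLES=name 0-19 A
         /2 age 0-2 F
            city 3-22 A.
@end example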
621 @subsubheading Examples
624 Consider the following data on used cars:
@example
model   year    mileage price   type    age
Civic   2002    29883   15900   Si      2
Civic   2003    13415   15900   EX      1
Civic   1992    107000  3800    n/a     12
Accord  2002    26613   17900   EX      1
@end example
635 The following syntax can be used to read the used car data:
637 @c If you change this example, change the regression test in
638 @c tests/language/data-io/get-data.at to match.
640 GET DATA /TYPE=TXT /FILE='cars.data' /ARRANGEMENT=FIXED /FIRSTCASE=2
641 /VARIABLES=model 0-7 A
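@node IMPORT
@section IMPORT
@vindex IMPORT

@display
IMPORT
        /FILE='@var{file_name}'
        /DROP=@var{var_list}
        /KEEP=@var{var_list}
        /RENAME=(@var{src_names}=@var{target_names})@dots{}
@end display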
The @cmd{IMPORT} transformation clears the active dataset dictionary and
data and
replaces them with a dictionary and data from a system file or
portable file.
667 The @subcmd{FILE} subcommand, which is the only required subcommand, specifies
668 the portable file to be read as a file name string or a file handle
669 (@pxref{File Handles}).
671 The @subcmd{TYPE} subcommand is currently not used.
673 @subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} follow the syntax used by @cmd{GET} (@pxref{GET}).
675 @cmd{IMPORT} does not cause the data to be read; only the dictionary. The
676 data is read later, when a procedure is executed.
678 Use of @cmd{IMPORT} to read a system file is a @pspp{} extension.
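
A minimal sketch, with a hypothetical file name and variable name:

@example
* Read a portable file, leaving out one variable.
IMPORT /FILE='survey.por'
       /DROP=notes.
@end example

As noted above, the data itself is read only when a later procedure
runs.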
@node SAVE
@section SAVE
@vindex SAVE

@display
SAVE
        /OUTFILE=@{'@var{file_name}',@var{file_handle}@}
        /UNSELECTED=@{RETAIN,DELETE@}
        /@{UNCOMPRESSED,COMPRESSED,ZCOMPRESSED@}
        /PERMISSIONS=@{WRITEABLE,READONLY@}
        /DROP=@var{var_list}
        /KEEP=@var{var_list}
        /VERSION=@var{version}
        /RENAME=(@var{src_names}=@var{target_names})@dots{}
@end display
The @cmd{SAVE} procedure causes the dictionary and data in the active
dataset to
be written to a system file.
702 OUTFILE is the only required subcommand. Specify the system file
703 to be written as a string file name or a file handle
704 (@pxref{File Handles}).
706 By default, cases excluded with FILTER are written to the system file.
707 These can be excluded by specifying @subcmd{DELETE} on the @subcmd{UNSELECTED}
708 subcommand. Specifying @subcmd{RETAIN} makes the default explicit.
The @subcmd{UNCOMPRESSED}, @subcmd{COMPRESSED}, and
@subcmd{ZCOMPRESSED} subcommands determine the system file's
compression level:

@table @code
@item UNCOMPRESSED
Data is not compressed. Each numeric value uses 8 bytes of disk
space. Each string value uses one byte per column width, rounded up
to a multiple of 8 bytes.

@item COMPRESSED
Data is compressed with a simple algorithm. Each integer numeric
value between @minus{}99 and 151, inclusive, or system missing value
uses one byte of disk space. Each 8-byte segment of a string that
consists only of spaces uses 1 byte. Any other numeric value or
8-byte string segment uses 9 bytes of disk space.

@item ZCOMPRESSED
Data is compressed with the ``deflate'' compression algorithm
specified in RFC@tie{}1951 (the same algorithm used by
@command{gzip}). Files written with this compression level cannot be
read by PSPP 0.8.1 or earlier or by SPSS 20 or earlier.
@end table
734 @subcmd{COMPRESSED} is the default compression level. The SET command
735 (@pxref{SET}) can change this default.
737 The @subcmd{PERMISSIONS} subcommand specifies permissions for the new system
738 file. WRITEABLE, the default, creates the file with read and write
739 permission. READONLY creates the file for read-only access.
By default, all the variables in the active dataset dictionary are written
to the system file. The @subcmd{DROP} subcommand can be used to specify a list
of variables not to be written. In contrast, @subcmd{KEEP} specifies the
variables to be written; all other variables are omitted.
746 Normally variables are saved to a system file under the same names they
747 have in the active dataset. Use the @subcmd{RENAME} subcommand to change these names.
748 Specify, within parentheses, a list of variable names followed by an
749 equals sign (@samp{=}) and the names that they should be renamed to.
750 Multiple parenthesized groups of variable names can be included on a
751 single @subcmd{RENAME} subcommand. Variables' names may be swapped using a
752 @subcmd{RENAME} subcommand of the
753 form @subcmd{/RENAME=(@var{A} @var{B}=@var{B} @var{A})}.
Alternate syntax for the @subcmd{RENAME} subcommand allows the parentheses to be
eliminated. When this is done, only a single variable may be renamed at
once. For instance, @subcmd{/RENAME=@var{A}=@var{B}}. This alternate syntax is
deprecated.
760 @subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} are performed in
761 left-to-right order. They
762 each may be present any number of times. @cmd{SAVE} never modifies
763 the active dataset. @subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} only
764 affect the system file written to disk.
766 The @subcmd{VERSION} subcommand specifies the version of the file format. Valid
767 versions are 2 and 3. The default version is 3. In version 2 system
768 files, variable names longer than 8 bytes are truncated. The two
769 versions are otherwise identical.
771 The @subcmd{NAMES} and @subcmd{MAP} subcommands are currently ignored.
773 @cmd{SAVE} causes the data to be read. It is a procedure.
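
The following sketch combines several of the subcommands above. The
file name and variable names are hypothetical:

@example
* Write a deflate-compressed, read-only system file that
  holds two variables, one of them under a new name.
SAVE /OUTFILE='survey.sav'
     /ZCOMPRESSED
     /PERMISSIONS=READONLY
     /KEEP=id age
     /RENAME=(age=age_years).
@end example

The active dataset still contains all of its variables under their
original names afterward, because these subcommands affect only the
file that is written.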
775 @node SAVE DATA COLLECTION
776 @section SAVE DATA COLLECTION
777 @vindex SAVE DATA COLLECTION
@display
SAVE DATA COLLECTION
        /OUTFILE=@{'@var{file_name}',@var{file_handle}@}
        /METADATA=@{'@var{file_name}',@var{file_handle}@}
        /@{UNCOMPRESSED,COMPRESSED,ZCOMPRESSED@}
        /PERMISSIONS=@{WRITEABLE,READONLY@}
        /DROP=@var{var_list}
        /KEEP=@var{var_list}
        /VERSION=@var{version}
        /RENAME=(@var{src_names}=@var{target_names})@dots{}
@end display
793 Like @cmd{SAVE}, @cmd{SAVE DATA COLLECTION} writes the dictionary and
794 data in the active dataset to a system file. In addition, it writes
795 metadata to an additional XML metadata file.
797 OUTFILE is required. Specify the system file to be written as a
798 string file name or a file handle (@pxref{File Handles}).
800 METADATA is also required. Specify the metadata file to be written as
801 a string file name or a file handle. Metadata files customarily use a
802 @file{.mdd} extension.
The current implementation of this command is experimental. It only
outputs an approximation of the metadata file format. Please report
bugs.

Other subcommands are optional. They have the same meanings as in the
@cmd{SAVE} command.

@cmd{SAVE DATA COLLECTION} causes the data to be read. It is a
procedure.
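
A minimal sketch with hypothetical file names, using the customary
@file{.mdd} extension for the metadata file:

@example
* Write the system file and its accompanying metadata file.
SAVE DATA COLLECTION
     /OUTFILE='survey.sav'
     /METADATA='survey.mdd'.
@end example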
@node SAVE TRANSLATE
@section SAVE TRANSLATE
816 @vindex SAVE TRANSLATE
@display
SAVE TRANSLATE
        /OUTFILE=@{'@var{file_name}',@var{file_handle}@}
        /TYPE=@{CSV,TAB@}
        [/REPLACE]
        [/MISSING=@{IGNORE,RECODE@}]

        [/DROP=@var{var_list}]
        [/KEEP=@var{var_list}]
        [/RENAME=(@var{src_names}=@var{target_names})@dots{}]
        [/UNSELECTED=@{RETAIN,DELETE@}]

        @dots{}additional subcommands depending on TYPE@dots{}
@end display
834 The @cmd{SAVE TRANSLATE} command is used to save data into various
835 formats understood by other applications.
The @subcmd{OUTFILE} and @subcmd{TYPE} subcommands are mandatory.
@subcmd{OUTFILE} specifies the file to be written, as a string file name or a file handle
(@pxref{File Handles}). @subcmd{TYPE} determines the type of the file
to write. It must be one of the following:

@table @asis
@item CSV
Comma-separated value format.

@item TAB
Tab-delimited format.
@end table
850 By default, @cmd{SAVE TRANSLATE} does not overwrite an existing file. Use
851 @subcmd{REPLACE} to force an existing file to be overwritten.
With MISSING=IGNORE, the default, @cmd{SAVE TRANSLATE} treats user-missing
854 values as if they were not missing. Specify MISSING=RECODE to output
855 numeric user-missing values like system-missing values and string
856 user-missing values as all spaces.
By default, all the variables in the active dataset dictionary are
saved to the output file, but @subcmd{DROP} or @subcmd{KEEP} can
select a subset of variables to save. The @subcmd{RENAME} subcommand
861 can also be used to change the names under which variables are saved;
862 because they are used only in the output, these names do not have to
863 conform to the usual PSPP variable naming rules. @subcmd{UNSELECTED}
864 determines whether cases filtered out by the @cmd{FILTER} command are
865 written to the output file. These subcommands have the same syntax
866 and meaning as on the @cmd{SAVE} command (@pxref{SAVE}).
868 Each supported file type has additional subcommands, explained in
869 separate sections below.
871 @cmd{SAVE TRANSLATE} causes the data to be read. It is a procedure.
@menu
* SAVE TRANSLATE /TYPE=CSV and TYPE=TAB::
@end menu
877 @node SAVE TRANSLATE /TYPE=CSV and TYPE=TAB
878 @subsection Writing Comma- and Tab-Separated Data Files
@display
SAVE TRANSLATE
        /OUTFILE=@{'@var{file_name}',@var{file_handle}@}
        /TYPE=@{CSV,TAB@}
        [/REPLACE]
        [/MISSING=@{IGNORE,RECODE@}]

        [/DROP=@var{var_list}]
        [/KEEP=@var{var_list}]
        [/RENAME=(@var{src_names}=@var{target_names})@dots{}]
        [/UNSELECTED=@{RETAIN,DELETE@}]

        [/FIELDNAMES]
        [/CELLS=@{VALUES,LABELS@}]
        [/TEXTOPTIONS DELIMITER='@var{delimiter}']
        [/TEXTOPTIONS QUALIFIER='@var{qualifier}']
        [/TEXTOPTIONS DECIMAL=@{DOT,COMMA@}]
        [/TEXTOPTIONS FORMAT=@{PLAIN,VARIABLE@}]
@end display
900 The SAVE TRANSLATE command with TYPE=CSV or TYPE=TAB writes data in a
901 comma- or tab-separated value format similar to that described by
902 RFC@tie{}4180. Each variable becomes one output column, and each case
903 becomes one line of output. If FIELDNAMES is specified, an additional
904 line at the top of the output file lists variable names.
906 The CELLS and TEXTOPTIONS FORMAT settings determine how values are
907 written to the output file:
910 @item CELLS=VALUES FORMAT=PLAIN (the default settings)
911 Writes variables to the output in ``plain'' formats that ignore the
912 details of variable formats. Numeric values are written as plain
913 decimal numbers with enough digits to indicate their exact values in
914 machine representation. Numeric values include @samp{e} followed by
915 an exponent if the exponent value would be less than -4 or greater
916 than 16. Dates are written in MM/DD/YYYY format and times in HH:MM:SS
917 format. WKDAY and MONTH values are written as decimal numbers.
919 By default, numeric values use the decimal point character selected
920 with SET DECIMAL (@pxref{SET DECIMAL}).  Use DECIMAL=DOT or DECIMAL=COMMA
921 to force a particular decimal point character.
923 @item CELLS=VALUES FORMAT=VARIABLE
924 Writes variables using their print formats. Leading and trailing
925 spaces are removed from numeric values, and trailing spaces are
926 removed from string values.
928 @item CELLS=LABELS FORMAT=PLAIN
929 @itemx CELLS=LABELS FORMAT=VARIABLE
930 Writes value labels where they exist, and otherwise writes the values
931 themselves as described above.
934 Regardless of CELLS and TEXTOPTIONS FORMAT, numeric system-missing
935 values are output as a single space.
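
For example, to write value labels where they exist and
print-formatted values elsewhere, the two settings could be combined
as in this sketch (@file{labeled.csv} is a placeholder file name):

@example
SAVE TRANSLATE
  /OUTFILE='labeled.csv'
  /TYPE=CSV
  /CELLS=LABELS
  /TEXTOPTIONS FORMAT=VARIABLE.
@end example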
937 For TYPE=TAB, tab characters delimit values. For TYPE=CSV, the
938 TEXTOPTIONS DELIMITER and DECIMAL settings determine the character
939 that separates values within a line.  If DELIMITER is specified, then
940 the specified string separates values.  If DELIMITER is not specified,
941 then the default is a comma with DECIMAL=DOT or a semicolon with
942 DECIMAL=COMMA.  If DECIMAL is not given either, it is implied by the
943 decimal point character selected with SET DECIMAL (@pxref{SET DECIMAL}).
945 The TEXTOPTIONS QUALIFIER setting specifies a character that is output
946 before and after a value that contains the delimiter character or the
947 qualifier character. The default is a double quote (@samp{"}). A
948 qualifier character that appears within a value is doubled.
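
The sketch below, which uses a placeholder file name, requests a comma
as the decimal point and leaves DELIMITER unspecified, so that a
semicolon separates values.  It also spells out the default qualifier
explicitly:

@example
SAVE TRANSLATE
  /OUTFILE='semicolon.csv'
  /TYPE=CSV
  /TEXTOPTIONS DECIMAL=COMMA
  /TEXTOPTIONS QUALIFIER='"'.
@end example

Under the quoting rule described above, a hypothetical string value
@samp{a;b} would be expected to appear as @samp{"a;b"}, and
@samp{say "hi"} as @samp{"say ""hi"""}.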
951 @section SYSFILE INFO
955 SYSFILE INFO FILE='@var{file_name}' [ENCODING='@var{encoding}'].
958 @cmd{SYSFILE INFO} reads the dictionary in an SPSS system file,
959 SPSS/PC+ system file, or SPSS portable file, and displays the
960 information that it contains.
962 Specify the file to examine, as a file name or file handle, on
963 @subcmd{FILE}.
965 @pspp{} automatically detects the encoding of string data in the file,
966 when possible. The character encoding of old SPSS system files cannot
967 always be guessed correctly, and SPSS/PC+ system files do not include
968 any indication of their encoding. Specify the @subcmd{ENCODING}
969 subcommand with an @acronym{IANA} character set name as its string
970 argument to override the default, or specify @code{ENCODING='DETECT'}
971 to analyze and report possibly valid encodings for the system file.
972 The @subcmd{ENCODING} subcommand is a @pspp{} extension.
974 @cmd{SYSFILE INFO} does not affect the current active dataset.
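
As an illustration, the following sketch first asks @pspp{} to report
plausible encodings for a file and then displays the dictionary using
one particular encoding.  The file name @file{legacy.sav} is a
placeholder, and @code{windows-1252} is simply one example of an
@acronym{IANA} character set name:

@example
SYSFILE INFO FILE='legacy.sav' ENCODING='DETECT'.
SYSFILE INFO FILE='legacy.sav' ENCODING='windows-1252'.
@end example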
982 /OUTFILE='@var{file_name}'
986 /RENAME=(@var{src_names}=@var{target_names})@dots{}
991 The @cmd{XEXPORT} transformation writes the active dataset dictionary and
992 data to a specified portable file.
994 This transformation is a @pspp{} extension.
996 It is similar to the @cmd{EXPORT} procedure, with two differences:
1000 @cmd{XEXPORT} is a transformation, not a procedure. It is executed when
1001 the data is read by a procedure or procedure-like command.
1004 @cmd{XEXPORT} does not support the @subcmd{UNSELECTED} subcommand.
1007 @xref{EXPORT}, for more information.
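
The following sketch illustrates the first difference.  The variable
@var{x} and the file name are hypothetical.  Nothing is written when
@cmd{XEXPORT} is encountered.  The portable file should be written
only when @cmd{DESCRIPTIVES} reads the data, and because
@cmd{XEXPORT} precedes @cmd{COMPUTE}, it should contain the values of
@var{x} as they were before doubling:

@example
* Hypothetical variable x and placeholder file name.
XEXPORT /OUTFILE='before.por'.
COMPUTE x = x * 2.
DESCRIPTIVES /VARIABLES=x.
@end example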
1015 /OUTFILE='@var{file_name}'
1016 /@{UNCOMPRESSED,COMPRESSED,ZCOMPRESSED@}
1017 /PERMISSIONS=@{WRITEABLE,READONLY@}
1018 /DROP=@var{var_list}
1019 /KEEP=@var{var_list}
1020 /VERSION=@var{version}
1021 /RENAME=(@var{src_names}=@var{target_names})@dots{}
1026 The @cmd{XSAVE} transformation writes the active dataset's dictionary and
1027 data to a system file. It is similar to the @cmd{SAVE}
1028 procedure, with two differences:
1032 @cmd{XSAVE} is a transformation, not a procedure. It is executed when
1033 the data is read by a procedure or procedure-like command.
1036 @cmd{XSAVE} does not support the @subcmd{UNSELECTED} subcommand.
1039 @xref{SAVE}, for more information.
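
Because @cmd{XSAVE} is a transformation, more than one system file
can be written in a single pass over the data.  In the sketch below,
which uses hypothetical variables @var{a} and @var{b} and placeholder
file names, the first file should reflect the dictionary and data as
they stand before @cmd{COMPUTE}, and the second should also include
the new variable.  @cmd{EXECUTE} causes the data to be read:

@example
* Hypothetical variables a and b and placeholder file names.
XSAVE /OUTFILE='raw.sav'.
COMPUTE total = a + b.
XSAVE /OUTFILE='with_total.sav'.
EXECUTE.
@end example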