* automake.mk: Add new tests.
* command/get-data-txt.sh: New test.
* command/get-data-txt-examples.sh: New test.
* command/get-data-txt-importcases.sh: New test.
* data-parser.c: New file.
* data-parser.h: New file.
* data-list.c (struct dls_var_spec): Removed.
(ll_to_dls_var_spec): Removed.
(enum dls_type): Removed.
(struct data_list_pgm): Rename to struct data_list_trns. Remove
pool, specs, type, record_cnt, delims, skip_records, value_cnt
members. Add new `parser' member.
(cmd_data_list): Use data-parser infrastructure.
(parse_fixed): Ditto.
(parse_free): Ditto.
(dump_fixed_table): Removed.
(dump_free_table): Removed.
(cut_field): Removed.
(read_from_data_list): Removed.
(read_from_data_list_fixed): Removed.
(read_from_data_list_free): Removed.
(read_from_data_list_list): Removed.
(data_list_trns_free): Rename arguments for clarity.
(data_list_trns_proc): Ditto.
(data_list_casereader_read): Removed.
(data_list_casereader_destroy): Removed.
(data_list_casereader_class): Removed.
* get-data.c (cmd_get_data): Support TXT type.
(set_type): New function.
(parse_get_txt): New function.
Each form of @cmd{DATA LIST} is described in detail below.
+@xref{GET DATA}, for a command that offers a few enhancements over
+DATA LIST and that may be substituted for DATA LIST in many
+situations.
+
@menu
* DATA LIST FIXED:: Fixed columnar locations for data.
* DATA LIST FREE:: Any spacing you like.
@vindex GET DATA
@display
-GET DATA /TYPE=gnm
- /FILE=@{'file-name'@}
-
- /SHEET=@{NAME 'sheet-name', INDEX n@}
- /CELLRANGE=@{RANGE 'range', FULL@}
- /READNAMES=@{ON, OFF@}
- /ASSUMEDVARWIDTH=n.
+GET DATA
+ /TYPE=@{GNM,TXT@}
+ /FILE=@{'file-name',file_handle@}
+ @dots{}additional subcommands depending on TYPE@dots{}
@end display
The @cmd{GET DATA} command is used to read files and other data sources
created by other applications.
When this command is executed, the current dictionary and active file are
replaced with variables and data read from the specified source.
-The TYPE subcommand is mandatory and determines the type of the file or source to read.
-Currently @samp{gnm} is the only supported type.
+
+The TYPE subcommand is mandatory and must be the first subcommand
+specified. It determines the type of the file or source to read.
+PSPP currently supports the following file types:
+
+@table @asis
+@item GNM
+Spreadsheet files created by Gnumeric (@url{http://gnumeric.org}).
+
+@item TXT
+Textual data files in columnar and delimited formats.
+@end table
+
+The FILE subcommand is mandatory for all implemented file types.
+Specify the file to be read as a string file name or (for textual data
+only) a file handle (@pxref{File Handles}).
+
+Each supported file type has additional subcommands, explained in
+separate sections below.
+
+@menu
+* GET DATA /TYPE=GNM::
+* GET DATA /TYPE=TXT::
+@end menu
+
+@node GET DATA /TYPE=GNM
+@subsection Gnumeric Spreadsheet Files
+
+@display
+GET DATA /TYPE=GNM
+ /FILE=@{'file-name'@}
+ /SHEET=@{NAME 'sheet-name', INDEX n@}
+ /CELLRANGE=@{RANGE 'range', FULL@}
+ /READNAMES=@{ON, OFF@}
+ /ASSUMEDVARWIDTH=n.
+@end display
@cindex Gnumeric
@cindex spreadsheet files
-The @samp{gnm} type is used to read spreadsheet files created by
-Gnumeric (@url{http://gnumeric.org}).
-With this type, the FILE subcommand must be used, to specify the
-spreadsheet file to read.
-All other subcommands are optional.
+To use GET DATA to read a spreadsheet file created by Gnumeric
+(@url{http://gnumeric.org}), specify TYPE=GNM to indicate the file's
+format and use FILE to indicate the Gnumeric file to be read. All
+other subcommands are optional.
+
The format of each variable is determined by the format of the spreadsheet
cell containing the first datum for the variable.
If this cell is of string (text) format, then the width of the variable is
If omitted, the default value is determined from the length of the
string in the first spreadsheet cell for each variable.
+@node GET DATA /TYPE=TXT
+@subsection Textual Data Files
+
+@display
+GET DATA /TYPE=TXT
+ /FILE=@{'file-name',file_handle@}
+ [/ARRANGEMENT=@{DELIMITED,FIXED@}]
+ [/FIRSTCASE=@{first_case@}]
+ [/IMPORTCASE=@{ALL,FIRST max_cases,PERCENT percent@}]
+ @dots{}additional subcommands depending on ARRANGEMENT@dots{}
+@end display
+
+@cindex text files
+@cindex data files
+When TYPE=TXT is specified, GET DATA reads data in a delimited or
+fixed columnar format, much like DATA LIST (@pxref{DATA LIST}). The
+FILE subcommand must be specified to indicate the name or the file handle
+of the file to be read.
+
+The ARRANGEMENT subcommand determines the file's basic format.
+DELIMITED, the default setting, specifies that fields in the input
+data are separated by spaces, tabs, or other user-specified
+delimiters. FIXED specifies that fields in the input data appear at
+particular fixed column positions within records of a case.
+
+By default, cases are read from the input file starting from the first
+line. To skip lines at the beginning of an input file, set FIRSTCASE
+to the number of the first line to read: 2 to skip the first line, 3
+to skip the first two lines, and so on.
+
+IMPORTCASE can be used to limit the number of cases read from the
+input file. With the default setting, ALL, all cases in the file are
+read. Specify FIRST @i{max_cases} to read at most @i{max_cases} cases
+from the file. Use PERCENT @i{percent} to read only @i{percent}
+percent, approximately, of the cases contained in the file. (The
+percentage is approximate, because there is no way to accurately count
+the number of cases in the file without reading the entire file. The
+number of cases in some kinds of unusual files cannot be estimated;
+PSPP will read all cases in such files.)
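+
+For instance, assuming a hypothetical file @file{survey.txt} whose
+first line holds column headings, a fragment along the following
+lines should skip that line and then read at most 100 cases. The
+arrangement-specific subcommands are elided:
+
+@c Illustrative sketch only: survey.txt is a made-up file name and
+@c this fragment has no regression test.
+@example
+GET DATA /TYPE=TXT /FILE='survey.txt'
+        /FIRSTCASE=2
+        /IMPORTCASE=FIRST 100
+        @dots{}
+@end example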
+
+FIRSTCASE and IMPORTCASE may be used with delimited and fixed-format
+data. The remaining subcommands, which apply only to one of the two file
+arrangements, are described below.
+
+@menu
+* GET DATA /TYPE=TXT /ARRANGEMENT=DELIMITED::
+* GET DATA /TYPE=TXT /ARRANGEMENT=FIXED::
+@end menu
+
+@node GET DATA /TYPE=TXT /ARRANGEMENT=DELIMITED
+@subsubsection Reading Delimited Data
+
+@display
+GET DATA /TYPE=TXT
+ /FILE=@{'file-name',file_handle@}
+ [/ARRANGEMENT=@{DELIMITED,FIXED@}]
+ [/FIRSTCASE=@{first_case@}]
+ [/IMPORTCASE=@{ALL,FIRST max_cases,PERCENT percent@}]
+
+ /DELIMITERS="delimiters"
+ [/QUALIFIER="quote"]
+ [/DELCASE=@{LINE,VARIABLES n_variables@}]
+ /VARIABLES=del_var [del_var]@dots{}
+where each del_var takes the form:
+ variable format
+@end display
+
+The GET DATA command with TYPE=TXT and ARRANGEMENT=DELIMITED reads
+input data from text files in delimited format, where fields are
+separated by a set of user-specified delimiters. Its capabilities are
+similar to those of DATA LIST FREE (@pxref{DATA LIST FREE}), with a
+few enhancements.
+
+The required FILE subcommand and optional FIRSTCASE and IMPORTCASE
+subcommands are described above (@pxref{GET DATA /TYPE=TXT}).
+
+DELIMITERS, which is required, specifies the set of characters that
+may separate fields. Each character in the string specified on
+DELIMITERS separates one field from the next. The end of a line also
+separates fields, regardless of DELIMITERS. Two consecutive
+delimiters in the input yield an empty field, as does a delimiter at
+the end of a line. A space character as a delimiter is an exception:
+consecutive spaces do not yield an empty field and neither does any
+number of spaces at the end of a line.
+
+To use a tab as a delimiter, specify @samp{\t} at the beginning of the
+DELIMITERS string. To use a backslash as a delimiter, specify
+@samp{\\} as the first delimiter or, if a tab should also be a
+delimiter, immediately following @samp{\t}. To read a data file in
+which each field appears on a separate line, specify the empty string
+for DELIMITERS.
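+
+As a sketch, assuming a hypothetical file @file{scores.tsv} in which
+fields are separated by tabs or commas, syntax like the following
+should read it:
+
+@c Illustrative sketch only: scores.tsv and its variables are made up
+@c and this example has no regression test.
+@example
+GET DATA /TYPE=TXT /FILE='scores.tsv' /DELIMITERS='\t,'
+        /VARIABLES=student A20
+                   score F5.
+@end example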
+
+The optional QUALIFIER subcommand names a character that can be used
+to quote values within fields in the input. A field that begins with
+the specified quote character ends at the next matching quote.
+Intervening delimiters become part of the field, instead of
+terminating it.
+
+The DELCASE subcommand controls how data may be broken across lines in
+the data file. With LINE, the default setting, each line must contain
+all the data for exactly one case. For additional flexibility, to
+allow a single case to be split among lines or multiple cases to be
+contained on a single line, specify VARIABLES @i{n_variables}, where
+@i{n_variables} is the number of variables per case.
+
+The VARIABLES subcommand is required and must be the last subcommand.
+Specify the name of each variable and its input format (@pxref{Input
+and Output Formats}) in the order they should be read from the input
+file.
+
+@subsubheading Examples
+
+@noindent
+On a Unix-like system, the @samp{/etc/passwd} file has a format
+similar to this:
+
+@example
+root:$1$nyeSP5gD$pDq/:0:0:,,,:/root:/bin/bash
+blp:$1$BrP/pFg4$g7OG:1000:1000:Ben Pfaff,,,:/home/blp:/bin/bash
+john:$1$JBuq/Fioq$g4A:1001:1001:John Darrington,,,:/home/john:/bin/bash
+jhs:$1$D3li4hPL$88X1:1002:1002:Jason Stover,,,:/home/jhs:/bin/csh
+@end example
+
+@noindent
+The following syntax reads a file in the format used by
+@samp{/etc/passwd}:
+
+@c If you change this example, change the regression test in
+@c tests/command/get-data-txt-examples.sh to match.
+@example
+GET DATA /TYPE=TXT /FILE='/etc/passwd' /DELIMITERS=':'
+ /VARIABLES=username A20
+ password A40
+ uid F10
+ gid F10
+ gecos A40
+ home A40
+ shell A40.
+@end example
+
+@noindent
+Consider the following data on used cars:
+
+@example
+model year mileage price type age
+Civic 2002 29883 15900 Si 2
+Civic 2003 13415 15900 EX 1
+Civic 1992 107000 3800 n/a 12
+Accord 2002 26613 17900 EX 1
+@end example
+
+@noindent
+The following syntax can be used to read the used car data:
+
+@c If you change this example, change the regression test in
+@c tests/command/get-data-txt-examples.sh to match.
+@example
+GET DATA /TYPE=TXT /FILE='cars.data' /DELIMITERS=' ' /FIRSTCASE=2
+ /VARIABLES=model A8
+ year F4
+ mileage F6
+ price F5
+ type A4
+ age F2.
+@end example
+
+@noindent
+Consider the following information on animals in a pet store:
+
+@example
+"Pet Name", "Age", "Color", "Date Received", "Price", "Needs Walking", "Type"
+, (Years), , , (Dollars), ,
+"Rover", 4.5, Brown, "12 Feb 2004", 80, True, "Dog"
+"Charlie", , Gold, "5 Apr 2007", 12.3, False, "Fish"
+"Molly", 2, Black, "12 Dec 2006", 25, False, "Cat"
+"Gilly", , White, "10 Apr 2007", 10, False, "Guinea Pig"
+@end example
+
+@noindent
+The following syntax can be used to read the pet store data:
+
+@c If you change this example, change the regression test in
+@c tests/command/get-data-txt-examples.sh to match.
+@example
+GET DATA /TYPE=TXT /FILE='pets.data' /DELIMITERS=', ' /QUALIFIER='"'
+ /FIRSTCASE=3
+ /VARIABLES=name A10
+ age F3.1
+ color A5
+ received EDATE10
+ price F5.2
+ needs_walking A5
+ type A10.
+@end example
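+
+@noindent
+As a further sketch, suppose that a hypothetical file
+@file{triples.data} holds three values per case, with cases broken
+across lines arbitrarily:
+
+@example
+1 2 3 4 5
+6
+@end example
+
+@noindent
+Syntax along these lines, using DELCASE to state the number of
+variables per case, should read that data as two cases:
+
+@c Illustrative sketch only: triples.data and its variables are made
+@c up and this example has no regression test.
+@example
+GET DATA /TYPE=TXT /FILE='triples.data' /DELIMITERS=' '
+        /DELCASE=VARIABLES 3
+        /VARIABLES=x F2
+                   y F2
+                   z F2.
+@end example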
+
+@node GET DATA /TYPE=TXT /ARRANGEMENT=FIXED
+@subsubsection Reading Fixed Columnar Data
+
+@display
+GET DATA /TYPE=TXT
+ /FILE=@{'file-name',file_handle@}
+ [/ARRANGEMENT=@{DELIMITED,FIXED@}]
+ [/FIRSTCASE=@{first_case@}]
+ [/IMPORTCASE=@{ALL,FIRST max_cases,PERCENT percent@}]
+
+ [/FIXCASE=n]
+ /VARIABLES fixed_var [fixed_var]@dots{}
+ [/rec# fixed_var [fixed_var]@dots{}]@dots{}
+where each fixed_var takes the form:
+ variable start-end format
+@end display
+
+The GET DATA command with TYPE=TXT and ARRANGEMENT=FIXED reads input
+data from text files in fixed format, where each field is located in
+particular fixed column positions within records of a case. Its
+capabilities are similar to those of DATA LIST FIXED (@pxref{DATA LIST
+FIXED}), with a few enhancements.
+
+The required FILE subcommand and optional FIRSTCASE and IMPORTCASE
+subcommands are described above (@pxref{GET DATA /TYPE=TXT}).
+
+The optional FIXCASE subcommand may be used to specify the positive
+integer number of input lines that make up each case. The default
+value is 1.
+
+The VARIABLES subcommand, which is required, specifies the positions
+at which each variable can be found. For each variable, specify its
+name, followed by its start and end column separated by @samp{-}
+(e.g.@: @samp{0-9}), followed by the input format type (e.g.@:
+@samp{F}). For this command, columns are numbered starting from 0 at
+the left column. Introduce the variables in the second and later
+lines of a case by a slash followed by the number of the line within
+the case, e.g.@: @samp{/2} for the second line.
+
+@subsubheading Examples
+
+@noindent
+Consider the following data on used cars:
+
+@example
+model year mileage price type age
+Civic 2002 29883 15900 Si 2
+Civic 2003 13415 15900 EX 1
+Civic 1992 107000 3800 n/a 12
+Accord 2002 26613 17900 EX 1
+@end example
+
+@noindent
+The following syntax can be used to read the used car data:
+
+@c If you change this example, change the regression test in
+@c tests/command/get-data-txt-examples.sh to match.
+@example
+GET DATA /TYPE=TXT /FILE='cars.data' /ARRANGEMENT=FIXED /FIRSTCASE=2
+ /VARIABLES=model 0-7 A
+ year 8-15 F
+ mileage 16-23 F
+ price 24-31 F
+ type 32-40 A
+ age 40-47 F.
+@end example
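+
+@noindent
+As a further sketch, suppose that each case in a hypothetical file
+@file{people.data} occupies two lines, with a name on the first line
+and an age and a height on the second:
+
+@example
+Alice
+ 34 170
+Bob
+ 27 182
+@end example
+
+@noindent
+Syntax along these lines, using FIXCASE and a @samp{/2} record
+marker, should read that data:
+
+@c Illustrative sketch only: people.data and its layout are made up
+@c and this example has no regression test.
+@example
+GET DATA /TYPE=TXT /FILE='people.data' /ARRANGEMENT=FIXED /FIXCASE=2
+        /VARIABLES=name 0-9 A
+        /2 age 0-2 F
+           height 4-6 F.
+@end example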
@node IMPORT
@section IMPORT
+2007-12-04 Ben Pfaff <blp@gnu.org>
+
+ Move DATA LIST parsing into generic infrastructure, and generalize
+ it slightly. Then, use the same infrastructure to implement GET
+ DATA/TYPE=TXT.
+
+ * data-parser.c: New file.
+
+ * data-parser.h: New file.
+
+ * data-list.c (struct dls_var_spec): Removed.
+ (ll_to_dls_var_spec): Removed.
+ (enum dls_type): Removed.
+ (struct data_list_pgm): Rename to struct data_list_trns. Remove
+ pool, specs, type, record_cnt, delims, skip_records, value_cnt
+ members. Add new `parser' member.
+ (cmd_data_list): Use data-parser infrastructure.
+ (parse_fixed): Ditto.
+ (parse_free): Ditto.
+ (dump_fixed_table): Removed.
+ (dump_free_table): Removed.
+ (cut_field): Removed.
+ (read_from_data_list): Removed.
+ (read_from_data_list_fixed): Removed.
+ (read_from_data_list_free): Removed.
+ (read_from_data_list_list): Removed.
+ (data_list_trns_free): Rename arguments for clarity.
+ (data_list_trns_proc): Ditto.
+ (data_list_casereader_read): Removed.
+ (data_list_casereader_destroy): Removed.
+ (data_list_casereader_class): Removed.
+
+ * get-data.c (cmd_get_data): Support TXT type.
+ (set_type): New function.
+ (parse_get_txt): New function.
+
2007-12-04 Ben Pfaff <blp@gnu.org>
* placement-parser.c (parse_column): New function.
language_data_io_sources = \
src/language/data-io/data-list.c \
+ src/language/data-io/data-parser.c \
+ src/language/data-io/data-parser.h \
src/language/data-io/get.c \
src/language/data-io/get-data.c \
src/language/data-io/inpt-pgm.c \
/* PSPP - a program for statistical analysis.
- Copyright (C) 1997-9, 2000, 2006 Free Software Foundation, Inc.
+ Copyright (C) 1997-9, 2000, 2006, 2007 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
#include <data/case.h>
#include <data/data-in.h>
#include <data/casereader.h>
-#include <data/casereader-provider.h>
#include <data/dictionary.h>
#include <data/format.h>
#include <data/procedure.h>
#include <data/transformations.h>
#include <data/variable.h>
#include <language/command.h>
+#include <language/data-io/data-parser.h>
#include <language/data-io/data-reader.h>
#include <language/data-io/file-handle.h>
#include <language/data-io/inpt-pgm.h>
#include <language/lexer/variable-parser.h>
#include <libpspp/assertion.h>
#include <libpspp/compiler.h>
-#include <libpspp/ll.h>
#include <libpspp/message.h>
#include <libpspp/misc.h>
#include <libpspp/pool.h>
#include <libpspp/str.h>
-#include <output/table.h>
#include "xsize.h"
#include "xalloc.h"
#include "gettext.h"
#define _(msgid) gettext (msgid)
\f
-/* Utility function. */
-
-/* Describes how to parse one variable. */
-struct dls_var_spec
- {
- struct ll ll; /* List element. */
-
- /* All parsers. */
- struct fmt_spec input; /* Input format of this field. */
- int fv; /* First value in case. */
- char name[VAR_NAME_LEN + 1]; /* Var name for error messages and tables. */
-
- /* Fixed format only. */
- int record; /* Record number (1-based). */
- int first_column; /* Column numbers in record. */
- };
-
-static struct dls_var_spec *
-ll_to_dls_var_spec (struct ll *ll)
-{
- return ll_data (ll, struct dls_var_spec, ll);
-}
-
-/* Constants for DATA LIST type. */
-enum dls_type
- {
- DLS_FIXED,
- DLS_FREE,
- DLS_LIST
- };
-
-/* DATA LIST private data structure. */
-struct data_list_pgm
+/* DATA LIST transformation data. */
+struct data_list_trns
{
- struct pool *pool; /* Used for all DATA LIST storage. */
- struct ll_list specs; /* List of dls_var_specs. */
+ struct data_parser *parser; /* Parser. */
struct dfm_reader *reader; /* Data file reader. */
- enum dls_type type; /* Type of DATA LIST construct. */
struct variable *end; /* Variable specified on END subcommand. */
- int record_cnt; /* Number of records. */
- struct string delims; /* Field delimiters. */
- int skip_records; /* Records to skip before first case. */
- size_t value_cnt; /* Number of `union value's in case. */
};
-static const struct casereader_class data_list_casereader_class;
-
-static bool parse_fixed (struct lexer *, struct dictionary *dict,
- struct pool *tmp_pool, struct data_list_pgm *);
-static bool parse_free (struct lexer *, struct dictionary *dict,
- struct pool *tmp_pool, struct data_list_pgm *);
-static void dump_fixed_table (const struct ll_list *,
- const struct file_handle *, int record_cnt);
-static void dump_free_table (const struct data_list_pgm *,
- const struct file_handle *);
+static bool parse_fixed (struct lexer *, struct dictionary *,
+ struct pool *, struct data_parser *);
+static bool parse_free (struct lexer *, struct dictionary *,
+ struct pool *, struct data_parser *);
static trns_free_func data_list_trns_free;
static trns_proc_func data_list_trns_proc;
cmd_data_list (struct lexer *lexer, struct dataset *ds)
{
struct dictionary *dict;
- struct data_list_pgm *dls;
- int table = -1; /* Print table if nonzero, -1=undecided. */
- struct file_handle *fh = NULL;
+ struct data_parser *parser;
+ struct dfm_reader *reader;
+ struct variable *end;
+ struct file_handle *fh;
+
+ int table;
+ enum data_parser_type type;
+ bool has_type;
struct pool *tmp_pool;
bool ok;
dict = in_input_program () ? dataset_dict (ds) : dict_create ();
+ parser = data_parser_create ();
+ reader = NULL;
+ end = NULL;
+ fh = NULL;
- dls = pool_create_container (struct data_list_pgm, pool);
- ll_init (&dls->specs);
- dls->reader = NULL;
- dls->type = -1;
- dls->end = NULL;
- dls->record_cnt = 0;
- dls->skip_records = 0;
- ds_init_empty (&dls->delims);
- ds_register_pool (&dls->delims, dls->pool);
-
- tmp_pool = pool_create_subpool (dls->pool);
+ table = -1; /* Print table if nonzero, -1=undecided. */
+ has_type = false;
while (lex_token (lexer) != '/')
{
lex_match (lexer, '(');
if (!lex_force_int (lexer))
goto error;
- dls->record_cnt = lex_integer (lexer);
+ data_parser_set_records (parser, lex_integer (lexer));
lex_get (lexer);
lex_match (lexer, ')');
}
lex_match (lexer, '=');
if (!lex_force_int (lexer))
goto error;
- dls->skip_records = lex_integer (lexer);
+ data_parser_set_skip (parser, lex_integer (lexer));
lex_get (lexer);
}
else if (lex_match_id (lexer, "END"))
{
- if (dls->end)
+ if (!in_input_program ())
+ {
+ msg (SE, _("The END subcommand may only be used within "
+ "INPUT PROGRAM."));
+ goto error;
+ }
+ if (end)
{
msg (SE, _("The END subcommand may only be specified once."));
goto error;
lex_match (lexer, '=');
if (!lex_force_id (lexer))
goto error;
- dls->end = dict_lookup_var (dict, lex_tokid (lexer));
- if (!dls->end)
- dls->end = dict_create_var_assert (dict, lex_tokid (lexer), 0);
+ end = dict_lookup_var (dict, lex_tokid (lexer));
+ if (!end)
+ end = dict_create_var_assert (dict, lex_tokid (lexer), 0);
lex_get (lexer);
}
+ else if (lex_match_id (lexer, "NOTABLE"))
+ table = 0;
+ else if (lex_match_id (lexer, "TABLE"))
+ table = 1;
else if (lex_token (lexer) == T_ID)
{
- if (lex_match_id (lexer, "NOTABLE"))
- table = 0;
- else if (lex_match_id (lexer, "TABLE"))
- table = 1;
+ if (lex_match_id (lexer, "FIXED"))
+ data_parser_set_type (parser, DP_FIXED);
+ else if (lex_match_id (lexer, "FREE"))
+ {
+ data_parser_set_type (parser, DP_DELIMITED);
+ data_parser_set_span (parser, true);
+ }
+ else if (lex_match_id (lexer, "LIST"))
+ {
+ data_parser_set_type (parser, DP_DELIMITED);
+ data_parser_set_span (parser, false);
+ }
else
{
- int type;
- if (lex_match_id (lexer, "FIXED"))
- type = DLS_FIXED;
- else if (lex_match_id (lexer, "FREE"))
- type = DLS_FREE;
- else if (lex_match_id (lexer, "LIST"))
- type = DLS_LIST;
- else
- {
- lex_error (lexer, NULL);
- goto error;
- }
+ lex_error (lexer, NULL);
+ goto error;
+ }
- if (dls->type != -1)
- {
- msg (SE, _("Only one of FIXED, FREE, or LIST may "
- "be specified."));
- goto error;
- }
- dls->type = type;
+ if (has_type)
+ {
+ msg (SE, _("Only one of FIXED, FREE, or LIST may "
+ "be specified."));
+ goto error;
+ }
+ has_type = true;
- if ((dls->type == DLS_FREE || dls->type == DLS_LIST)
- && lex_match (lexer, '('))
+ if (data_parser_get_type (parser) == DP_DELIMITED)
+ {
+ if (lex_match (lexer, '('))
{
+ struct string delims = DS_EMPTY_INITIALIZER;
+
while (!lex_match (lexer, ')'))
{
int delim;
if (lex_match_id (lexer, "TAB"))
delim = '\t';
- else if (lex_token (lexer) == T_STRING && ds_length (lex_tokstr (lexer)) == 1)
- {
- delim = ds_first (lex_tokstr (lexer));
- lex_get (lexer);
- }
+ else if (lex_token (lexer) == T_STRING
+ && ds_length (lex_tokstr (lexer)) == 1)
+ {
+ delim = ds_first (lex_tokstr (lexer));
+ lex_get (lexer);
+ }
else
{
lex_error (lexer, NULL);
+ ds_destroy (&delims);
goto error;
}
-
- ds_put_char (&dls->delims, delim);
+ ds_put_char (&delims, delim);
lex_match (lexer, ',');
}
+
+ data_parser_set_empty_line_has_field (parser, true);
+ data_parser_set_quotes (parser, ss_empty ());
+ data_parser_set_soft_delimiters (parser, ss_empty ());
+ data_parser_set_hard_delimiters (parser, ds_ss (&delims));
+ ds_destroy (&delims);
+ }
+ else
+ {
+ data_parser_set_empty_line_has_field (parser, false);
+ data_parser_set_quotes (parser, ss_cstr ("'\""));
+ data_parser_set_soft_delimiters (parser,
+ ss_cstr (CC_SPACES));
+ data_parser_set_hard_delimiters (parser, ss_cstr (","));
}
}
}
goto error;
}
}
+ type = data_parser_get_type (parser);
if (fh == NULL)
fh = fh_inline_file ();
fh_set_default_handle (fh);
- if (dls->type == -1)
- dls->type = DLS_FIXED;
-
- if (dls->type != DLS_FIXED && dls->end != NULL)
+ if (type != DP_FIXED && end != NULL)
{
- msg (SE, _("The END keyword may be used only with DATA LIST FIXED."));
+ msg (SE, _("The END subcommand may be used only with DATA LIST FIXED."));
goto error;
}
- if (table == -1)
- table = dls->type != DLS_FREE;
-
- ok = (dls->type == DLS_FIXED ? parse_fixed : parse_free) (lexer, dict, tmp_pool, dls);
+ tmp_pool = pool_create ();
+ if (type == DP_FIXED)
+ ok = parse_fixed (lexer, dict, tmp_pool, parser);
+ else
+ ok = parse_free (lexer, dict, tmp_pool, parser);
+ pool_destroy (tmp_pool);
if (!ok)
goto error;
+ if (!data_parser_any_fields (parser))
+ {
+ msg (SE, _("At least one variable must be specified."));
+ goto error;
+ }
+
if (lex_end_of_command (lexer) != CMD_SUCCESS)
goto error;
+ if (table == -1)
+ table = type == DP_FIXED || !data_parser_get_span (parser);
if (table)
- {
- if (dls->type == DLS_FIXED)
- dump_fixed_table (&dls->specs, fh, dls->record_cnt);
- else
- dump_free_table (dls, fh);
- }
+ data_parser_output_description (parser, fh);
- dls->reader = dfm_open_reader (fh, lexer);
- if (dls->reader == NULL)
+ reader = dfm_open_reader (fh, lexer);
+ if (reader == NULL)
goto error;
- dls->value_cnt = dict_get_next_value_idx (dict);
-
if (in_input_program ())
- add_transformation (ds, data_list_trns_proc, data_list_trns_free, dls);
- else
{
- struct casereader *reader;
- reader = casereader_create_sequential (NULL,
- dict_get_next_value_idx (dict),
- -1, &data_list_casereader_class,
- dls);
- proc_set_active_file (ds, reader, dict);
+ struct data_list_trns *trns = xmalloc (sizeof *trns);
+ trns->parser = parser;
+ trns->reader = reader;
+ trns->end = end;
+ add_transformation (ds, data_list_trns_proc, data_list_trns_free, trns);
}
+ else
+ data_parser_make_active_file (parser, ds, reader, dict);
- pool_destroy (tmp_pool);
fh_unref (fh);
return CMD_SUCCESS;
error:
+ data_parser_destroy (parser);
+ dict_destroy (dict);
fh_unref (fh);
- data_list_trns_free (dls);
return CMD_CASCADING_FAILURE;
}
\f
/* Fixed-format parsing. */
/* Parses all the variable specifications for DATA LIST FIXED,
- storing them into DLS. Uses TMP_POOL for data that is not
- needed once parsing is complete. Returns true only if
+ storing them into PARSER. Uses TMP_POOL for temporary storage;
+ the caller may destroy it. Returns true only if
successful. */
static bool
parse_fixed (struct lexer *lexer, struct dictionary *dict,
- struct pool *tmp_pool, struct data_list_pgm *dls)
+ struct pool *tmp_pool, struct data_parser *parser)
{
- int last_nonempty_record;
+ int max_records = data_parser_get_records (parser);
int record = 0;
int column = 1;
char *name;
int width;
struct variable *v;
- struct dls_var_spec *spec;
name = names[name_idx++];
}
}
- /* Create specifier for parsing the variable. */
- spec = pool_alloc (dls->pool, sizeof *spec);
- spec->input = *f;
- spec->fv = var_get_case_index (v);
- spec->record = record;
- spec->first_column = column;
- strcpy (spec->name, var_get_name (v));
- ll_push_tail (&dls->specs, &spec->ll);
+ if (max_records && record > max_records)
+ {
+ msg (SE, _("Cannot place variable %s on record %d when "
+ "RECORDS=%d is specified."),
+ var_get_name (v), record,
+ data_parser_get_records (parser));
+ return false;
+ }
+
+ data_parser_add_fixed_field (parser, f,
+ var_get_case_index (v),
+ var_get_name (v), record, column);
column += f->w;
}
assert (name_idx == name_cnt);
}
- if (ll_is_empty (&dls->specs))
- {
- msg (SE, _("At least one variable must be specified."));
- return false;
- }
-
- last_nonempty_record = ll_to_dls_var_spec (ll_tail (&dls->specs))->record;
- if (dls->record_cnt && last_nonempty_record > dls->record_cnt)
- {
- msg (SE, _("Variables are specified on records that "
- "should not exist according to RECORDS subcommand."));
- return false;
- }
- else if (!dls->record_cnt)
- dls->record_cnt = last_nonempty_record;
return true;
}
-
-/* Displays a table giving information on fixed-format variable
- parsing on DATA LIST. */
-static void
-dump_fixed_table (const struct ll_list *specs,
- const struct file_handle *fh, int record_cnt)
-{
- size_t spec_cnt;
- struct tab_table *t;
- struct dls_var_spec *spec;
- int row;
-
- spec_cnt = ll_count (specs);
- t = tab_create (4, spec_cnt + 1, 0);
- tab_columns (t, TAB_COL_DOWN, 1);
- tab_headers (t, 0, 0, 1, 0);
- tab_text (t, 0, 0, TAB_CENTER | TAT_TITLE, _("Variable"));
- tab_text (t, 1, 0, TAB_CENTER | TAT_TITLE, _("Record"));
- tab_text (t, 2, 0, TAB_CENTER | TAT_TITLE, _("Columns"));
- tab_text (t, 3, 0, TAB_CENTER | TAT_TITLE, _("Format"));
- tab_box (t, TAL_1, TAL_1, TAL_0, TAL_1, 0, 0, 3, spec_cnt);
- tab_hline (t, TAL_2, 0, 3, 1);
- tab_dim (t, tab_natural_dimensions);
-
- row = 1;
- ll_for_each (spec, struct dls_var_spec, ll, specs)
- {
- char fmt_string[FMT_STRING_LEN_MAX + 1];
- tab_text (t, 0, row, TAB_LEFT, spec->name);
- tab_text (t, 1, row, TAT_PRINTF, "%d", spec->record);
- tab_text (t, 2, row, TAT_PRINTF, "%3d-%3d",
- spec->first_column, spec->first_column + spec->input.w - 1);
- tab_text (t, 3, row, TAB_LEFT | TAB_FIX,
- fmt_to_string (&spec->input, fmt_string));
- row++;
- }
-
- tab_title (t, ngettext ("Reading %d record from %s.",
- "Reading %d records from %s.", record_cnt),
- record_cnt, fh_get_name (fh));
- tab_submit (t);
-}
\f
/* Free-format parsing. */
/* Parses variable specifications for DATA LIST FREE and adds
- them to DLS. Uses TMP_POOL for data that is not needed once
- parsing is complete. Returns true only if successful. */
+ them to PARSER. Uses TMP_POOL for temporary storage; the caller
+ may destroy it. Returns true only if successful. */
static bool
-parse_free (struct lexer *lexer, struct dictionary *dict, struct pool *tmp_pool,
- struct data_list_pgm *dls)
+parse_free (struct lexer *lexer, struct dictionary *dict,
+ struct pool *tmp_pool, struct data_parser *parser)
{
lex_get (lexer);
while (lex_token (lexer) != '.')
for (i = 0; i < name_cnt; i++)
{
- struct dls_var_spec *spec;
struct variable *v;
v = dict_create_var (dict, name[i], fmt_var_width (&input));
}
var_set_both_formats (v, &output);
- spec = pool_alloc (dls->pool, sizeof *spec);
- spec->input = input;
- spec->fv = var_get_case_index (v);
- strcpy (spec->name, var_get_name (v));
- ll_push_tail (&dls->specs, &spec->ll);
+ data_parser_add_delimited_field (parser,
+ &input, var_get_case_index (v),
+ var_get_name (v));
}
}
return true;
}
-
-/* Displays a table giving information on free-format variable parsing
- on DATA LIST. */
-static void
-dump_free_table (const struct data_list_pgm *dls,
- const struct file_handle *fh)
-{
- struct tab_table *t;
- struct dls_var_spec *spec;
- size_t spec_cnt;
- int row;
-
- spec_cnt = ll_count (&dls->specs);
-
- t = tab_create (2, spec_cnt + 1, 0);
- tab_columns (t, TAB_COL_DOWN, 1);
- tab_headers (t, 0, 0, 1, 0);
- tab_text (t, 0, 0, TAB_CENTER | TAT_TITLE, _("Variable"));
- tab_text (t, 1, 0, TAB_CENTER | TAT_TITLE, _("Format"));
- tab_box (t, TAL_1, TAL_1, TAL_0, TAL_1, 0, 0, 1, spec_cnt);
- tab_hline (t, TAL_2, 0, 1, 1);
- tab_dim (t, tab_natural_dimensions);
- row = 1;
- ll_for_each (spec, struct dls_var_spec, ll, &dls->specs)
- {
- char str[FMT_STRING_LEN_MAX + 1];
- tab_text (t, 0, row, TAB_LEFT, spec->name);
- tab_text (t, 1, row, TAB_LEFT | TAB_FIX,
- fmt_to_string (&spec->input, str));
- row++;
- }
-
- tab_title (t, _("Reading free-form data from %s."), fh_get_name (fh));
-
- tab_submit (t);
-}
\f
/* Input procedure. */
-/* Extracts a field from the current position in the current
- record. Fields can be unquoted or quoted with single- or
- double-quote characters.
-
- *FIELD is set to the field content. The caller must not
- or destroy this constant string.
-
- After parsing the field, sets the current position in the
- record to just past the field and any trailing delimiter.
- Returns 0 on failure or a 1-based column number indicating the
- beginning of the field on success. */
-static bool
-cut_field (const struct data_list_pgm *dls, struct substring *field)
-{
- struct substring line, p;
-
- if (dfm_eof (dls->reader))
- return false;
- if (ds_is_empty (&dls->delims))
- dfm_expand_tabs (dls->reader);
- line = p = dfm_get_record (dls->reader);
-
- if (ds_is_empty (&dls->delims))
- {
- bool missing_quote = false;
-
- /* Skip leading whitespace. */
- ss_ltrim (&p, ss_cstr (CC_SPACES));
- if (ss_is_empty (p))
- return false;
-
- /* Handle actual data, whether quoted or unquoted. */
- if (ss_match_char (&p, '\''))
- missing_quote = !ss_get_until (&p, '\'', field);
- else if (ss_match_char (&p, '"'))
- missing_quote = !ss_get_until (&p, '"', field);
- else
- ss_get_chars (&p, ss_cspan (p, ss_cstr ("," CC_SPACES)), field);
- if (missing_quote)
- msg (SW, _("Quoted string extends beyond end of line."));
-
- /* Skip trailing whitespace and a single comma if present. */
- ss_ltrim (&p, ss_cstr (CC_SPACES));
- ss_match_char (&p, ',');
-
- dfm_forward_columns (dls->reader, ss_length (line) - ss_length (p));
- }
- else
- {
- if (!ss_is_empty (p))
- ss_get_chars (&p, ss_cspan (p, ds_ss (&dls->delims)), field);
- else if (dfm_columns_past_end (dls->reader) == 0)
- {
- /* A blank line or a line that ends in a delimiter has a
- trailing blank field. */
- *field = p;
- }
- else
- return false;
-
- /* Advance past the field.
-
- Also advance past a trailing delimiter, regardless of
- whether one actually existed. If we "skip" a delimiter
- that was not actually there, then we will return
- end-of-line on our next call, which is what we want. */
- dfm_forward_columns (dls->reader, ss_length (line) - ss_length (p) + 1);
- }
- return true;
-}
-
-static bool read_from_data_list_fixed (const struct data_list_pgm *,
- struct ccase *);
-static bool read_from_data_list_free (const struct data_list_pgm *,
- struct ccase *);
-static bool read_from_data_list_list (const struct data_list_pgm *,
- struct ccase *);
-
-/* Reads a case from DLS into C.
- Returns true if successful, false at end of file or on I/O error. */
-static bool
-read_from_data_list (const struct data_list_pgm *dls, struct ccase *c)
-{
- bool retval;
-
- dfm_push (dls->reader);
- switch (dls->type)
- {
- case DLS_FIXED:
- retval = read_from_data_list_fixed (dls, c);
- break;
- case DLS_FREE:
- retval = read_from_data_list_free (dls, c);
- break;
- case DLS_LIST:
- retval = read_from_data_list_list (dls, c);
- break;
- default:
- NOT_REACHED ();
- }
- dfm_pop (dls->reader);
-
- return retval;
-}
-
-/* Reads a case from the data file into C, parsing it according
- to fixed-format syntax rules in DLS.
- Returns true if successful, false at end of file or on I/O error. */
-static bool
-read_from_data_list_fixed (const struct data_list_pgm *dls, struct ccase *c)
-{
- enum legacy_encoding encoding = dfm_reader_get_legacy_encoding (dls->reader);
- struct dls_var_spec *spec;
- int row;
-
- if (dfm_eof (dls->reader))
- return false;
-
- spec = ll_to_dls_var_spec (ll_head (&dls->specs));
- for (row = 1; row <= dls->record_cnt; row++)
- {
- struct substring line;
-
- if (dfm_eof (dls->reader))
- {
- msg (SW, _("Partial case of %d of %d records discarded."),
- row - 1, dls->record_cnt);
- return false;
- }
- dfm_expand_tabs (dls->reader);
- line = dfm_get_record (dls->reader);
-
- ll_for_each_continue (spec, struct dls_var_spec, ll, &dls->specs)
- {
- if (row < spec->record)
- break;
-
- data_in (ss_substr (line, spec->first_column - 1,
- spec->input.w),
- encoding, spec->input.type, spec->input.d,
- spec->first_column, case_data_rw_idx (c, spec->fv),
- fmt_var_width (&spec->input));
- }
-
- dfm_forward_record (dls->reader);
- }
-
- return true;
-}
-
-/* Reads a case from the data file into C, parsing it according
- to free-format syntax rules in DLS.
- Returns true if successful, false at end of file or on I/O error. */
-static bool
-read_from_data_list_free (const struct data_list_pgm *dls, struct ccase *c)
-{
- enum legacy_encoding encoding = dfm_reader_get_legacy_encoding (dls->reader);
- struct dls_var_spec *spec;
-
- ll_for_each (spec, struct dls_var_spec, ll, &dls->specs)
- {
- struct substring field;
-
- /* Cut out a field and read in a new record if necessary. */
- while (!cut_field (dls, &field))
- {
- if (!dfm_eof (dls->reader))
- dfm_forward_record (dls->reader);
- if (dfm_eof (dls->reader))
- {
- if (&spec->ll != ll_head (&dls->specs))
- msg (SW, _("Partial case discarded. The first variable "
- "missing was %s."), spec->name);
- return false;
- }
- }
-
- data_in (field, encoding, spec->input.type, 0,
- dfm_get_column (dls->reader, ss_data (field)),
- case_data_rw_idx (c, spec->fv), fmt_var_width (&spec->input));
- }
- return true;
-}
-
-/* Reads a case from the data file and parses it according to
- list-format syntax rules.
- Returns true if successful, false at end of file or on I/O error. */
-static bool
-read_from_data_list_list (const struct data_list_pgm *dls, struct ccase *c)
-{
- enum legacy_encoding encoding = dfm_reader_get_legacy_encoding (dls->reader);
- struct dls_var_spec *spec;
-
- if (dfm_eof (dls->reader))
- return false;
-
- ll_for_each (spec, struct dls_var_spec, ll, &dls->specs)
- {
- struct substring field;
-
- if (!cut_field (dls, &field))
- {
- if (get_undefined ())
- msg (SW, _("Missing value(s) for all variables from %s onward. "
- "These will be filled with the system-missing value "
- "or blanks, as appropriate."),
- spec->name);
- ll_for_each_continue (spec, struct dls_var_spec, ll, &dls->specs)
- {
- int width = fmt_var_width (&spec->input);
- if (width == 0)
- case_data_rw_idx (c, spec->fv)->f = SYSMIS;
- else
- memset (case_data_rw_idx (c, spec->fv)->s, ' ', width);
- }
- break;
- }
-
- data_in (field, encoding, spec->input.type, 0,
- dfm_get_column (dls->reader, ss_data (field)),
- case_data_rw_idx (c, spec->fv), fmt_var_width (&spec->input));
- }
-
- dfm_forward_record (dls->reader);
- return true;
-}
-
-/* Destroys DATA LIST transformation DLS.
+/* Destroys DATA LIST transformation TRNS.
Returns true if successful, false if an I/O error occurred. */
static bool
-data_list_trns_free (void *dls_)
+data_list_trns_free (void *trns_)
{
- struct data_list_pgm *dls = dls_;
- dfm_close_reader (dls->reader);
- pool_destroy (dls->pool);
+ struct data_list_trns *trns = trns_;
+ data_parser_destroy (trns->parser);
+ dfm_close_reader (trns->reader);
+ free (trns);
return true;
}
-/* Handle DATA LIST transformation DLS, parsing data into C. */
+/* Handle DATA LIST transformation TRNS, parsing data into C. */
static int
-data_list_trns_proc (void *dls_, struct ccase *c, casenumber case_num UNUSED)
+data_list_trns_proc (void *trns_, struct ccase *c, casenumber case_num UNUSED)
{
- struct data_list_pgm *dls = dls_;
+ struct data_list_trns *trns = trns_;
int retval;
- if (read_from_data_list (dls, c))
+ if (data_parser_parse (trns->parser, trns->reader, c))
retval = TRNS_CONTINUE;
- else if (dfm_reader_error (dls->reader) || dfm_eof (dls->reader) > 1)
+ else if (dfm_reader_error (trns->reader) || dfm_eof (trns->reader) > 1)
{
/* An I/O error, or encountering end of file for a second
time, should be escalated into a more serious error. */
retval = TRNS_END_FILE;
/* If there was an END subcommand handle it. */
- if (dls->end != NULL)
+ if (trns->end != NULL)
{
- double *end = &case_data_rw (c, dls->end)->f;
+ double *end = &case_data_rw (c, trns->end)->f;
if (retval == TRNS_END_FILE)
{
*end = 1.0;
return retval;
}
\f
-/* Reads one case into OUTPUT_CASE.
- Returns true if successful, false at end of file or if an
- I/O error occurred. */
-static bool
-data_list_casereader_read (struct casereader *reader UNUSED, void *dls_,
- struct ccase *c)
-{
- struct data_list_pgm *dls = dls_;
- bool ok;
-
- /* Skip the requested number of records before reading the
- first case. */
- while (dls->skip_records > 0)
- {
- if (dfm_eof (dls->reader))
- return false;
- dfm_forward_record (dls->reader);
- dls->skip_records--;
- }
-
- case_create (c, dls->value_cnt);
- ok = read_from_data_list (dls, c);
- if (!ok)
- case_destroy (c);
- return ok;
-}
-
-/* Destroys the casereader. */
-static void
-data_list_casereader_destroy (struct casereader *reader UNUSED, void *dls_)
-{
- struct data_list_pgm *dls = dls_;
- if (dfm_reader_error (dls->reader))
- casereader_force_error (reader);
- data_list_trns_free (dls);
-}
-
-static const struct casereader_class data_list_casereader_class =
- {
- data_list_casereader_read,
- data_list_casereader_destroy,
- NULL,
- NULL,
- };
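Reviewer note: the fixed-format reader removed above, like its replacement in data-parser.c, takes each field from a 1-based starting column and a width. The following is a minimal, self-contained sketch of that slicing rule. `slice_fixed_field` is a hypothetical helper, not a PSPP function, and it assumes that columns beyond the end of a short record are treated as absent rather than as an error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of PSPP): copies the field that
   starts at 1-based column FIRST_COLUMN and spans at most WIDTH
   columns of LINE into OUT, which has room for OUT_SIZE bytes
   including the null terminator.  Columns past the end of LINE
   are treated as absent, so a short record yields a short or
   empty field.  Returns the number of characters copied. */
size_t
slice_fixed_field (const char *line, int first_column, int width,
                   char *out, size_t out_size)
{
  size_t len = strlen (line);
  size_t start, n;

  assert (first_column >= 1 && width >= 0 && out_size > 0);

  /* Convert the 1-based column to a 0-based offset. */
  start = (size_t) (first_column - 1);
  if (start >= len)
    {
      out[0] = '\0';
      return 0;
    }

  /* Clamp to the requested width, the record, and the buffer. */
  n = len - start;
  if (n > (size_t) width)
    n = (size_t) width;
  if (n > out_size - 1)
    n = out_size - 1;

  memcpy (out, line + start, n);
  out[n] = '\0';
  return n;
}
```

For a record "12345 abc", a field at column 7 with width 3 would yield "abc", while a field at column 10 would yield an empty string.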
--- /dev/null
+/* PSPP - a program for statistical analysis.
+ Copyright (C) 2007 Free Software Foundation, Inc.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>. */
+
+#include <config.h>
+
+#include <language/data-io/data-parser.h>
+
+#include <stdint.h>
+#include <stdlib.h>
+
+#include <data/casereader-provider.h>
+#include <data/data-in.h>
+#include <data/dictionary.h>
+#include <data/format.h>
+#include <data/file-handle-def.h>
+#include <data/procedure.h>
+#include <data/settings.h>
+#include <language/data-io/data-reader.h>
+#include <libpspp/message.h>
+#include <libpspp/str.h>
+#include <output/table.h>
+
+#include "xalloc.h"
+
+#include "gettext.h"
+#define _(msgid) gettext (msgid)
+
+/* Data parser for textual data like that read by DATA LIST. */
+struct data_parser
+ {
+ enum data_parser_type type; /* Type of data to parse. */
+ int skip_records; /* Records to skip before first real data. */
+ casenumber max_cases; /* Max number of cases to read. */
+ int percent_cases; /* Approximate percent of cases to read. */
+
+ struct field *fields; /* Fields to parse. */
+ size_t field_cnt; /* Number of fields. */
+ size_t field_allocated; /* Number of fields space is allocated for. */
+
+ /* DP_DELIMITED parsers only. */
+ bool span; /* May cases span multiple records? */
+ bool empty_line_has_field; /* Does an empty line have an (empty) field? */
+ struct substring quotes; /* Characters that can quote separators. */
+ struct substring soft_seps; /* Two soft separators act like just one. */
+ struct substring hard_seps; /* Two hard separators yield empty fields. */
+ struct string any_sep; /* Concatenation of soft_seps and hard_seps. */
+
+ /* DP_FIXED parsers only. */
+ int records_per_case; /* Number of records in each case. */
+ };
+
+/* How to parse one variable. */
+struct field
+ {
+ struct fmt_spec format; /* Input format of this field. */
+ int case_idx; /* First value in case. */
+ char *name; /* Var name for error messages and tables. */
+
+ /* DP_FIXED only. */
+ int record; /* Record number (1-based). */
+ int first_column; /* First column in record (1-based). */
+ };
+
+static void set_any_sep (struct data_parser *parser);
+
+/* Creates and returns a new data parser. */
+struct data_parser *
+data_parser_create (void)
+{
+ struct data_parser *parser = xmalloc (sizeof *parser);
+
+ parser->type = DP_FIXED;
+ parser->skip_records = 0;
+ parser->max_cases = -1;
+ parser->percent_cases = 100;
+
+ parser->fields = NULL;
+ parser->field_cnt = 0;
+ parser->field_allocated = 0;
+
+ parser->span = true;
+ parser->empty_line_has_field = false;
+ ss_alloc_substring (&parser->quotes, ss_cstr ("\"'"));
+ ss_alloc_substring (&parser->soft_seps, ss_cstr (CC_SPACES));
+ ss_alloc_substring (&parser->hard_seps, ss_cstr (","));
+ ds_init_empty (&parser->any_sep);
+ set_any_sep (parser);
+
+ parser->records_per_case = 0;
+
+ return parser;
+}
+
+/* Destroys PARSER. */
+void
+data_parser_destroy (struct data_parser *parser)
+{
+ if (parser != NULL)
+ {
+ size_t i;
+
+ for (i = 0; i < parser->field_cnt; i++)
+ free (parser->fields[i].name);
+ free (parser->fields);
+ ss_dealloc (&parser->quotes);
+ ss_dealloc (&parser->soft_seps);
+ ss_dealloc (&parser->hard_seps);
+ ds_destroy (&parser->any_sep);
+ free (parser);
+ }
+}
+
+/* Returns the type of PARSER (either DP_DELIMITED or DP_FIXED). */
+enum data_parser_type
+data_parser_get_type (const struct data_parser *parser)
+{
+ return parser->type;
+}
+
+/* Sets the type of PARSER to TYPE (either DP_DELIMITED or
+ DP_FIXED). */
+void
+data_parser_set_type (struct data_parser *parser, enum data_parser_type type)
+{
+ assert (parser->field_cnt == 0);
+ assert (type == DP_FIXED || type == DP_DELIMITED);
+ parser->type = type;
+}
+
+/* Configures PARSER to skip INITIAL_RECORDS_TO_SKIP records
+ before parsing any data. By default, no records are
+ skipped. */
+void
+data_parser_set_skip (struct data_parser *parser, int initial_records_to_skip)
+{
+ assert (initial_records_to_skip >= 0);
+ parser->skip_records = initial_records_to_skip;
+}
+
+/* Sets the maximum number of cases parsed by PARSER to
+ MAX_CASES. The default is -1, meaning no limit. */
+void
+data_parser_set_case_limit (struct data_parser *parser, casenumber max_cases)
+{
+ parser->max_cases = max_cases;
+}
+
+/* Sets the percentage of cases that PARSER should read from the
+ input file to PERCENT_CASES. By default, all cases are
+ read. */
+void
+data_parser_set_case_percent (struct data_parser *parser, int percent_cases)
+{
+ assert (percent_cases >= 0 && percent_cases <= 100);
+ parser->percent_cases = percent_cases;
+}
+
+/* Returns true if PARSER is configured to allow cases to span
+ multiple records. */
+bool
+data_parser_get_span (const struct data_parser *parser)
+{
+ return parser->span;
+}
+
+/* If MAY_CASES_SPAN_RECORDS is true, configures PARSER to allow
+ a single case to span multiple records and multiple cases to
+ occupy a single record. If MAY_CASES_SPAN_RECORDS is false,
+ configures PARSER to require each record to contain exactly
+ one case.
+
+ This setting affects parsing of DP_DELIMITED files only. */
+void
+data_parser_set_span (struct data_parser *parser, bool may_cases_span_records)
+{
+ parser->span = may_cases_span_records;
+}
+
+/* If EMPTY_LINE_HAS_FIELD is true, configures PARSER to parse an
+ empty line as an empty field and to treat a hard delimiter
+ followed by end-of-line as an empty field. If
+ EMPTY_LINE_HAS_FIELD is false, PARSER will skip empty lines
+ and hard delimiters at the end of lines without emitting empty
+ fields.
+
+ This setting affects parsing of DP_DELIMITED files only. */
+void
+data_parser_set_empty_line_has_field (struct data_parser *parser,
+ bool empty_line_has_field)
+{
+ parser->empty_line_has_field = empty_line_has_field;
+}
+
+/* Sets the characters that may be used for quoting field
+ contents to QUOTES. If QUOTES is empty, quoting will be
+ disabled.
+
+ The caller retains ownership of QUOTES.
+
+ This setting affects parsing of DP_DELIMITED files only. */
+void
+data_parser_set_quotes (struct data_parser *parser, struct substring quotes)
+{
+ ss_dealloc (&parser->quotes);
+ ss_alloc_substring (&parser->quotes, quotes);
+}
+
+/* Sets PARSER's soft delimiters to DELIMITERS. Soft delimiters
+ separate fields, but consecutive soft delimiters do not yield
+ empty fields. (Ordinarily, only white space characters are
+ appropriate soft delimiters.)
+
+ The caller retains ownership of DELIMITERS.
+
+ This setting affects parsing of DP_DELIMITED files only. */
+void
+data_parser_set_soft_delimiters (struct data_parser *parser,
+ struct substring delimiters)
+{
+ ss_dealloc (&parser->soft_seps);
+ ss_alloc_substring (&parser->soft_seps, delimiters);
+ set_any_sep (parser);
+}
+
+/* Sets PARSER's hard delimiters to DELIMITERS. Hard delimiters
+ separate fields. A consecutive pair of hard delimiters yields
+ an empty field.
+
+ The caller retains ownership of DELIMITERS.
+
+ This setting affects parsing of DP_DELIMITED files only. */
+void
+data_parser_set_hard_delimiters (struct data_parser *parser,
+ struct substring delimiters)
+{
+ ss_dealloc (&parser->hard_seps);
+ ss_alloc_substring (&parser->hard_seps, delimiters);
+ set_any_sep (parser);
+}
+
+/* Returns the number of records per case. */
+int
+data_parser_get_records (const struct data_parser *parser)
+{
+ return parser->records_per_case;
+}
+
+/* Sets the number of records per case to RECORDS_PER_CASE.
+
+ This setting affects parsing of DP_FIXED files only. */
+void
+data_parser_set_records (struct data_parser *parser, int records_per_case)
+{
+ assert (records_per_case >= 0);
+ assert (records_per_case >= parser->records_per_case);
+ parser->records_per_case = records_per_case;
+}
+
+static void
+add_field (struct data_parser *p, const struct fmt_spec *format, int case_idx,
+ const char *name, int record, int first_column)
+{
+ struct field *field;
+
+ if (p->field_cnt == p->field_allocated)
+ p->fields = x2nrealloc (p->fields, &p->field_allocated, sizeof *p->fields);
+ field = &p->fields[p->field_cnt++];
+ field->format = *format;
+ field->case_idx = case_idx;
+ field->name = xstrdup (name);
+ field->record = record;
+ field->first_column = first_column;
+}
+
+/* Adds a delimited field to the fields parsed by PARSER, which
+ must be configured as a DP_DELIMITED parser. The field is
+ parsed as input format FORMAT. Its data will be stored into case
+ index CASE_IDX. Errors in input data will be reported
+ against variable NAME. */
+void
+data_parser_add_delimited_field (struct data_parser *parser,
+ const struct fmt_spec *format, int case_idx,
+ const char *name)
+{
+ assert (parser->type == DP_DELIMITED);
+ add_field (parser, format, case_idx, name, 0, 0);
+}
+
+/* Adds a fixed field to the fields parsed by PARSER, which
+ must be configured as a DP_FIXED parser. The field is
+ parsed as input format FORMAT. Its data will be stored into case
+ index CASE_IDX. Errors in input data will be reported
+ against variable NAME. The field will be drawn from the
+ FORMAT->w columns in 1-based RECORD starting at 1-based
+ column FIRST_COLUMN.
+
+ RECORD must be at least as great as that of any field already
+ added; that is, fields must be added in nondecreasing order of
+ record number. If RECORD is greater than the current number
+ of records per case, the number of records per case is
+ increased as needed. */
+void
+data_parser_add_fixed_field (struct data_parser *parser,
+ const struct fmt_spec *format, int case_idx,
+ const char *name,
+ int record, int first_column)
+{
+ assert (parser->type == DP_FIXED);
+ assert (parser->field_cnt == 0
+ || record >= parser->fields[parser->field_cnt - 1].record);
+ if (record > parser->records_per_case)
+ parser->records_per_case = record;
+ add_field (parser, format, case_idx, name, record, first_column);
+}
+
+/* Returns true if any fields have been added to PARSER, false
+ otherwise. */
+bool
+data_parser_any_fields (const struct data_parser *parser)
+{
+ return parser->field_cnt > 0;
+}
+
+static void
+set_any_sep (struct data_parser *parser)
+{
+ ds_assign_substring (&parser->any_sep, parser->soft_seps);
+ ds_put_substring (&parser->any_sep, parser->hard_seps);
+}
+\f
+static bool parse_delimited_span (const struct data_parser *,
+ struct dfm_reader *, struct ccase *);
+static bool parse_delimited_no_span (const struct data_parser *,
+ struct dfm_reader *, struct ccase *);
+static bool parse_fixed (const struct data_parser *,
+ struct dfm_reader *, struct ccase *);
+
+/* Reads a case from READER into C, parsing it with PARSER.
+ Returns true if successful, false at end of file or on I/O error. */
+bool
+data_parser_parse (struct data_parser *parser, struct dfm_reader *reader,
+ struct ccase *c)
+{
+ bool retval;
+
+ assert (data_parser_any_fields (parser));
+
+ /* Skip the requested number of records before reading the
+ first case. */
+ for (; parser->skip_records > 0; parser->skip_records--)
+ {
+ if (dfm_eof (reader))
+ return false;
+ dfm_forward_record (reader);
+ }
+
+ /* Limit cases. */
+ if (parser->max_cases != -1 && parser->max_cases-- == 0)
+ return false;
+ if (parser->percent_cases < 100
+ && dfm_get_percent_read (reader) >= parser->percent_cases)
+ return false;
+
+ dfm_push (reader);
+ if (parser->type == DP_DELIMITED)
+ {
+ if (parser->span)
+ retval = parse_delimited_span (parser, reader, c);
+ else
+ retval = parse_delimited_no_span (parser, reader, c);
+ }
+ else
+ retval = parse_fixed (parser, reader, c);
+ dfm_pop (reader);
+
+ return retval;
+}
+
+/* Extracts a delimited field from the current position in the
+ current record according to PARSER, reading data from READER.
+
+ *FIELD is set to the field content. The caller must not
+ modify or destroy this constant string.
+
+ After parsing the field, sets the current position in the
+ record to just past the field and any trailing delimiter.
+ Returns true if a field was extracted, false if no field
+ remains in the current record or at end of file. */
+static bool
+cut_field (const struct data_parser *parser, struct dfm_reader *reader,
+ struct substring *field)
+{
+ struct substring line, p;
+
+ if (dfm_eof (reader))
+ return false;
+ if (ss_is_empty (parser->hard_seps))
+ dfm_expand_tabs (reader);
+ line = p = dfm_get_record (reader);
+
+ /* Skip leading soft separators. */
+ ss_ltrim (&p, parser->soft_seps);
+
+ /* Handle empty or completely consumed lines. */
+ if (ss_is_empty (p))
+ {
+ if (!parser->empty_line_has_field || dfm_columns_past_end (reader) > 0)
+ return false;
+ else
+ {
+ *field = p;
+ dfm_forward_columns (reader, 1);
+ return true;
+ }
+ }
+
+ if (ss_find_char (parser->quotes, ss_first (p)) != SIZE_MAX)
+ {
+ /* Quoted field. */
+ if (!ss_get_until (&p, ss_get_char (&p), field))
+ msg (SW, _("Quoted string extends beyond end of line."));
+
+ /* Skip trailing soft separator and a single hard separator
+ if present. */
+ ss_ltrim (&p, parser->soft_seps);
+ if (!ss_is_empty (p)
+ && ss_find_char (parser->hard_seps, ss_first (p)) != SIZE_MAX)
+ ss_advance (&p, 1);
+ }
+ else
+ {
+ /* Regular field. */
+ ss_get_chars (&p, ss_cspan (p, ds_ss (&parser->any_sep)), field);
+ if (!ss_ltrim (&p, parser->soft_seps) || ss_is_empty (p))
+ {
+ /* Advance past a trailing hard separator,
+ regardless of whether one actually existed. If
+ we "skip" a delimiter that was not actually
+ there, then we will return end-of-line on our
+ next call, which is what we want. */
+ dfm_forward_columns (reader, 1);
+ }
+ }
+ dfm_forward_columns (reader, ss_length (line) - ss_length (p));
+
+ return true;
+}
+
+/* Reads a case from READER into C, parsing it according to
+ fixed-format syntax rules in PARSER.
+ Returns true if successful, false at end of file or on I/O error. */
+static bool
+parse_fixed (const struct data_parser *parser, struct dfm_reader *reader,
+ struct ccase *c)
+{
+ enum legacy_encoding encoding = dfm_reader_get_legacy_encoding (reader);
+ struct field *f;
+ int row;
+
+ if (dfm_eof (reader))
+ return false;
+
+ f = parser->fields;
+ for (row = 1; row <= parser->records_per_case; row++)
+ {
+ struct substring line;
+
+ if (dfm_eof (reader))
+ {
+ msg (SW, _("Partial case of %d of %d records discarded."),
+ row - 1, parser->records_per_case);
+ return false;
+ }
+ dfm_expand_tabs (reader);
+ line = dfm_get_record (reader);
+
+ for (; f < &parser->fields[parser->field_cnt] && f->record == row; f++)
+ data_in (ss_substr (line, f->first_column - 1,
+ f->format.w),
+ encoding, f->format.type, f->format.d,
+ f->first_column, case_data_rw_idx (c, f->case_idx),
+ fmt_var_width (&f->format));
+
+ dfm_forward_record (reader);
+ }
+
+ return true;
+}
+
+/* Reads a case from READER into C, parsing it according to
+ free-format syntax rules in PARSER.
+ Returns true if successful, false at end of file or on I/O error. */
+static bool
+parse_delimited_span (const struct data_parser *parser,
+ struct dfm_reader *reader, struct ccase *c)
+{
+ enum legacy_encoding encoding = dfm_reader_get_legacy_encoding (reader);
+ struct field *f;
+
+ for (f = parser->fields; f < &parser->fields[parser->field_cnt]; f++)
+ {
+ struct substring s;
+
+ /* Cut out a field and read in a new record if necessary. */
+ while (!cut_field (parser, reader, &s))
+ {
+ if (!dfm_eof (reader))
+ dfm_forward_record (reader);
+ if (dfm_eof (reader))
+ {
+ if (f > parser->fields)
+ msg (SW, _("Partial case discarded. The first variable "
+ "missing was %s."), f->name);
+ return false;
+ }
+ }
+
+ data_in (s, encoding, f->format.type, 0,
+ dfm_get_column (reader, ss_data (s)),
+ case_data_rw_idx (c, f->case_idx),
+ fmt_var_width (&f->format));
+ }
+ return true;
+}
+
+/* Reads a case from READER into C, parsing it according to
+ delimited syntax rules with one case per record in PARSER.
+ Returns true if successful, false at end of file or on I/O error. */
+static bool
+parse_delimited_no_span (const struct data_parser *parser,
+ struct dfm_reader *reader, struct ccase *c)
+{
+ enum legacy_encoding encoding = dfm_reader_get_legacy_encoding (reader);
+ struct substring s;
+ struct field *f;
+
+ if (dfm_eof (reader))
+ return false;
+
+ for (f = parser->fields; f < &parser->fields[parser->field_cnt]; f++)
+ {
+ if (!cut_field (parser, reader, &s))
+ {
+ if (get_undefined ())
+ msg (SW, _("Missing value(s) for all variables from %s onward. "
+ "These will be filled with the system-missing value "
+ "or blanks, as appropriate."),
+ f->name);
+ for (; f < &parser->fields[parser->field_cnt]; f++)
+ {
+ int width = fmt_var_width (&f->format);
+ if (width == 0)
+ case_data_rw_idx (c, f->case_idx)->f = SYSMIS;
+ else
+ memset (case_data_rw_idx (c, f->case_idx)->s, ' ', width);
+ }
+ goto exit;
+ }
+
+ data_in (s, encoding, f->format.type, 0,
+ dfm_get_column (reader, ss_data (s)),
+ case_data_rw_idx (c, f->case_idx),
+ fmt_var_width (&f->format));
+ }
+
+ s = dfm_get_record (reader);
+ ss_ltrim (&s, parser->soft_seps);
+ if (!ss_is_empty (s))
+ msg (SW, _("Record ends in data not part of any field."));
+
+exit:
+ dfm_forward_record (reader);
+ return true;
+}
+\f
+/* Displays a table giving information on fixed-format variable
+ parsing on DATA LIST. */
+static void
+dump_fixed_table (const struct data_parser *parser,
+ const struct file_handle *fh)
+{
+ struct tab_table *t;
+ size_t i;
+
+ t = tab_create (4, parser->field_cnt + 1, 0);
+ tab_columns (t, TAB_COL_DOWN, 1);
+ tab_headers (t, 0, 0, 1, 0);
+ tab_text (t, 0, 0, TAB_CENTER | TAT_TITLE, _("Variable"));
+ tab_text (t, 1, 0, TAB_CENTER | TAT_TITLE, _("Record"));
+ tab_text (t, 2, 0, TAB_CENTER | TAT_TITLE, _("Columns"));
+ tab_text (t, 3, 0, TAB_CENTER | TAT_TITLE, _("Format"));
+ tab_box (t, TAL_1, TAL_1, TAL_0, TAL_1, 0, 0, 3, parser->field_cnt);
+ tab_hline (t, TAL_2, 0, 3, 1);
+ tab_dim (t, tab_natural_dimensions);
+
+ for (i = 0; i < parser->field_cnt; i++)
+ {
+ struct field *f = &parser->fields[i];
+ char fmt_string[FMT_STRING_LEN_MAX + 1];
+ int row = i + 1;
+
+ tab_text (t, 0, row, TAB_LEFT, f->name);
+ tab_text (t, 1, row, TAT_PRINTF, "%d", f->record);
+ tab_text (t, 2, row, TAT_PRINTF, "%3d-%3d",
+ f->first_column, f->first_column + f->format.w - 1);
+ tab_text (t, 3, row, TAB_LEFT | TAB_FIX,
+ fmt_to_string (&f->format, fmt_string));
+ }
+
+ tab_title (t, ngettext ("Reading %d record from %s.",
+ "Reading %d records from %s.",
+ parser->records_per_case),
+ parser->records_per_case, fh_get_name (fh));
+ tab_submit (t);
+}
+
+/* Displays a table giving information on free-format variable parsing
+ on DATA LIST. */
+static void
+dump_delimited_table (const struct data_parser *parser,
+ const struct file_handle *fh)
+{
+ struct tab_table *t;
+ size_t i;
+
+ t = tab_create (2, parser->field_cnt + 1, 0);
+ tab_columns (t, TAB_COL_DOWN, 1);
+ tab_headers (t, 0, 0, 1, 0);
+ tab_text (t, 0, 0, TAB_CENTER | TAT_TITLE, _("Variable"));
+ tab_text (t, 1, 0, TAB_CENTER | TAT_TITLE, _("Format"));
+ tab_box (t, TAL_1, TAL_1, TAL_0, TAL_1, 0, 0, 1, parser->field_cnt);
+ tab_hline (t, TAL_2, 0, 1, 1);
+ tab_dim (t, tab_natural_dimensions);
+
+ for (i = 0; i < parser->field_cnt; i++)
+ {
+ struct field *f = &parser->fields[i];
+ char str[FMT_STRING_LEN_MAX + 1];
+ int row = i + 1;
+
+ tab_text (t, 0, row, TAB_LEFT, f->name);
+ tab_text (t, 1, row, TAB_LEFT | TAB_FIX,
+ fmt_to_string (&f->format, str));
+ }
+
+ tab_title (t, _("Reading free-form data from %s."), fh_get_name (fh));
+
+ tab_submit (t);
+}
+
+/* Displays a table giving information on how PARSER will read
+ data from FH. */
+void
+data_parser_output_description (struct data_parser *parser,
+ const struct file_handle *fh)
+{
+ if (parser->type == DP_FIXED)
+ dump_fixed_table (parser, fh);
+ else
+ dump_delimited_table (parser, fh);
+}
+\f
+/* Data parser input program. */
+struct data_parser_casereader
+ {
+ struct data_parser *parser; /* Parser. */
+ struct dfm_reader *reader; /* Data file reader. */
+ size_t value_cnt; /* Number of `union value's in case. */
+ };
+
+static const struct casereader_class data_parser_casereader_class;
+
+/* Replaces DS's active file by an input program that reads data
+ from READER according to the rules in PARSER, using DICT as
+ the underlying dictionary. Ownership of PARSER and READER is
+ transferred to the input program, and ownership of DICT is
+ transferred to the dataset. */
+void
+data_parser_make_active_file (struct data_parser *parser, struct dataset *ds,
+ struct dfm_reader *reader,
+ struct dictionary *dict)
+{
+ struct data_parser_casereader *r;
+ struct casereader *casereader;
+
+ r = xmalloc (sizeof *r);
+ r->parser = parser;
+ r->reader = reader;
+ r->value_cnt = dict_get_next_value_idx (dict);
+ casereader = casereader_create_sequential (NULL, r->value_cnt,
+ -1, &data_parser_casereader_class,
+ r);
+ proc_set_active_file (ds, casereader, dict);
+}
+
+static bool
+data_parser_casereader_read (struct casereader *reader UNUSED, void *r_,
+ struct ccase *c)
+{
+ struct data_parser_casereader *r = r_;
+ bool ok;
+
+ case_create (c, r->value_cnt);
+ ok = data_parser_parse (r->parser, r->reader, c);
+ if (!ok)
+ case_destroy (c);
+ return ok;
+}
+
+static void
+data_parser_casereader_destroy (struct casereader *reader UNUSED, void *r_)
+{
+ struct data_parser_casereader *r = r_;
+ if (dfm_reader_error (r->reader))
+ casereader_force_error (reader);
+ data_parser_destroy (r->parser);
+ dfm_close_reader (r->reader);
+ free (r);
+}
+
+static const struct casereader_class data_parser_casereader_class =
+ {
+ data_parser_casereader_read,
+ data_parser_casereader_destroy,
+ NULL,
+ NULL,
+ };
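Reviewer note: the distinction between soft and hard separators documented in data-parser.c can be illustrated with a small standalone splitter. This is a simplified sketch of the documented semantics (a run of soft separators acts as one, two adjacent hard separators yield an empty field), not a port of `cut_field`. It ignores quoting, assumes that soft separators surrounding a hard separator are absorbed into it, and behaves as if `empty_line_has_field` were false. `struct field_ref` and `split_fields` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type (not part of PSPP): a field located within a
   line, as a start pointer and a length. */
struct field_ref
  {
    const char *start;
    size_t len;
  };

/* Hypothetical helper: splits LINE into fields using the SOFT and
   HARD separator sets, storing up to MAX_FIELDS of them in OUT.
   Returns the number of fields stored. */
size_t
split_fields (const char *line, const char *soft, const char *hard,
              struct field_ref *out, size_t max_fields)
{
  const char *p = line;
  size_t n = 0;

  while (n < max_fields)
    {
      size_t len = 0;

      /* Leading soft separators never produce a field. */
      p += strspn (p, soft);
      if (*p == '\0')
        break;

      /* The field runs up to the next separator of either kind. */
      while (p[len] != '\0'
             && strchr (soft, p[len]) == NULL
             && strchr (hard, p[len]) == NULL)
        len++;
      out[n].start = p;
      out[n].len = len;
      n++;
      p += len;

      /* Consume trailing soft separators and at most one hard
         separator, so that a second hard separator begins a new
         (possibly empty) field. */
      p += strspn (p, soft);
      if (*p != '\0' && strchr (hard, *p) != NULL)
        p++;
    }
  return n;
}
```

With soft separators " \t" and hard separator ",", the line "a,,b" splits into three fields (the middle one empty), whereas "a   b" splits into two.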
--- /dev/null
+/* PSPP - a program for statistical analysis.
+ Copyright (C) 2007 Free Software Foundation, Inc.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>. */
+
+#ifndef LANGUAGE_DATA_IO_DATA_PARSER_H
+#define LANGUAGE_DATA_IO_DATA_PARSER_H
+
+/* Abstraction of a DATA LIST or GET DATA TYPE=TXT data parser. */
+
+#include <stdbool.h>
+#include <data/case.h>
+
+struct dataset;
+struct dfm_reader;
+struct dictionary;
+struct file_handle;
+struct fmt_spec;
+struct substring;
+
+/* Type of data read by a data parser. */
+enum data_parser_type
+ {
+ DP_FIXED, /* Fields in fixed column positions. */
+ DP_DELIMITED /* Fields delimited by e.g. commas. */
+ };
+
+/* Creating and configuring any parser. */
+struct data_parser *data_parser_create (void);
+void data_parser_destroy (struct data_parser *);
+
+enum data_parser_type data_parser_get_type (const struct data_parser *);
+void data_parser_set_type (struct data_parser *, enum data_parser_type);
+
+void data_parser_set_skip (struct data_parser *, int initial_records_to_skip);
+void data_parser_set_case_limit (struct data_parser *, casenumber max_cases);
+void data_parser_set_case_percent (struct data_parser *, int case_percent);
+
+/* For configuring delimited parsers only. */
+bool data_parser_get_span (const struct data_parser *);
+void data_parser_set_span (struct data_parser *, bool may_cases_span_records);
+
+void data_parser_set_empty_line_has_field (struct data_parser *,
+ bool empty_line_has_field);
+void data_parser_set_quotes (struct data_parser *, struct substring);
+void data_parser_set_soft_delimiters (struct data_parser *, struct substring);
+void data_parser_set_hard_delimiters (struct data_parser *, struct substring);
+
+/* For configuring fixed parsers only. */
+int data_parser_get_records (const struct data_parser *);
+void data_parser_set_records (struct data_parser *, int records_per_case);
+
+/* Field setup and parsing. */
+void data_parser_add_delimited_field (struct data_parser *,
+ const struct fmt_spec *, int case_idx,
+ const char *name);
+void data_parser_add_fixed_field (struct data_parser *,
+ const struct fmt_spec *, int case_idx,
+ const char *name,
+ int record, int first_column);
+bool data_parser_any_fields (const struct data_parser *);
+bool data_parser_parse (struct data_parser *,
+ struct dfm_reader *, struct ccase *);
+
+/* Uses for a configured parser. */
+void data_parser_output_description (struct data_parser *,
+ const struct file_handle *);
+void data_parser_make_active_file (struct data_parser *, struct dataset *,
+ struct dfm_reader *, struct dictionary *);
+
+#endif /* language/data-io/data-parser.h */
#include <config.h>
+#include <stdlib.h>
-#include <libpspp/message.h>
#include <data/gnumeric-reader.h>
+#include <data/dictionary.h>
+#include <data/format.h>
+#include <data/procedure.h>
#include <language/command.h>
+#include <language/data-io/data-parser.h>
+#include <language/data-io/data-reader.h>
+#include <language/data-io/file-handle.h>
+#include <language/data-io/placement-parser.h>
+#include <language/lexer/format-parser.h>
#include <language/lexer/lexer.h>
-#include <stdlib.h>
-#include <data/procedure.h>
+#include <libpspp/message.h>
#include "gettext.h"
#define _(msgid) gettext (msgid)
#define N_(msgid) (msgid)
static int parse_get_gnm (struct lexer *lexer, struct dataset *);
+static int parse_get_txt (struct lexer *lexer, struct dataset *);
int
cmd_get_data (struct lexer *lexer, struct dataset *ds)
if (lex_match_id (lexer, "GNM"))
return parse_get_gnm (lexer, ds);
+ else if (lex_match_id (lexer, "TXT"))
+ return parse_get_txt (lexer, ds);
msg (SE, _("Unsupported TYPE %s"), lex_tokid (lexer));
return CMD_FAILURE;
free (gri.cell_range);
return CMD_FAILURE;
}
+
+static bool
+set_type (struct data_parser *parser, const char *subcommand,
+ enum data_parser_type type, bool *has_type)
+{
+ if (!*has_type)
+ {
+ data_parser_set_type (parser, type);
+ *has_type = true;
+ }
+ else if (type != data_parser_get_type (parser))
+ {
+ msg (SE, _("%s is allowed only with %s arrangement, but %s arrangement "
+ "was stated or implied earlier in this command."),
+ subcommand,
+ type == DP_FIXED ? "FIXED" : "DELIMITED",
+ type == DP_FIXED ? "DELIMITED" : "FIXED");
+ return false;
+ }
+ return true;
+}
+
+static int
+parse_get_txt (struct lexer *lexer, struct dataset *ds)
+{
+ struct data_parser *parser = NULL;
+ struct dictionary *dict = NULL;
+ struct file_handle *fh = NULL;
+ struct dfm_reader *reader = NULL;
+
+ int record;
+ enum data_parser_type type;
+ bool has_type;
+
+ lex_force_match (lexer, '/');
+
+ if (!lex_force_match_id (lexer, "FILE"))
+ goto error;
+ lex_force_match (lexer, '=');
+ fh = fh_parse (lexer, FH_REF_FILE | FH_REF_INLINE);
+ if (fh == NULL)
+ goto error;
+
+ parser = data_parser_create ();
+ has_type = false;
+ data_parser_set_type (parser, DP_DELIMITED);
+ data_parser_set_span (parser, false);
+ data_parser_set_quotes (parser, ss_empty ());
+ data_parser_set_empty_line_has_field (parser, true);
+
+ for (;;)
+ {
+ if (!lex_force_match (lexer, '/'))
+ goto error;
+
+ if (lex_match_id (lexer, "ARRANGEMENT"))
+ {
+ bool ok;
+
+ lex_match (lexer, '=');
+ if (lex_match_id (lexer, "FIXED"))
+ ok = set_type (parser, "ARRANGEMENT=FIXED", DP_FIXED, &has_type);
+ else if (lex_match_id (lexer, "DELIMITED"))
+ ok = set_type (parser, "ARRANGEMENT=DELIMITED",
+ DP_DELIMITED, &has_type);
+ else
+ {
+ lex_error (lexer, _("expecting FIXED or DELIMITED"));
+ goto error;
+ }
+ if (!ok)
+ goto error;
+ }
+ else if (lex_match_id (lexer, "FIRSTCASE"))
+ {
+ lex_match (lexer, '=');
+ if (!lex_force_int (lexer))
+ goto error;
+ if (lex_integer (lexer) < 1)
+ {
+ msg (SE, _("Value of FIRSTCASE must be 1 or greater."));
+ goto error;
+ }
+ data_parser_set_skip (parser, lex_integer (lexer) - 1);
+ lex_get (lexer);
+ }
+ else if (lex_match_id_n (lexer, "DELCASE", 4))
+ {
+ if (!set_type (parser, "DELCASE", DP_DELIMITED, &has_type))
+ goto error;
+ lex_match (lexer, '=');
+ if (lex_match_id (lexer, "LINE"))
+ data_parser_set_span (parser, false);
+ else if (lex_match_id (lexer, "VARIABLES"))
+ {
+ data_parser_set_span (parser, true);
+
+ /* VARIABLES takes an integer argument, but for no
+ good reason. We just ignore it. */
+ if (!lex_force_int (lexer))
+ goto error;
+ lex_get (lexer);
+ }
+ else
+ {
+ lex_error (lexer, _("expecting LINE or VARIABLES"));
+ goto error;
+ }
+ }
+ else if (lex_match_id (lexer, "FIXCASE"))
+ {
+ if (!set_type (parser, "FIXCASE", DP_FIXED, &has_type))
+ goto error;
+ lex_match (lexer, '=');
+ if (!lex_force_int (lexer))
+ goto error;
+ if (lex_integer (lexer) < 1)
+ {
+ msg (SE, _("Value of FIXCASE must be at least 1."));
+ goto error;
+ }
+ data_parser_set_records (parser, lex_integer (lexer));
+ lex_get (lexer);
+ }
+ else if (lex_match_id (lexer, "IMPORTCASES"))
+ {
+ lex_match (lexer, '=');
+ if (lex_match (lexer, T_ALL))
+ {
+ data_parser_set_case_limit (parser, -1);
+ data_parser_set_case_percent (parser, 100);
+ }
+ else if (lex_match_id (lexer, "FIRST"))
+ {
+ if (!lex_force_int (lexer))
+ goto error;
+ if (lex_integer (lexer) < 1)
+ {
+ msg (SE, _("Value of FIRST must be at least 1."));
+ goto error;
+ }
+ data_parser_set_case_limit (parser, lex_integer (lexer));
+ lex_get (lexer);
+ }
+ else if (lex_match_id (lexer, "PERCENT"))
+ {
+ if (!lex_force_int (lexer))
+ goto error;
+ if (lex_integer (lexer) < 1 || lex_integer (lexer) > 100)
+ {
+ msg (SE, _("Value of PERCENT must be between 1 and 100."));
+ goto error;
+ }
+ data_parser_set_case_percent (parser, lex_integer (lexer));
+ lex_get (lexer);
+ }
+ else
+ {
+ lex_error (lexer, _("expecting ALL, FIRST, or PERCENT"));
+ goto error;
+ }
+ }
+ else if (lex_match_id_n (lexer, "DELIMITERS", 4))
+ {
+ struct string hard_seps = DS_EMPTY_INITIALIZER;
+ const char *soft_seps = "";
+ struct substring s;
+ int c;
+
+ if (!set_type (parser, "DELIMITERS", DP_DELIMITED, &has_type))
+ goto error;
+ lex_match (lexer, '=');
+
+ if (!lex_force_string (lexer))
+ goto error;
+
+ s = ds_ss (lex_tokstr (lexer));
+ if (ss_match_string (&s, ss_cstr ("\\t")))
+ ds_put_cstr (&hard_seps, "\t");
+ if (ss_match_string (&s, ss_cstr ("\\\\")))
+ ds_put_cstr (&hard_seps, "\\");
+ while ((c = ss_get_char (&s)) != EOF)
+ if (c == ' ')
+ soft_seps = " ";
+ else
+ ds_put_char (&hard_seps, c);
+ data_parser_set_soft_delimiters (parser, ss_cstr (soft_seps));
+ data_parser_set_hard_delimiters (parser, ds_ss (&hard_seps));
+ ds_destroy (&hard_seps);
+
+ lex_get (lexer);
+ }
+ else if (lex_match_id (lexer, "QUALIFIER"))
+ {
+ if (!set_type (parser, "QUALIFIER", DP_DELIMITED, &has_type))
+ goto error;
+ lex_match (lexer, '=');
+
+ if (!lex_force_string (lexer))
+ goto error;
+
+ data_parser_set_quotes (parser, ds_ss (lex_tokstr (lexer)));
+ lex_get (lexer);
+ }
+ else if (lex_match_id (lexer, "VARIABLES"))
+ break;
+ else
+ {
+ lex_error (lexer, _("expecting VARIABLES"));
+ goto error;
+ }
+ }
+ lex_match (lexer, '=');
+
+ dict = dict_create ();
+ record = 1;
+ type = data_parser_get_type (parser);
+ do
+ {
+ char name[VAR_NAME_LEN + 1];
+ struct fmt_spec input, output;
+ int fc, lc;
+ struct variable *v;
+
+ while (type == DP_FIXED && lex_match (lexer, '/'))
+ {
+ if (!lex_force_int (lexer))
+ goto error;
+ if (lex_integer (lexer) < record)
+ {
+ msg (SE, _("The record number specified, %ld, is "
+ "before the previous record, %d. Data "
+ "fields must be listed in order of "
+ "increasing record number."),
+ lex_integer (lexer), record);
+ goto error;
+ }
+ if (lex_integer (lexer) > data_parser_get_records (parser))
+ {
+ msg (SE, _("The record number specified, %ld, exceeds "
+ "the number of records per case specified "
+ "on FIXCASE, %d."),
+ lex_integer (lexer), data_parser_get_records (parser));
+ goto error;
+ }
+ record = lex_integer (lexer);
+ lex_get (lexer);
+ }
+
+ if (!lex_force_id (lexer))
+ goto error;
+ strcpy (name, lex_tokid (lexer));
+ lex_get (lexer);
+
+ if (type == DP_DELIMITED)
+ {
+ if (!parse_format_specifier (lexer, &input)
+ || !fmt_check_input (&input))
+ goto error;
+ }
+ else
+ {
+ if (!parse_column_range (lexer, 0, &fc, &lc, NULL))
+ goto error;
+ if (!parse_format_specifier_name (lexer, &input.type))
+ goto error;
+ input.w = lc - fc + 1;
+ input.d = 0;
+ if (!fmt_check_input (&input))
+ goto error;
+ }
+ output = fmt_for_output_from_input (&input);
+
+ v = dict_create_var (dict, name, fmt_var_width (&input));
+ if (v == NULL)
+ {
+ msg (SE, _("%s is a duplicate variable name."), name);
+ goto error;
+ }
+ var_set_both_formats (v, &output);
+
+ if (type == DP_DELIMITED)
+ data_parser_add_delimited_field (parser, &input,
+ var_get_case_index (v),
+ name);
+ else
+ data_parser_add_fixed_field (parser, &input, var_get_case_index (v),
+ name, record, fc);
+ }
+ while (lex_token (lexer) != '.');
+
+ reader = dfm_open_reader (fh, lexer);
+ if (reader == NULL)
+ goto error;
+
+ data_parser_make_active_file (parser, ds, reader, dict);
+ fh_unref (fh);
+ return CMD_SUCCESS;
+
+ error:
+ data_parser_destroy (parser);
+ dict_destroy (dict);
+ fh_unref (fh);
+ return CMD_CASCADING_FAILURE;
+}
+2007-12-04 Ben Pfaff <blp@gnu.org>
+
+ * automake.mk: Add new tests.
+
+ * command/get-data-txt.sh: New test.
+
+ * command/get-data-txt-examples.sh: New test.
+
+ * command/get-data-txt-importcases.sh: New test.
+
2007-11-25 Ben Pfaff <blp@gnu.org>
* bugs/compression.sh: Don't fail on big-endian system. Partial
tests/command/file-handle.sh \
tests/command/filter.sh \
tests/command/flip.sh \
+ tests/command/get-data-txt.sh \
+ tests/command/get-data-txt-examples.sh \
+ tests/command/get-data-txt-importcases.sh \
tests/command/import-export.sh \
tests/command/input-program.sh \
tests/command/insert.sh \
--- /dev/null
+#!/bin/sh
+
+# This program tests the examples for GET DATA /TYPE=TXT given in the
+# PSPP manual.
+
+TEMPDIR=/tmp/pspp-tst-$$
+
+# ensure that top_builddir is absolute
+if [ -z "$top_builddir" ] ; then top_builddir=. ; fi
+if [ -z "$top_srcdir" ] ; then top_srcdir=. ; fi
+top_builddir=`cd $top_builddir; pwd`
+PSPP=$top_builddir/src/ui/terminal/pspp
+
+# ensure that top_srcdir is absolute
+top_srcdir=`cd $top_srcdir; pwd`
+
+STAT_CONFIG_PATH=$top_srcdir/config
+export STAT_CONFIG_PATH
+
+LANG=C
+export LANG
+
+cleanup()
+{
+ cd /
+ rm -rf $TEMPDIR
+}
+
+
+fail()
+{
+ echo $activity
+ echo FAILED
+ cleanup;
+ exit 1;
+}
+
+
+no_result()
+{
+ echo $activity
+ echo NO RESULT;
+ cleanup;
+ exit 2;
+}
+
+pass()
+{
+ cleanup;
+ exit 0;
+}
+
+mkdir -p $TEMPDIR
+
+cd $TEMPDIR
+
+activity="create passwd.data"
+cat > passwd.data <<'EOF'
+root:$1$nyeSP5gD$pDq/:0:0:,,,:/root:/bin/bash
+blp:$1$BrP/pFg4$g7OG:1000:1000:Ben Pfaff,,,:/home/blp:/bin/bash
+john:$1$JBuq/Fioq$g4A:1001:1001:John Darrington,,,:/home/john:/bin/bash
+jhs:$1$D3li4hPL$88X1:1002:1002:Jason Stover,,,:/home/jhs:/bin/csh
+EOF
+if [ $? -ne 0 ] ; then no_result ; fi
+
+activity="create cars.data"
+cat > cars.data <<'EOF'
+model year mileage price type age
+Civic 2002 29883 15900 Si 2
+Civic 2003 13415 15900 EX 1
+Civic 1992 107000 3800 n/a 12
+Accord 2002 26613 17900 EX 1
+EOF
+if [ $? -ne 0 ] ; then no_result ; fi
+
+activity="create pets.data"
+cat > pets.data <<'EOF'
+"Pet Name", "Age", "Color", "Date Received", "Price", "Needs Walking", "Type"
+, (Years), , , (Dollars), ,
+"Rover", 4.5, Brown, "12 Feb 2004", 80, True, "Dog"
+"Charlie", , Gold, "5 Apr 2007", 12.3, False, "Fish"
+"Molly", 2, Black, "12 Dec 2006", 25, False, "Cat"
+"Gilly", , White, "10 Apr 2007", 10, False, "Guinea Pig"
+EOF
+if [ $? -ne 0 ] ; then no_result ; fi
+
+activity="create test.pspp"
+cat > test.pspp <<'EOF'
+GET DATA /TYPE=TXT /FILE='passwd.data' /DELIMITERS=':'
+ /VARIABLES=username A20
+ password A40
+ uid F10
+ gid F10
+ gecos A40
+ home A40
+ shell A40.
+LIST.
+
+GET DATA /TYPE=TXT /FILE='cars.data' /DELIMITERS=' ' /FIRSTCASE=2
+ /VARIABLES=model A8
+ year F4
+ mileage F6
+ price F5
+ type A4
+ age F2.
+LIST.
+
+GET DATA /TYPE=TXT /FILE='cars.data' /ARRANGEMENT=FIXED /FIRSTCASE=2
+ /VARIABLES=model 0-7 A
+ year 8-15 F
+ mileage 16-23 F
+ price 24-31 F
+ type 32-39 A
+ age 40-47 F.
+LIST.
+
+GET DATA /TYPE=TXT /FILE='pets.data' /DELIMITERS=', ' /QUALIFIER='"'
+ /FIRSTCASE=3
+ /VARIABLES=name A10
+ age F3.1
+ color A5
+ received EDATE10
+ price F5.2
+ needs_walking a5
+ type a10.
+LIST.
+EOF
+if [ $? -ne 0 ] ; then no_result ; fi
+
+activity="run test"
+$SUPERVISOR $PSPP --testing-mode test.pspp
+if [ $? -ne 0 ] ; then no_result ; fi
+
+activity="compare test results"
+perl -pi -e 's/^\s*$//g' $TEMPDIR/pspp.list
+diff -b $TEMPDIR/pspp.list - <<'EOF'
+ username password uid gid gecos home shell
+-------------------- ---------------------------------------- ---------- ---------- ---------------------------------------- ---------------------------------------- ----------------------------------------
+root $1$nyeSP5gD$pDq/ 0 0 ,,, /root /bin/bash
+blp $1$BrP/pFg4$g7OG 1000 1000 Ben Pfaff,,, /home/blp /bin/bash
+john $1$JBuq/Fioq$g4A 1001 1001 John Darrington,,, /home/john /bin/bash
+jhs $1$D3li4hPL$88X1 1002 1002 Jason Stover,,, /home/jhs /bin/csh
+ model year mileage price type age
+-------- ---- ------- ----- ---- ---
+Civic 2002 29883 15900 Si 2
+Civic 2003 13415 15900 EX 1
+Civic 1992 107000 3800 n/a 12
+Accord 2002 26613 17900 EX 1
+ model year mileage price type age
+-------- -------- -------- -------- -------- --------
+Civic 2002 29883 15900 Si 2
+Civic 2003 13415 15900 EX 1
+Civic 1992 107000 3800 n/a 12
+Accord 2002 26613 17900 EX 1
+ name age color received price needs_walking type
+---------- ---- ----- ---------- ------ ------------- ----------
+Rover 4.5 Brown 12.02.2004 80.00 True Dog
+Charlie . Gold 05.04.2007 12.30 False Fish
+Molly 2.0 Black 12.12.2006 25.00 False Cat
+Gilly . White 10.04.2007 10.00 False Guinea Pig
+EOF
+if [ $? -ne 0 ] ; then fail ; fi
+
+
+
+pass
--- /dev/null
+#!/bin/sh
+
+# This program tests the IMPORTCASES feature of GET DATA /TYPE=TXT.
+
+TEMPDIR=/tmp/pspp-tst-$$
+TESTFILE=$TEMPDIR/`basename $0`.sps
+: ${PERL:=perl}
+
+# ensure that top_builddir is absolute
+if [ -z "$top_builddir" ] ; then top_builddir=. ; fi
+if [ -z "$top_srcdir" ] ; then top_srcdir=. ; fi
+top_builddir=`cd $top_builddir; pwd`
+PSPP=$top_builddir/src/ui/terminal/pspp
+
+# ensure that top_srcdir is absolute
+top_srcdir=`cd $top_srcdir; pwd`
+
+STAT_CONFIG_PATH=$top_srcdir/config
+export STAT_CONFIG_PATH
+
+
+cleanup()
+{
+ cd /
+ rm -rf $TEMPDIR
+}
+
+
+fail()
+{
+ echo $activity
+ echo FAILED
+ cleanup;
+ exit 1;
+}
+
+
+no_result()
+{
+ echo $activity
+ echo NO RESULT;
+ cleanup;
+ exit 2;
+}
+
+pass()
+{
+ cleanup;
+ exit 0;
+}
+
+mkdir -p $TEMPDIR
+
+cd $TEMPDIR
+
+activity="create data file using Perl"
+$PERL > test.data <<'EOF'
+for ($i = 1; $i <= 100; $i++) {
+ printf "%02d\n", $i;
+}
+EOF
+if [ $? -ne 0 ] ; then no_result ; fi
+
+# Create command file.
+activity="create program"
+cat > $TESTFILE << EOF
+get data /type=txt /file='test.data' /importcases=first 10 /variables x f8.0.
+list.
+
+get data /type=txt /file='test.data' /importcases=percent 1 /variables x f8.0.
+list.
+
+get data /type=txt /file='test.data' /importcases=percent 35 /variables x f8.0.
+list.
+
+get data /type=txt /file='test.data' /importcases=percent 95 /variables x f8.0.
+list.
+
+get data /type=txt /file='test.data' /importcases=percent 100 /variables x f8.0.
+list.
+EOF
+if [ $? -ne 0 ] ; then no_result ; fi
+
+activity="run program"
+$SUPERVISOR $PSPP --testing-mode $TESTFILE
+if [ $? -ne 0 ] ; then fail ; fi
+
+activity="compare output"
+$PERL -pi -e 's/^\s*$//g' $TEMPDIR/pspp.list
+diff -b $TEMPDIR/pspp.list - << EOF
+ x
+--------
+ 1
+ 2
+ 3
+ 4
+ 5
+ 6
+ 7
+ 8
+ 9
+ 10
+ x
+--------
+ 1
+ 2
+ x
+--------
+ 1
+ 2
+ 3
+ 4
+ 5
+ 6
+ 7
+ 8
+ 9
+ 10
+ 11
+ 12
+ 13
+ 14
+ 15
+ 16
+ 17
+ 18
+ 19
+ 20
+ 21
+ 22
+ 23
+ 24
+ 25
+ 26
+ 27
+ 28
+ 29
+ 30
+ 31
+ 32
+ 33
+ 34
+ 35
+ 36
+ x
+--------
+ 1
+ 2
+ 3
+ 4
+ 5
+ 6
+ 7
+ 8
+ 9
+ 10
+ 11
+ 12
+ 13
+ 14
+ 15
+ 16
+ 17
+ 18
+ 19
+ 20
+ 21
+ 22
+ 23
+ 24
+ 25
+ 26
+ 27
+ 28
+ 29
+ 30
+ 31
+ 32
+ 33
+ 34
+ 35
+ 36
+ 37
+ 38
+ 39
+ 40
+ 41
+ 42
+ 43
+ 44
+ 45
+ 46
+ 47
+ 48
+ 49
+ 50
+ 51
+ 52
+ 53
+ 54
+ 55
+ 56
+ 57
+ 58
+ 59
+ 60
+ 61
+ 62
+ 63
+ 64
+ 65
+ 66
+ 67
+ 68
+ 69
+ 70
+ 71
+ 72
+ 73
+ 74
+ 75
+ 76
+ 77
+ 78
+ 79
+ 80
+ 81
+ 82
+ 83
+ 84
+ 85
+ 86
+ 87
+ 88
+ 89
+ 90
+ 91
+ 92
+ 93
+ 94
+ 95
+ 96
+ x
+--------
+ 1
+ 2
+ 3
+ 4
+ 5
+ 6
+ 7
+ 8
+ 9
+ 10
+ 11
+ 12
+ 13
+ 14
+ 15
+ 16
+ 17
+ 18
+ 19
+ 20
+ 21
+ 22
+ 23
+ 24
+ 25
+ 26
+ 27
+ 28
+ 29
+ 30
+ 31
+ 32
+ 33
+ 34
+ 35
+ 36
+ 37
+ 38
+ 39
+ 40
+ 41
+ 42
+ 43
+ 44
+ 45
+ 46
+ 47
+ 48
+ 49
+ 50
+ 51
+ 52
+ 53
+ 54
+ 55
+ 56
+ 57
+ 58
+ 59
+ 60
+ 61
+ 62
+ 63
+ 64
+ 65
+ 66
+ 67
+ 68
+ 69
+ 70
+ 71
+ 72
+ 73
+ 74
+ 75
+ 76
+ 77
+ 78
+ 79
+ 80
+ 81
+ 82
+ 83
+ 84
+ 85
+ 86
+ 87
+ 88
+ 89
+ 90
+ 91
+ 92
+ 93
+ 94
+ 95
+ 96
+ 97
+ 98
+ 99
+ 100
+EOF
+if [ $? -ne 0 ] ; then fail ; fi
+
+pass;
--- /dev/null
+#!/bin/sh
+
+# This program tests features of GET DATA /TYPE=TXT input program that
+# it has in common with DATA LIST, using tests drawn from
+# data-list.sh.
+
+TEMPDIR=/tmp/pspp-tst-$$
+TESTFILE=$TEMPDIR/`basename $0`.sps
+
+# ensure that top_builddir is absolute
+if [ -z "$top_builddir" ] ; then top_builddir=. ; fi
+if [ -z "$top_srcdir" ] ; then top_srcdir=. ; fi
+top_builddir=`cd $top_builddir; pwd`
+PSPP=$top_builddir/src/ui/terminal/pspp
+
+# ensure that top_srcdir is absolute
+top_srcdir=`cd $top_srcdir; pwd`
+
+STAT_CONFIG_PATH=$top_srcdir/config
+export STAT_CONFIG_PATH
+
+
+cleanup()
+{
+ cd /
+ rm -rf $TEMPDIR
+}
+
+
+fail()
+{
+ echo $activity
+ echo FAILED
+ cleanup;
+ exit 1;
+}
+
+
+no_result()
+{
+ echo $activity
+ echo NO RESULT;
+ cleanup;
+ exit 2;
+}
+
+pass()
+{
+ cleanup;
+ exit 0;
+}
+
+mkdir -p $TEMPDIR
+
+cd $TEMPDIR
+
+# Create command file.
+activity="create program"
+cat > $TESTFILE << EOF
+get data /type=txt /file=inline /delimiters="|X"
+ /variables=A f7.2 B f7.2 C f7.2 D f7.2.
+begin data.
+1|23X45|2.03
+2X22|34|23|
+3|34|34X34
+end data.
+
+list.
+
+get data /type=txt /file=inline /delimiters=', ' /delcase=variables 4
+ /firstcase=2 /variables=A f7.2 B f7.2 C f7.2 D f7.2.
+begin data.
+# This record is ignored.
+,1,2,3
+,4,,5
+6
+7,
+8 9
+0,1,,,
+,,,,
+2
+
+3
+4
+5
+end data.
+list.
+
+get data /type=txt /file=inline /delimiters='\t' /delcase=variables 4
+ /firstcase=3 /variables=A f7.2 B f7.2 C f7.2 D f7.2.
+begin data.
+# These records
+# are skipped.
+1 2 3 4
+1 2 3
+1 2 4
+1 2
+1 3 4
+1 3
+1 4
+1
+ 2 3 4
+ 2 3
+ 2 4
+ 2
+ 3 4
+ 3
+ 4
+
+end data.
+list.
+
+get data /type=txt /file=inline /arrangement=fixed /fixcase=3 /variables=
+ /1 start 0-19 adate
+ /2 end 0-19 adate
+ /3 count 0-2 f.
+begin data.
+07-22-2007
+10-06-2007
+321
+07-14-1789
+08-26-1789
+4
+01-01-1972
+12-31-1999
+682
+end data.
+list.
+
+get data /type=txt /file=inline /arrangement=fixed /fixcase=2 /variables=
+ /1 x 0 f
+ y 1 f.
+begin data.
+12
+
+34
+
+56
+
+78
+
+90
+
+end data.
+list.
+EOF
+if [ $? -ne 0 ] ; then no_result ; fi
+
+
+activity="run program"
+$SUPERVISOR $PSPP --testing-mode $TESTFILE
+if [ $? -ne 0 ] ; then fail ; fi
+
+activity="compare output"
+perl -pi -e 's/^\s*$//g' $TEMPDIR/pspp.list
+diff -b $TEMPDIR/pspp.list - << EOF
+ A B C D
+-------- -------- -------- --------
+ 1.00 23.00 45.00 2.03
+ 2.00 22.00 34.00 23.00
+ 3.00 34.00 34.00 34.00
+ A B C D
+-------- -------- -------- --------
+ . 1.00 2.00 3.00
+ . 4.00 . 5.00
+ 6.00 7.00 . 8.00
+ 9.00 .00 1.00 .
+ . . . .
+ . . . 2.00
+ . 3.00 4.00 5.00
+ A B C D
+-------- -------- -------- --------
+ 1.00 2.00 3.00 4.00
+ 1.00 2.00 3.00 .
+ 1.00 2.00 . 4.00
+ 1.00 2.00 . .
+ 1.00 . 3.00 4.00
+ 1.00 . 3.00 .
+ 1.00 . . 4.00
+ 1.00 . . .
+ . 2.00 3.00 4.00
+ . 2.00 3.00 .
+ . 2.00 . 4.00
+ . 2.00 . .
+ . . 3.00 4.00
+ . . 3.00 .
+ . . . 4.00
+ . . . .
+ start end count
+-------------------- -------------------- -----
+ 07/22/2007 10/06/2007 321
+ 07/14/1789 08/26/1789 4
+ 01/01/1972 12/31/1999 682
+x y
+- -
+1 2
+3 4
+5 6
+7 8
+9 0
+EOF
+if [ $? -ne 0 ] ; then fail ; fi
+
+pass;