From: Ben Pfaff
Date: Sat, 31 Jan 2009 05:03:34 +0000 (-0800)
Subject: Accept LF, CR LF, and CR as new-line sequences in data files.
X-Git-Tag: v0.7.2~4
X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?p=pspp-builds.git;a=commitdiff_plain;h=70c074eb400ced7ee3ca4e01b701d6a30b75d174

Accept LF, CR LF, and CR as new-line sequences in data files.

Until now, PSPP has used the host operating system's idea of the
new-line sequence when reading data files and other text files.  This
means that, when a file with CR LF line ends is read on an OS that uses
LF as new-line (e.g. an MS-DOS file on Unix), each line appears to have
a CR at the end.  This commit fixes the problem by normalizing the
new-line sequence at the time of reading.

This commit eliminates a performance optimization from ds_read_line(),
because the getdelim() function that it used cannot be made to stop
reading at one of two different delimiters.  If this causes a real
performance regression, then the getndelim2 function from gnulib could
be used to restore the optimization.

Also adds a test to make sure that it works.

Thanks to Rémi Dewitte for pointing out the problem and providing an
initial patch (which solved the problem in a completely different way
from this commit).
---

diff --git a/src/libpspp/str.c b/src/libpspp/str.c
index d0826720..f054c9ef 100644
--- a/src/libpspp/str.c
+++ b/src/libpspp/str.c
@@ -1,5 +1,5 @@
 /* PSPP - a program for statistical analysis.
-   Copyright (C) 1997-9, 2000, 2006 Free Software Foundation, Inc.
+   Copyright (C) 1997-9, 2000, 2006, 2009 Free Software Foundation, Inc.
 
    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
@@ -1190,48 +1190,42 @@ ds_cstr (const struct string *st_)
   return st->ss.string;
 }
 
-/* Appends to ST a newline-terminated line read from STREAM, but
-   no more than MAX_LENGTH characters.
-   Newline is the last character of ST on return, if encountering
-   a newline was the reason for terminating.
-   Returns true if at least one character was read from STREAM
-   and appended to ST, false if no characters at all were read
-   before an I/O error or end of file was encountered (or
-   MAX_LENGTH was 0). */
+/* Reads characters from STREAM and appends them to ST, stopping
+   after MAX_LENGTH characters, after appending a newline, or
+   after an I/O error or end of file was encountered, whichever
+   comes first.  Returns true if at least one character was added
+   to ST, false if no characters were read before an I/O error or
+   end of file (or if MAX_LENGTH was 0).
+
+   This function accepts LF, CR LF, and CR sequences as new-line,
+   and translates each of them to a single '\n' new-line
+   character in ST. */
 bool
 ds_read_line (struct string *st, FILE *stream, size_t max_length)
 {
-  if (!st->ss.length && max_length == SIZE_MAX)
-    {
-      size_t capacity = st->capacity ? st->capacity + 1 : 0;
-      ssize_t n = getline (&st->ss.string, &capacity, stream);
-      if (capacity)
-        st->capacity = capacity - 1;
-      if (n > 0)
-        {
-          st->ss.length = n;
-          return true;
-        }
-      else
-        return false;
-    }
-  else
+  size_t length;
+
+  for (length = 0; length < max_length; length++)
     {
-      size_t length;
+      int c = getc (stream);
+      if (c == EOF)
+        break;
 
-      for (length = 0; length < max_length; length++)
+      if (c == '\r')
         {
-          int c = getc (stream);
-          if (c == EOF)
-            break;
-
-          ds_put_char (st, c);
-          if (c == '\n')
-            return true;
+          c = getc (stream);
+          if (c != '\n')
+            {
+              ungetc (c, stream);
+              c = '\n';
+            }
         }
-
-      return length > 0;
+      ds_put_char (st, c);
+      if (c == '\n')
+        return true;
     }
+
+  return length > 0;
 }
 
 /* Removes a comment introduced by `#' from ST,
diff --git a/tests/automake.mk b/tests/automake.mk
index 00f3a126..d6659e84 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -35,6 +35,7 @@ dist_TESTS = \
 	tests/command/input-program.sh \
 	tests/command/insert.sh \
 	tests/command/lag.sh \
+	tests/command/line-ends.sh \
 	tests/command/list.sh \
 	tests/command/loop.sh \
 	tests/command/longvars.sh \
diff --git a/tests/command/line-ends.sh b/tests/command/line-ends.sh
new file mode 100755
index 00000000..ca03b5ca
--- /dev/null
+++ b/tests/command/line-ends.sh
@@ -0,0 +1,104 @@
+#!/bin/sh
+
+# This program tests that DATA LIST can be used to read input files
+# with varying line ends (LF only, CR LF, CR only).
+
+TEMPDIR=/tmp/pspp-tst-$$
+TESTFILE=$TEMPDIR/`basename $0`.sps
+
+# ensure that top_builddir are absolute
+if [ -z "$top_builddir" ] ; then top_builddir=. ; fi
+if [ -z "$top_srcdir" ] ; then top_srcdir=. ; fi
+top_builddir=`cd $top_builddir; pwd`
+PSPP=$top_builddir/src/ui/terminal/pspp
+
+# ensure that top_srcdir is absolute
+top_srcdir=`cd $top_srcdir; pwd`
+
+STAT_CONFIG_PATH=$top_srcdir/config
+export STAT_CONFIG_PATH
+
+
+cleanup()
+{
+    if [ x"$PSPP_TEST_NO_CLEANUP" != x ] ; then
+        echo "NOT cleaning $TEMPDIR"
+        return ;
+    fi
+    cd /
+    rm -rf $TEMPDIR
+}
+
+
+fail()
+{
+    echo $activity
+    echo FAILED
+    cleanup;
+    exit 1;
+}
+
+
+no_result()
+{
+    echo $activity
+    echo NO RESULT;
+    cleanup;
+    exit 2;
+}
+
+pass()
+{
+    cleanup;
+    exit 0;
+}
+
+mkdir -p $TEMPDIR
+
+cd $TEMPDIR
+
+# Create command file.
+activity="create program"
+cat > $TESTFILE << EOF
+data list list notable file='input.txt'/a b c.
+list.
+EOF
+if [ $? -ne 0 ] ; then no_result ; fi
+
+
+activity="create input.txt"
+printf '1 2 3\n4 5 6\r\n7 8 9\r10 11 12\n13 14 15 \r\n16 17 18\r' > input.txt
+if [ $? -ne 0 ] ; then no_result ; fi
+
+
+# Make sure that input.txt actually received the data that we expect.
+# It might not have, if we're running on a system that translates \n
+# into some other sequence.
+activity="check input.txt"
+cksum input.txt > input.cksum
+diff input.cksum - <