From: Ben Pfaff
Date: Sat, 23 Apr 2011 03:21:57 +0000 (-0700)
Subject: ascii: Add support for multibyte characters.
X-Git-Tag: v0.7.8~53
X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=14b3603043;p=pspp-builds.git

ascii: Add support for multibyte characters.

This commit modifies render.at, changing hyphens to non-breaking hyphens. This change is only to ensure that the output for the tests in render.at is the same afterward. Without this change, these tests wrap their tables differently, because the driver now breaks lines after hyphens; before, only spaces were considered valid breakpoints.

Bug #31478.
---
diff --git a/NEWS b/NEWS index 2f8238ed..010ae06e 100644 --- a/NEWS +++ b/NEWS @@ -28,6 +28,10 @@ Changes from 0.7.3 to 0.7.7: * The INCLUDE and INSERT commands now support the ENCODING subcommand to specify the encoding for the included syntax file. + * The plain text output driver now properly supports multibyte UTF-8 + characters, including double-width characters and combining + accents. + * Strings may now include arbitrary Unicode code points specified in hexadecimal, using the syntax U'hhhh'. For example, Unicode code point U+1D11E, the musical G clef character, may be expressed as diff --git a/doc/invoking.texi b/doc/invoking.texi index 4c4f74aa..b37d5956 100644 --- a/doc/invoking.texi +++ b/doc/invoking.texi @@ -273,6 +273,8 @@ specify @option{-o @var{file}} on the PSPP command line, optionally followed by options from the table below to customize the output format. +Plain text output is encoded in UTF-8. + @table @code @item -O format=txt Specify the output format. This is only necessary if the file name @@ -319,26 +321,11 @@ the page length. Default: @code{0}. Length of the bottom margin, in lines. PSPP subtracts this value from the page length. Default: @code{0}. -@item -O box[@var{line-type}]=@var{box-chars} -Sets the characters used for lines in tables. 
@var{line-type} is a -4-digit number that indicates the type of line to change, in the order -`right', `bottom', `left', `top'. Each digit is 0 for ``no line'', 1 -for a single line, and 2 for a double line. @var{box-chars} is the -character or string of characters to use for this type of line. - -For example, @code{box[0101]="|"} sets @samp{|} as the character to -use for a single-width vertical line, and @code{box[1100]="\xda"} sets -@samp{"\xda"}, which on MS-DOS is suitable for the top-left corner of -a box, as the character for the intersection of two single-width -lines, one each from the right and bottom. - -The defaults use @samp{-}, @samp{|}, and @samp{+} for single-width -lines and @samp{=} and @samp{#} for double-width lines. - -@item -O init=@var{init-string} -If set, this string is written at the beginning of each output file. -It can be used to initialize device features, e.g.@: to enable VT100 -line-drawing characters. +@item -O box=@{ascii|unicode@} +Sets the characters used for lines in tables. The default, +@code{ascii}, uses @samp{-}, @samp{|}, and @samp{+} for single-width +lines and @samp{=} and @samp{#} for double-width lines. Specify +@code{unicode} to use Unicode box drawing characters. @item -O emphasis=@{none|bold|underline@} How to emphasize text. Bold and underline emphasis are achieved with diff --git a/src/libpspp/str.c b/src/libpspp/str.c index 08a85ad7..c630cf9a 100644 --- a/src/libpspp/str.c +++ b/src/libpspp/str.c @@ -1455,6 +1455,30 @@ ds_put_uninit (struct string *st, size_t incr) return end; } +/* Moves the bytes in ST following offset OFS + OLD_LEN in ST to offset OFS + + NEW_LEN and returns the byte at offset OFS. The first min(OLD_LEN, NEW_LEN) + bytes at the returned position are unchanged; if NEW_LEN > OLD_LEN then the + following NEW_LEN - OLD_LEN bytes are initially indeterminate. 
+ + The intention is that the caller should write NEW_LEN bytes at the returned + position, to effectively replace the OLD_LEN bytes previously at that + position. */ +char * +ds_splice_uninit (struct string *st, + size_t ofs, size_t old_len, size_t new_len) +{ + if (new_len != old_len) + { + if (new_len > old_len) + ds_extend (st, ds_length (st) + (new_len - old_len)); + memmove (ds_data (st) + (ofs + new_len), + ds_data (st) + (ofs + old_len), + ds_length (st) - (ofs + old_len)); + st->ss.length += new_len - old_len; + } + return ds_data (st) + ofs; +} + /* Formats FORMAT as a printf string and appends the result to ST. */ void ds_put_format (struct string *st, const char *format, ...) diff --git a/src/libpspp/str.h b/src/libpspp/str.h index cf4888ed..7c3ef3d5 100644 --- a/src/libpspp/str.h +++ b/src/libpspp/str.h @@ -231,6 +231,9 @@ void ds_put_format (struct string *, const char *, ...) PRINTF_FORMAT (2, 3); char *ds_put_uninit (struct string *st, size_t incr); +char *ds_splice_uninit (struct string *, size_t ofs, size_t old_len, + size_t new_len); + /* Other */ /* calls relocate from gnulib on ST */ void ds_relocate (struct string *st); diff --git a/src/output/ascii.c b/src/output/ascii.c index c7c17116..93e79e2e 100644 --- a/src/output/ascii.c +++ b/src/output/ascii.c @@ -1,5 +1,5 @@ /* PSPP - a program for statistical analysis. - Copyright (C) 1997-9, 2000, 2007, 2009, 2010 Free Software Foundation, Inc. + Copyright (C) 1997-9, 2000, 2007, 2009, 2010, 2011 Free Software Foundation, Inc. 
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -23,6 +23,9 @@ #include #include #include +#include +#include +#include #include "data/file-name.h" #include "data/settings.h" @@ -33,6 +36,7 @@ #include "libpspp/start-date.h" #include "libpspp/string-map.h" #include "libpspp/version.h" +#include "output/ascii.h" #include "output/cairo.h" #include "output/chart-item-provider.h" #include "output/driver-provider.h" @@ -54,34 +58,81 @@ #define H TABLE_HORZ #define V TABLE_VERT -/* Line styles bit shifts. */ -enum +#define N_BOX (RENDER_N_LINES * RENDER_N_LINES \ + * RENDER_N_LINES * RENDER_N_LINES) + +static const ucs4_t ascii_box_chars[N_BOX] = { - LNS_TOP = 0, - LNS_LEFT = 2, - LNS_BOTTOM = 4, - LNS_RIGHT = 6, + ' ', '|', '#', + '-', '+', '#', + '=', '#', '#', + '|', '|', '#', + '+', '+', '#', + '#', '#', '#', + '#', '#', '#', + '#', '#', '#', + '#', '#', '#', + '-', '+', '#', + '-', '+', '#', + '#', '#', '#', + '+', '+', '#', + '+', '+', '#', + '#', '#', '#', + '#', '#', '#', + '#', '#', '#', + '#', '#', '#', + '=', '#', '#', + '#', '#', '#', + '=', '#', '#', + '#', '#', '#', + '#', '#', '#', + '#', '#', '#', + '#', '#', '#', + '#', '#', '#', + '#', '#', '#', + }; - LNS_COUNT = 256 +static const ucs4_t unicode_box_chars[N_BOX] = + { + 0x0020, 0x2575, 0x2551, + 0x2574, 0x256f, 0x255c, + 0x2550, 0x255b, 0x255d, + 0x2577, 0x2502, 0x2551, + 0x256e, 0x2524, 0x2562, + 0x2555, 0x2561, 0x2563, + 0x2551, 0x2551, 0x2551, + 0x2556, 0x2562, 0x2562, + 0x2557, 0x2563, 0x2563, + 0x2576, 0x2570, 0x2559, + 0x2500, 0x2534, 0x2568, + 0x2550, 0x2567, 0x2569, + 0x256d, 0x251c, 0x255f, + 0x252c, 0x253c, 0x256a, + 0x2564, 0x256a, 0x256c, + 0x2553, 0x255f, 0x255f, + 0x2565, 0x256b, 0x256b, + 0x2566, 0x256c, 0x256c, + 0x2550, 0x2558, 0x255a, + 0x2550, 0x2567, 0x2569, + 0x2550, 0x2567, 0x2569, + 0x2552, 0x255e, 0x2560, + 0x2564, 0x256a, 0x256c, + 0x2564, 0x256a, 0x256c, + 0x2554, 0x2560, 0x2560, + 
0x2566, 0x256c, 0x256c, }; static inline int make_box_index (int left, int right, int top, int bottom) { - return ((left << LNS_LEFT) | (right << LNS_RIGHT) - | (top << LNS_TOP) | (bottom << LNS_BOTTOM)); + return ((right * 3 + bottom) * 3 + left) * 3 + top; } -/* Character attributes. */ -#define ATTR_EMPHASIS 0x100 /* Bold-face. */ -#define ATTR_BOX 0x200 /* Line drawing character. */ - /* A line of text. */ struct ascii_line { - unsigned short *chars; /* Characters and attributes. */ - int n_chars; /* Length. */ - int allocated_chars; /* Allocated "chars" elements. */ + struct string s; /* Content, in UTF-8. */ + size_t width; /* Display width, in character positions. */ }; /* How to emphasize text. */ @@ -113,8 +164,7 @@ struct ascii_driver int top_margin; /* Top margin in lines. */ int bottom_margin; /* Bottom margin in lines. */ - char *box[LNS_COUNT]; /* Line & box drawing characters. */ - char *init; /* Device initialization string. */ + const ucs4_t *box; /* Line & box drawing characters. */ /* Internal state. 
*/ char *command_name; @@ -136,7 +186,6 @@ static void ascii_submit (struct output_driver *, const struct output_item *); static int vertical_margins (const struct ascii_driver *); -static const char *get_default_box (int right, int bottom, int left, int top); static bool update_page_size (struct ascii_driver *, bool issue_error); static int parse_page_size (struct driver_option *); @@ -171,10 +220,10 @@ static struct output_driver * ascii_create (const char *file_name, enum settings_output_devices device_type, struct string_map *o) { + enum { BOX_ASCII, BOX_UNICODE } box; struct output_driver *d; struct ascii_driver *a; int paper_length; - int right, bottom, left, top; a = xzalloc (sizeof *a); d = &a->driver; @@ -200,20 +249,11 @@ ascii_create (const char *file_name, enum settings_output_devices device_type, a->auto_length = paper_length < 0; a->length = paper_length - vertical_margins (a); - for (right = 0; right < 4; right++) - for (bottom = 0; bottom < 4; bottom++) - for (left = 0; left < 4; left++) - for (top = 0; top < 4; top++) - { - int indx = make_box_index (left, right, top, bottom); - const char *default_value; - char name[16]; - - sprintf (name, "box[%d%d%d%d]", right, bottom, left, top); - default_value = get_default_box (right, bottom, left, top); - a->box[indx] = parse_string (opt (d, o, name, default_value)); - } - a->init = parse_string (opt (d, o, "init", "")); + box = parse_enum (opt (d, o, "box", "ascii"), + "ascii", BOX_ASCII, + "unicode", BOX_UNICODE, + NULL_SENTINEL); + a->box = box == BOX_ASCII ? 
ascii_box_chars : unicode_box_chars; a->command_name = NULL; a->title = xstrdup (""); @@ -236,29 +276,6 @@ error: return NULL; } -static const char * -get_default_box (int right, int bottom, int left, int top) -{ - switch ((top << 12) | (left << 8) | (bottom << 4) | (right << 0)) - { - case 0x0000: - return " "; - - case 0x0100: case 0x0101: case 0x0001: - return "-"; - - case 0x1000: case 0x1010: case 0x0010: - return "|"; - - case 0x0300: case 0x0303: case 0x0003: - case 0x0200: case 0x0202: case 0x0002: - return "="; - - default: - return left > 1 || top > 1 || right > 1 || bottom > 1 ? "#" : "+"; - } -} - static int parse_page_size (struct driver_option *option) { @@ -342,11 +359,8 @@ ascii_destroy (struct output_driver *driver) free (a->subtitle); free (a->file_name); free (a->chart_file_name); - for (i = 0; i < LNS_COUNT; i++) - free (a->box[i]); - free (a->init); for (i = 0; i < a->allocated_lines; i++) - free (a->lines[i].chars); + ds_destroy (&a->lines[i].s); free (a->lines); free (a); } @@ -570,7 +584,8 @@ static const struct output_driver_class ascii_driver_class = ascii_flush, }; -static void ascii_expand_line (struct ascii_driver *, int y, int length); +static char *ascii_reserve (struct ascii_driver *, int y, int x0, int x1, + int n); static void ascii_layout_cell (struct ascii_driver *, const struct table_cell *, int bb[TABLE_N_AXES][2], @@ -582,24 +597,31 @@ ascii_draw_line (void *a_, int bb[TABLE_N_AXES][2], enum render_line_style styles[TABLE_N_AXES][2]) { struct ascii_driver *a = a_; - unsigned short int value; - int x1, y1; + char mbchar[6]; + int x0, x1, y1; + ucs4_t uc; + int mblen; int x, y; /* Clip to the page. */ if (bb[H][0] >= a->width || bb[V][0] + a->y >= a->length) return; + x0 = bb[H][0]; x1 = MIN (bb[H][1], a->width); y1 = MIN (bb[V][1] + a->y, a->length); /* Draw. 
*/ - value = ATTR_BOX | make_box_index (styles[V][0], styles[V][1], - styles[H][0], styles[H][1]); + uc = a->box[make_box_index (styles[V][0], styles[V][1], + styles[H][0], styles[H][1])]; + mblen = u8_uctomb (CHAR_CAST (uint8_t *, mbchar), uc, 6); for (y = bb[V][0] + a->y; y < y1; y++) { - ascii_expand_line (a, y, x1); - for (x = bb[H][0]; x < x1; x++) - a->lines[y].chars[x] = value; + char *p = ascii_reserve (a, y, x0, x1, mblen * (x1 - x0)); + for (x = x0; x < x1; x++) + { + memcpy (p, mbchar, mblen); + p += mblen; + } } } @@ -655,31 +677,140 @@ ascii_draw_cell (void *a_, const struct table_cell *cell, ascii_layout_cell (a, cell, bb, clip, &w, &h); } -/* Ensures that at least the first LENGTH characters of line Y in - ascii driver A have been cleared out. */ +static int +u8_mb_to_display (int *wp, const uint8_t *s, size_t n) +{ + size_t ofs; + ucs4_t uc; + int w; + + ofs = u8_mbtouc (&uc, s, n); + if (ofs < n && s[ofs] == '\b') + { + ofs++; + ofs += u8_mbtouc (&uc, s + ofs, n - ofs); + } + + w = uc_width (uc, "UTF-8"); + if (w <= 0) + { + *wp = 0; + return ofs; + } + + while (ofs < n) + { + int mblen = u8_mbtouc (&uc, s + ofs, n - ofs); + if (uc_width (uc, "UTF-8") > 0) + break; + ofs += mblen; + } + + *wp = w; + return ofs; +} + +struct ascii_pos + { + int x0; + int x1; + size_t ofs0; + size_t ofs1; + }; + static void -ascii_expand_line (struct ascii_driver *a, int y, int length) +find_ascii_pos (struct ascii_line *line, int target_x, struct ascii_pos *c) +{ + const uint8_t *s = CHAR_CAST (const uint8_t *, ds_cstr (&line->s)); + size_t length = ds_length (&line->s); + size_t ofs; + int mblen; + int x; + + x = 0; + for (ofs = 0; ; ofs += mblen) + { + int w; + + mblen = u8_mb_to_display (&w, s + ofs, length - ofs); + if (x + w > target_x) + { + c->x0 = x; + c->x1 = x + w; + c->ofs0 = ofs; + c->ofs1 = ofs + mblen; + return; + } + x += w; + } +} + +static char * +ascii_reserve (struct ascii_driver *a, int y, int x0, int x1, int n) { struct ascii_line *line = 
&a->lines[y]; - if (line->n_chars < length) + + if (x0 >= line->width) + { + /* The common case: adding new characters at the end of a line. */ + ds_put_byte_multiple (&line->s, ' ', x0 - line->width); + line->width = x1; + return ds_put_uninit (&line->s, n); + } + else if (x0 == x1) + return NULL; + else { - int x; - if (line->allocated_chars < length) + /* An unusual case: overwriting characters in the middle of a line. We + don't keep any kind of mapping from bytes to display positions, so we + have to iterate over the whole line starting from the beginning. */ + struct ascii_pos p0, p1; + char *s; + + /* Find the positions of the first and last character. We must find + both characters' positions before changing the line, because that + would prevent finding the other character's position. */ + find_ascii_pos (line, x0, &p0); + if (x1 < line->width) + find_ascii_pos (line, x1, &p1); + + /* If a double-width character occupies both x0 - 1 and x0, then replace + its first character width by '?'. */ + s = ds_data (&line->s); + while (p0.x0 < x0) + { + s[p0.ofs0++] = '?'; + p0.x0++; + } + + if (x1 >= line->width) { - line->allocated_chars = MAX (length, MIN (length * 2, a->width)); - line->chars = xnrealloc (line->chars, line->allocated_chars, - sizeof *line->chars); + ds_truncate (&line->s, p0.ofs0); + line->width = x1; + return ds_put_uninit (&line->s, n); } - for (x = line->n_chars; x < length; x++) - line->chars[x] = ' '; - line->n_chars = length; + + /* If a double-width character occupies both x1 - 1 and x1, then we need + to replace its second character width by '?'. 
*/ + if (p1.x0 < x1) + { + do + { + s[--p1.ofs1] = '?'; + p1.x0++; + } + while (p1.x0 < x1); + return ds_splice_uninit (&line->s, p0.ofs0, p1.ofs1 - p0.ofs0, n); + } + + return ds_splice_uninit (&line->s, p0.ofs0, p1.ofs0 - p0.ofs0, n); } } static void -text_draw (struct ascii_driver *a, const struct table_cell *cell, +text_draw (struct ascii_driver *a, unsigned int options, int bb[TABLE_N_AXES][2], int clip[TABLE_N_AXES][2], - int y, const char *string, int n) + int y, const uint8_t *string, int n, size_t width) { int x0 = MAX (0, clip[H][0]); int y0 = MAX (0, clip[V][0] + a->y); @@ -691,97 +822,243 @@ text_draw (struct ascii_driver *a, const struct table_cell *cell, if (y < y0 || y >= y1) return; - switch (cell->options & TAB_ALIGNMENT) + switch (options & TAB_ALIGNMENT) { case TAB_LEFT: x = bb[H][0]; break; case TAB_CENTER: - x = (bb[H][0] + bb[H][1] - n + 1) / 2; + x = (bb[H][0] + bb[H][1] - width + 1) / 2; break; case TAB_RIGHT: - x = bb[H][1] - n; + x = bb[H][1] - width; break; default: NOT_REACHED (); } + if (x >= x1) + return; + + while (x < x0) + { + ucs4_t uc; + int mblen; + int w; + + if (n == 0) + return; + mblen = u8_mbtouc (&uc, string, n); + + string += mblen; + n -= mblen; + + w = uc_width (uc, "UTF-8"); + if (w > 0) + { + x += w; + width -= w; + } + } + if (n == 0) + return; - if (x0 > x) + if (x + width > x1) { - n -= x0 - x; - if (n <= 0) + int ofs; + + ofs = width = 0; + for (ofs = 0; ofs < n; ) + { + ucs4_t uc; + int mblen; + int w; + + mblen = u8_mbtouc (&uc, string + ofs, n - ofs); + + w = uc_width (uc, "UTF-8"); + if (w > 0) + { + if (width + w > x1 - x) + break; + width += w; + } + ofs += mblen; + } + n = ofs; + if (n == 0) return; - string += x0 - x; - x = x0; } - if (x + n >= x1) - n = x1 - x; - if (n > 0) + if (!(options & TAB_EMPH) || a->emphasis == EMPH_NONE) + memcpy (ascii_reserve (a, y, x, x + width, n), string, n); + else { - int attr = cell->options & TAB_EMPH ? 
ATTR_EMPHASIS : 0; - size_t i; + size_t n_out; + size_t ofs; + char *out; + int mblen; + + /* First figure out how many bytes need to be inserted. */ + n_out = n; + for (ofs = 0; ofs < n; ofs += mblen) + { + ucs4_t uc; + int w; + + mblen = u8_mbtouc (&uc, string + ofs, n - ofs); + w = uc_width (uc, "UTF-8"); + + if (w > 0) + n_out += a->emphasis == EMPH_UNDERLINE ? 2 : 1 + mblen; + } - ascii_expand_line (a, y, x + n); - for (i = 0; i < n; i++) - a->lines[y].chars[x + i] = string[i] | attr; + /* Then insert them. */ + out = ascii_reserve (a, y, x, x + width, n_out); + for (ofs = 0; ofs < n; ofs += mblen) + { + ucs4_t uc; + int w; + + mblen = u8_mbtouc (&uc, string + ofs, n - ofs); + w = uc_width (uc, "UTF-8"); + + if (w > 0) + { + if (a->emphasis == EMPH_UNDERLINE) + *out++ = '_'; + else + out = mempcpy (out, string + ofs, mblen); + *out++ = '\b'; + } + out = mempcpy (out, string + ofs, mblen); + } } } static void ascii_layout_cell (struct ascii_driver *a, const struct table_cell *cell, int bb[TABLE_N_AXES][2], int clip[TABLE_N_AXES][2], - int *width, int *height) + int *widthp, int *heightp) { - size_t length = strlen (cell->contents); - int y, pos; + const char *text = cell->contents; + size_t length = strlen (text); + char *breaks; + int bb_width; + size_t pos; + int y; + + *widthp = 0; + *heightp = 0; + if (length == 0) + return; + + text = cell->contents; + breaks = xmalloc (length + 1); + u8_possible_linebreaks (CHAR_CAST (const uint8_t *, text), length, + "UTF-8", breaks); + breaks[length] = (breaks[length - 1] == UC_BREAK_MANDATORY + ? UC_BREAK_PROHIBITED : UC_BREAK_POSSIBLE); - *width = 0; pos = 0; + bb_width = bb[H][1] - bb[H][0]; for (y = bb[V][0]; y < bb[V][1] && pos < length; y++) { - const char *line = &cell->contents[pos]; - const char *new_line; - size_t line_len; - - /* Find line length without considering word wrap. 
*/ - line_len = MIN (bb[H][1] - bb[H][0], length - pos); - new_line = memchr (line, '\n', line_len); - if (new_line != NULL) - line_len = new_line - line; - - /* Word wrap. */ - if (pos + line_len < length) + const uint8_t *line = CHAR_CAST (const uint8_t *, text + pos); + const char *b = breaks + pos; + size_t n = length - pos; + + size_t last_break_ofs = 0; + int last_break_width = 0; + int width = 0; + size_t ofs; + + for (ofs = 0; ofs < n; ) { - size_t space_len = line_len; - while (space_len > 0 && !isspace ((unsigned char) line[space_len])) - space_len--; - if (space_len > 0) - line_len = space_len; - else + ucs4_t uc; + int mblen; + int w; + + mblen = u8_mbtouc (&uc, line + ofs, n - ofs); + if (b[ofs] == UC_BREAK_MANDATORY) + break; + else if (b[ofs] == UC_BREAK_POSSIBLE) + { + last_break_ofs = ofs; + last_break_width = width; + } + + w = uc_width (uc, "UTF-8"); + if (w > 0) + { + if (width + w > bb_width) + { + if (isspace (line[ofs])) + break; + else if (last_break_ofs != 0) + { + ofs = last_break_ofs; + width = last_break_width; + break; + } + } + width += w; + } + ofs += mblen; + } + if (b[ofs] != UC_BREAK_MANDATORY) + { + while (ofs > 0 && isspace (line[ofs - 1])) { - while (pos + line_len < length - && !isspace ((unsigned char) line[line_len])) - line_len++; + ofs--; + width--; } } - if (line_len > *width) - *width = line_len; + if (width > *widthp) + *widthp = width; /* Draw text. */ - text_draw (a, cell, bb, clip, y, line, line_len); + text_draw (a, cell->options, bb, clip, y, line, ofs, width); /* Next line. 
*/ - pos += line_len; - if (pos < length && isspace ((unsigned char) cell->contents[pos])) + pos += ofs; + if (ofs < n && isspace (line[ofs])) pos++; + } - *height = y - bb[V][0]; + *heightp = y - bb[V][0]; + + free (breaks); +} + +void +ascii_test_write (struct output_driver *driver, + const char *s, int x, int y, unsigned int options) +{ + struct ascii_driver *a = ascii_driver_cast (driver); + struct table_cell cell; + int bb[TABLE_N_AXES][2]; + int width, height; + + if (a->file == NULL && !ascii_open_page (a)) + return; + a->y = 0; + + memset (&cell, 0, sizeof cell); + cell.contents = s; + cell.options = options | TAB_LEFT; + + bb[TABLE_HORZ][0] = x; + bb[TABLE_HORZ][1] = a->width; + bb[TABLE_VERT][0] = y; + bb[TABLE_VERT][1] = a->length; + + ascii_layout_cell (a, &cell, bb, bb, &width, &height); + + a->y = 1; } /* ascii_close_page () and support routines. */ - #if HAVE_DECL_SIGWINCH static struct ascii_driver *the_driver; @@ -818,8 +1095,6 @@ ascii_open_page (struct ascii_driver *a) sigaction (SIGWINCH, &action, NULL); } #endif - if (a->init != NULL) - fputs (a->init, a->file); } else { @@ -838,56 +1113,20 @@ ascii_open_page (struct ascii_driver *a) for (i = a->allocated_lines; i < a->length; i++) { struct ascii_line *line = &a->lines[i]; - line->chars = NULL; - line->allocated_chars = 0; + ds_init_empty (&line->s); + line->width = 0; } a->allocated_lines = a->length; } for (i = 0; i < a->length; i++) - a->lines[i].n_chars = 0; - - return true; -} - -/* Writes LINE to A's output file. 
*/ -static void -output_line (struct ascii_driver *a, const struct ascii_line *line) -{ - size_t length; - size_t i; - - length = line->n_chars; - while (length > 0 && line->chars[length - 1] == ' ') - length--; - - for (i = 0; i < length; i++) { - int attribute = line->chars[i] & (ATTR_BOX | ATTR_EMPHASIS); - int ch = line->chars[i] & ~(ATTR_BOX | ATTR_EMPHASIS); - - switch (attribute) - { - case ATTR_BOX: - fputs (a->box[ch], a->file); - break; - - case ATTR_EMPHASIS: - if (a->emphasis == EMPH_BOLD) - fprintf (a->file, "%c\b%c", ch, ch); - else if (a->emphasis == EMPH_UNDERLINE) - fprintf (a->file, "_\b%c", ch); - else - putc (ch, a->file); - break; - - default: - putc (ch, a->file); - break; - } + struct ascii_line *line = &a->lines[i]; + ds_clear (&line->s); + line->width = 0; } - putc ('\n', a->file); + return true; } static void @@ -946,7 +1185,7 @@ ascii_close_page (struct ascii_driver *a) { struct ascii_line *line = &a->lines[y]; - if (a->squeeze_blank_lines && y > 0 && line->n_chars == 0) + if (a->squeeze_blank_lines && y > 0 && line->width == 0) any_blank = true; else { @@ -956,7 +1195,10 @@ ascii_close_page (struct ascii_driver *a) any_blank = false; } - output_line (a, line); + while (ds_chomp_byte (&line->s, ' ')) + continue; + fwrite (ds_data (&line->s), 1, ds_length (&line->s), a->file); + putc ('\n', a->file); } } if (!a->squeeze_blank_lines) diff --git a/src/output/ascii.h b/src/output/ascii.h new file mode 100644 index 00000000..5d324d84 --- /dev/null +++ b/src/output/ascii.h @@ -0,0 +1,25 @@ +/* PSPP - a program for statistical analysis. + Copyright (C) 2011 Free Software Foundation, Inc. + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 3 of the License, or + (at your option) any later version. 
+ + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see . */ + +#ifndef ASCII_H +#define ASCII_H 1 + +struct output_driver; + +void ascii_test_write (struct output_driver *, + const char *s, int x, int y, unsigned int options); + +#endif /* ascii.h */ diff --git a/tests/automake.mk b/tests/automake.mk index b261a8cd..c9c50b07 100644 --- a/tests/automake.mk +++ b/tests/automake.mk @@ -376,6 +376,7 @@ TESTSUITE_AT = \ tests/libpspp/u8-istream.at \ tests/math/moments.at \ tests/math/randist.at \ + tests/output/ascii.at \ tests/output/charts.at \ tests/output/output.at \ tests/output/paper-size.at \ diff --git a/tests/output/ascii.at b/tests/output/ascii.at new file mode 100644 index 00000000..3c9243e7 --- /dev/null +++ b/tests/output/ascii.at @@ -0,0 +1,541 @@ +AT_BANNER([ASCII driver -- rendering corner cases]) + +AT_SETUP([ASCII driver overwriting single-width text]) +AT_KEYWORDS([render rendering]) +AT_DATA([input], [dnl +## overwriting rest of line +# plain +0 0 0 abc +1 0 0 BCD +# emphasized over plain +0 1 0 efg +1 1 1 FGH +# plain over emphasized +0 2 1 ijk +1 2 0 JKL +# emphasized over emphasized +0 3 1 mno +1 3 1 NOP + +## overwriting partial line +# plain +0 5 0 abcdef +0 5 0 A +2 5 0 CDE +# emphasized over plain +0 6 0 ghijkl +0 6 1 G +2 6 1 IJK +# plain over emphasized +0 7 1 mnopqr +0 7 0 M +2 7 0 OPQ +# emphasized over emphasized +0 8 1 stuvwx +0 8 1 S +2 8 1 UVW + +## overwriting rest of line with double-width characters +# plain +0 10 0 kakiku +2 10 0 きくけ +# emphasized over plain +0 11 0 kakiku +2 11 1 きくけ +# plain over emphasized +0 12 1 kakiku +2 12 0 きくけ +# emphasized over emphasized +0 13 1 kakiku +2 13 1 きくけ + +## overwriting partial line with double-width 
characters +# plain +0 15 0 kakikukeko +0 15 0 か +4 15 0 くけ +# emphasized over plain +0 16 0 kakikukeko +0 16 1 か +4 16 1 くけ +# plain over emphasized +0 17 1 kakikukeko +0 17 0 か +4 17 0 くけ +# emphasized over emphasized +0 18 1 kakikukeko +0 18 1 か +4 18 1 くけ +]) +AT_CHECK([render-test --draw-mode --emph=none input], [0], [dnl +aBCD +eFGH +iJKL +mNOP + +AbCDEf +GhIJKl +MnOPQr +StUVWx + +kaきくけ +kaきくけ +kaきくけ +kaきくけ + +かkiくけko +かkiくけko +かkiくけko +かkiくけko +]) +AT_CHECK([render-test --draw-mode --emph=bold input], [0], [dnl +aBCD +eFFGGHH +iiJKL +mmNNOOPP + +AbCDEf +GGhIIJJKKl +MnnOPQrr +SSttUUVVWWxx + +kaきくけ +kaききくくけけ +kkaaきくけ +kkaaききくくけけ + +かkiくけko +かかkiくくけけko +かkkiiくけkkoo +かかkkiiくくけけkkoo +]) +AT_CHECK([render-test --draw-mode --emph=underline input], [0], [dnl +aBCD +e_F_G_H +_iJKL +_m_N_O_P + +AbCDEf +_Gh_I_J_Kl +M_nOPQ_r +_S_t_U_V_W_x + +kaきくけ +ka_き_く_け +_k_aきくけ +_k_a_き_く_け + +かkiくけko +_かki_く_けko +か_k_iくけ_k_o +_か_k_i_く_け_k_o +]) +AT_CLEANUP + +AT_SETUP([ASCII driver overwriting double-width text]) +AT_KEYWORDS([render rendering]) +AT_DATA([input], [dnl +## overwrite rest of line, aligned double-width over double-width +# plain +0 0 0 あいう +2 0 0 きくけ +# emphasized over plain +0 1 0 あいう +2 1 1 きくけ +# plain over emphasized +0 2 1 あいう +2 2 0 きくけ +# emphasized over emphasized +0 3 1 あいう +2 3 1 きくけ + +## overwrite rest of line, misaligned double-width over double-width +# plain +0 5 0 あいう +3 5 0 きくけ +# emphasized over plain +0 6 0 あいう +3 6 1 きくけ +# plain over emphasized +0 7 1 あいう +3 7 0 きくけ +# emphasized over emphasized +0 8 1 あいう +3 8 1 きくけ + +## overwrite partial line, aligned double-width over double-width +# plain +0 10 0 あいうえお +0 10 0 か +4 10 0 くけ +# emphasized over plain +0 11 0 あいうえお +0 11 1 か +4 11 1 くけ +# plain over emphasized +0 12 1 あいうえお +0 12 0 か +4 12 0 くけ +# emphasized over emphasized +0 13 1 あいうえお +0 13 1 か +4 13 1 くけ + +## overwrite partial line, misaligned double-width over double-width +# plain +0 15 0 あいうえおさ +1 15 0 か +5 15 0 くけ +# emphasized over 
plain +0 16 0 あいうえおさ +1 16 1 か +5 16 1 くけ +# plain over emphasized +0 17 1 あいうえおさ +1 17 0 か +5 17 0 くけ +# emphasized over emphasized +0 18 1 あいうえおさ +1 18 1 か +5 18 1 くけ + +## overwrite rest of line, aligned single-width over double-width +# plain +0 20 0 あいう +2 20 0 kikuko +# emphasized over plain +0 21 0 あいう +2 21 1 kikuko +# plain over emphasized +0 22 1 あいう +2 22 0 kikuko +# emphasized over emphasized +0 23 1 あいう +2 23 1 kikuko + +## overwrite rest of line, misaligned single-width over double-width +# plain +0 25 0 あいう +3 25 0 kikuko +# emphasized over plain +0 26 0 あいう +3 26 1 kikuko +# plain over emphasized +0 27 1 あいう +3 27 0 kikuko +# emphasized over emphasized +0 28 1 あいう +3 28 1 kikuko + +## overwrite partial line, aligned single-width over double-width +# plain +0 30 0 あいうえお +0 30 0 ka +4 30 0 kuke +# emphasized over plain +0 31 0 あいうえお +0 31 1 ka +4 31 1 kuke +# plain over emphasized +0 32 1 あいうえお +0 32 0 ka +4 32 0 kuke +# emphasized over emphasized +0 33 1 あいうえお +0 33 1 ka +4 33 1 kuke + +## overwrite partial line, misaligned single-width over double-width +# plain +0 35 0 あいうえおさ +1 35 0 a +5 35 0 kuke +# emphasized over plain +0 36 0 あいうえおさ +1 36 1 a +5 36 1 kuke +# plain over emphasized +0 37 1 あいうえおさ +1 37 0 a +5 37 0 kuke +# emphasized over emphasized +0 38 1 あいうえおさ +1 38 1 a +5 38 1 kuke +]) +AT_CHECK([render-test --draw-mode --emph=none input], [0], [dnl +あきくけ +あきくけ +あきくけ +あきくけ + +あ?きくけ +あ?きくけ +あ?きくけ +あ?きくけ + +かいくけお +かいくけお +かいくけお +かいくけお + +?か??くけ?さ +?か??くけ?さ +?か??くけ?さ +?か??くけ?さ + +あkikuko +あkikuko +あkikuko +あkikuko + +あ?kikuko +あ?kikuko +あ?kikuko +あ?kikuko + +kaいkukeお +kaいkukeお +kaいkukeお +kaいkukeお + +?aい?kuke?さ +?aい?kuke?さ +?aい?kuke?さ +?aい?kuke?さ +]) +AT_CHECK([render-test --draw-mode --emph=bold input], [0], [dnl +あきくけ +あききくくけけ +ああきくけ +ああききくくけけ + +あ?きくけ +あ?ききくくけけ +ああ?きくけ +ああ?ききくくけけ + +かいくけお +かかいくくけけお +かいいくけおお +かかいいくくけけおお + +?か??くけ?さ +?かか??くくけけ?さ +?か??くけ?ささ +?かか??くくけけ?ささ + +あkikuko +あkkiikkuukkoo +ああkikuko +ああkkiikkuukkoo + +あ?kikuko 
+あ?kkiikkuukkoo +ああ?kikuko +ああ?kkiikkuukkoo + +kaいkukeお +kkaaいkkuukkeeお +kaいいkukeおお +kkaaいいkkuukkeeおお + +?aい?kuke?さ +?aaい?kkuukkee?さ +?aいい?kuke?ささ +?aaいい?kkuukkee?ささ +]) +AT_CHECK([render-test --draw-mode --emph=underline input], [0], [dnl +あきくけ +あ_き_く_け +_あきくけ +_あ_き_く_け + +あ?きくけ +あ?_き_く_け +_あ?きくけ +_あ?_き_く_け + +かいくけお +_かい_く_けお +か_いくけ_お +_か_い_く_け_お + +?か??くけ?さ +?_か??_く_け?さ +?か??くけ?_さ +?_か??_く_け?_さ + +あkikuko +あ_k_i_k_u_k_o +_あkikuko +_あ_k_i_k_u_k_o + +あ?kikuko +あ?_k_i_k_u_k_o +_あ?kikuko +_あ?_k_i_k_u_k_o + +kaいkukeお +_k_aい_k_u_k_eお +ka_いkuke_お +_k_a_い_k_u_k_e_お + +?aい?kuke?さ +?_aい?_k_u_k_e?さ +?a_い?kuke?_さ +?_a_い?_k_u_k_e?_さ +]) +AT_CLEANUP + +AT_SETUP([ASCII driver overwriting combining characters]) +AT_KEYWORDS([render rendering]) +AT_DATA([input], [dnl +## overwriting rest of line, ordinary over combining +# plain +0 0 0 àéî +1 0 0 xyz +# emphasized over plain +0 1 0 àéî +1 1 1 xyz +# plain over emphasized +0 2 1 àéî +1 2 0 xyz +# emphasized over emphasized +0 3 1 àéî +1 3 1 xyz + +## overwriting rest of line, combining over ordinary +# plain +0 5 0 xyz +1 5 0 àéî +# emphasized over plain +0 6 0 xyz +1 6 1 àéî +# plain over emphasized +0 7 1 xyz +1 7 0 àéî +# emphasized over emphasized +0 8 1 xyz +1 8 1 àéî + +## overwriting partial line, ordinary over combining +# plain +0 10 0 àéîo̧ũẙ +0 10 0 a +2 10 0 iou +# emphasized over plain +0 11 0 àéîo̧ũẙ +0 11 1 a +2 11 1 iou +# plain over emphasized +0 12 1 àéîo̧ũẙ +0 12 0 a +2 12 0 iou +# emphasized over emphasized +0 13 1 àéîo̧ũẙ +0 13 1 a +2 13 1 iou + +## overwriting partial line, combining over ordinary +# plain +0 15 0 aeiouy +0 15 0 à +2 15 0 îo̧ũ +# emphasized over plain +0 16 0 aeiouy +0 16 1 à +2 16 1 îo̧ũ +# plain over emphasized +0 17 1 aeiouy +0 17 0 à +2 17 0 îo̧ũ +# emphasized over emphasized +0 18 1 aeiouy +0 18 1 à +2 18 1 îo̧ũ +]) +AT_CHECK([render-test --draw-mode --emph=none input], [0], [dnl +àxyz +àxyz +àxyz +àxyz + +xàéî +xàéî +xàéî +xàéî + +aéiouẙ +aéiouẙ +aéiouẙ +aéiouẙ + +àeîo̧ũy +àeîo̧ũy 
+àeîo̧ũy +àeîo̧ũy +]) +AT_CHECK([render-test --draw-mode --emph=bold input], [0], [dnl +àxyz +àxxyyzz +aàxyz +aàxxyyzz + +xàéî +xaàeéiî +xxàéî +xxaàeéiî + +aéiouẙ +aaéiioouuẙ +aeéiouyẙ +aaeéiioouuyẙ + +àeîo̧ũy +aàeiîoo̧uũy +àeeîo̧ũyy +aàeeiîoo̧uũyy +]) +AT_CHECK([render-test --draw-mode --emph=underline input], [0], [dnl +àxyz +à_x_y_z +_àxyz +_à_x_y_z + +xàéî +x_à_é_î +_xàéî +_x_à_é_î + +aéiouẙ +_aé_i_o_uẙ +a_éiou_ẙ +_a_é_i_o_u_ẙ + +àeîo̧ũy +_àe_î_o̧_ũy +à_eîo̧ũ_y +_à_e_î_o̧_ũ_y +]) +AT_CLEANUP + +AT_SETUP([ASCII driver Unicode box characters]) +AT_KEYWORDS([render rendering]) +AT_DATA([input], [3 3 +1*2 @abc +2*1 @d\ne\nf +2*1 @g\nh\ni +@j +1*2 @klm +]) +AT_CHECK([render-test --box=unicode input], [0], [dnl +╭───┬─╮ +│abc│d│ +├─┬─┤e│ +│g│j│f│ +│h├─┴─┤ +│i│klm│ +╰─┴───╯ +]) +AT_CLEANUP diff --git a/tests/output/render-test.c b/tests/output/render-test.c index e325dbb1..91fd76c4 100644 --- a/tests/output/render-test.c +++ b/tests/output/render-test.c @@ -1,5 +1,5 @@ /* PSPP - a program for statistical analysis. - Copyright (C) 2009, 2010 Free Software Foundation, Inc. + Copyright (C) 2009, 2010, 2011 Free Software Foundation, Inc. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -25,6 +25,7 @@ #include "libpspp/assertion.h" #include "libpspp/compiler.h" #include "libpspp/string-map.h" +#include "output/ascii.h" #include "output/driver.h" #include "output/tab.h" #include "output/table-item.h" @@ -37,14 +38,26 @@ /* --transpose: Transpose the table before outputting? */ static int transpose; +/* --emphasis: ASCII driver emphasis option. */ +static char *emphasis; + +/* --box: ASCII driver box option. */ +static char *box; + +/* --draw-mode: special ASCII driver test mode. */ +static int draw_mode; + +/* ASCII driver, for ASCII driver test mode. 
*/ +static struct output_driver *ascii_driver; + static const char *parse_options (int argc, char **argv); static void usage (void) NO_RETURN; static struct table *read_table (FILE *); +static void draw (FILE *); int main (int argc, char **argv) { - struct table *table; const char *input_file_name; FILE *input; @@ -59,14 +72,24 @@ main (int argc, char **argv) if (input == NULL) error (1, errno, "%s: open failed", input_file_name); } - table = read_table (input); + + if (!draw_mode) + { + struct table *table; + + table = read_table (input); + + if (transpose) + table = table_transpose (table); + + table_item_submit (table_item_create (table, NULL)); + } + else + draw (input); + if (input != stdin) fclose (input); - if (transpose) - table = table_transpose (table); - - table_item_submit (table_item_create (table, NULL)); output_close (); return 0; @@ -85,15 +108,22 @@ configure_drivers (int width, int length) xasprintf ("%d", width)); string_map_insert_nocopy (&options, xstrdup ("length"), xasprintf ("%d", length)); + if (emphasis != NULL) + string_map_insert (&options, "emphasis", emphasis); + if (box != NULL) + string_map_insert (&options, "box", box); /* Render to stdout. */ string_map_clone (&tmp, &options); - driver = output_driver_create (&tmp); + ascii_driver = driver = output_driver_create (&tmp); if (driver == NULL) exit (EXIT_FAILURE); output_driver_register (driver); string_map_destroy (&tmp); + if (draw_mode) + return; + /* Render to render.txt. 
*/ string_map_replace (&options, "output-file", "render.txt"); driver = output_driver_create (&options); @@ -116,6 +146,12 @@ configure_drivers (int width, int length) output_driver_register (driver); #endif + string_map_insert (&options, "output-file", "render.odt"); + driver = output_driver_create (&options); + if (driver == NULL) + exit (EXIT_FAILURE); + output_driver_register (driver); + string_map_destroy (&options); } @@ -130,6 +166,8 @@ parse_options (int argc, char **argv) enum { OPT_WIDTH = UCHAR_MAX + 1, OPT_LENGTH, + OPT_EMPHASIS, + OPT_BOX, OPT_HELP }; static const struct option options[] = @@ -137,6 +175,9 @@ parse_options (int argc, char **argv) {"width", required_argument, NULL, OPT_WIDTH}, {"length", required_argument, NULL, OPT_LENGTH}, {"transpose", no_argument, &transpose, 1}, + {"emphasis", required_argument, NULL, OPT_EMPHASIS}, + {"box", required_argument, NULL, OPT_BOX}, + {"draw-mode", no_argument, &draw_mode, 1}, {"help", no_argument, NULL, OPT_HELP}, {NULL, 0, NULL, 0}, }; @@ -155,6 +196,14 @@ parse_options (int argc, char **argv) length = atoi (optarg); break; + case OPT_EMPHASIS: + emphasis = optarg; + break; + + case OPT_BOX: + box = optarg; + break; + case OPT_HELP: usage (); @@ -185,7 +234,8 @@ usage (void) printf ("%s, to test rendering of PSPP tables\n" "usage: %s [OPTIONS] INPUT\n" "\nOptions:\n" - " --driver=NAME:CLASS:DEVICE:OPTIONS set output driver\n", + " --width=WIDTH set page width in characters\n" + " --length=LINE set page length in lines\n", program_name, program_name); exit (EXIT_SUCCESS); } @@ -298,3 +348,26 @@ read_table (FILE *stream) return &tab->table; } + +static void +draw (FILE *stream) +{ + char buffer[1024]; + int line = 0; + + while (fgets (buffer, sizeof buffer, stream)) + { + char text[sizeof buffer]; + int emph; + int x, y; + + line++; + if (strchr ("#\r\n", buffer[0])) + continue; + + if (sscanf (buffer, "%d %d %d %[^\n]", &x, &y, &emph, text) != 4) + error (1, 0, "line %d has invalid format", line); + + 
ascii_test_write (ascii_driver, text, x, y, emph ? TAB_EMPH : 0); + } +} diff --git a/tests/output/render.at b/tests/output/render.at index 3b7c8699..471aefcf 100644 --- a/tests/output/render.at +++ b/tests/output/render.at @@ -378,7 +378,7 @@ AT_SETUP([2 big cells with new-lines]) AT_KEYWORDS([render rendering]) AT_DATA([input], [1 2 @PSPP does not place many restrictions on ordering of commands. The main restriction is that variables must be defined before they are otherwise referenced. This section describes the details of command ordering, but most users will have no need to refer to them. PSPP possesses five internal states, called initial, INPUT PROGRAM, FILE TYPE, transformation, and procedure states. -@PSPP includes special support\nfor unknown numeric data values.\nMissing observations are assigned\na special value, called the\n``system-missing value''. This\n``value'' actually indicates the\nabsence of a value; it\nmeans that the actual\nvalue is unknown. +@PSPP includes special support\nfor unknown numeric data values.\nMissing observations are assigned\na special value, called the\n``system‑missing value''. This\n``value'' actually indicates the\nabsence of a value; it\nmeans that the actual\nvalue is unknown. ]) AT_CHECK([render-test input], [0], [dnl +----------------------------------------------------------+------------------+ @@ -392,7 +392,7 @@ AT_CHECK([render-test input], [0], [dnl | | assigned| | | a special value,| | | called the| -| | ``system-missing| +| | ``system‑missing| | | value''. This| | |``value'' actually| | | indicates the| @@ -1712,7 +1712,7 @@ AT_DATA([input], [7 7 @f @g @h -6*6 @The MISSING subcommand determines the handling of missing variables. If INCLUDE is set, then user-missing values are included in the calculations. If NOINCLUDE is set, which is the default, user-missing values are excluded. +6*6 @The MISSING subcommand determines the handling of missing variables. 
If INCLUDE is set, then user‑missing values are included in the calculations. If NOINCLUDE is set, which is the default, user‑missing values are excluded. @i @j @k @@ -1759,7 +1759,7 @@ ndling of| ariables.| NCLUDE is| set, then| -r-missing| +r‑missing| alues are| ed in the| ions. If| @@ -1768,7 +1768,7 @@ E is set,| E is set,| ch is the| default,| -r-missing| +r‑missing| alues are| excluded.| |