/* PSPP - computes sample statistics.
   Copyright (C) 2007 Free Software Foundation, Inc.

   This program is free software; you can redistribute it and/or
   modify it under the terms of the GNU General Public License as
   published by the Free Software Foundation; either version 2 of the
   License, or (at your option) any later version.

   This program is distributed in the hope that it will be useful, but
   WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
   General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
   02110-1301, USA. */

#ifndef LIBPSPP_TAINT_H
#define LIBPSPP_TAINT_H 1

/* Tainting and taint propagation.

   Properly handling I/O errors and other hard errors in data
   handling is important. At a minimum, we must notify the user
   that an error occurred and refrain from presenting possibly
   corrupted output. It is unacceptable, however, to simply
   terminate PSPP when an I/O error occurs, because that approach
   is unfriendly, especially in a GUI environment. We should also
   propagate the error to the top level of command execution; that
   is, ensure that the command procedure returns
   CMD_CASCADING_FAILURE to its caller.

   Usually in C we propagate errors via return values, or by
   maintaining an error state on an object (e.g. the error state
   that the ferror function tests on C streams). But neither
   approach is ideal for PSPP. Using return values requires the
   programmer to pay more attention to error handling than one
   would like, especially given how difficult it can be to test
   error paths. Maintaining error states on important PSPP
   objects (e.g. casereaders, casewriters) is a step up, but it
   still requires more attention than one would like, because
   quite often there are many such objects in use at any given
   time, and an I/O error encountered by any of them indicates
   that the final result of any computation that depends on that
   object is incorrect.

   The solution implemented here is an attempt to automate as
   much of PSPP's error detection as possible. It is based on
   "taint" objects, created with taint_create or taint_clone.
   Each taint object represents a state of correctness or
   corruption (taint) in an associated object whose correctness
   must be established. The taint_set_taint function marks a
   taint object as tainted. The taint status of a taint object
   can be queried with taint_is_tainted.

   The benefit of taint objects lies in the ability to connect
   them together in propagation relationships, using
   taint_propagate. The existence of a propagation relationship
   from taint object A to taint object B means that, should
   object A ever become tainted, then object B will automatically
   be marked tainted as well. This models the situation where
   the data represented by B are derived from data obtained from
   A. This is a common situation in PSPP; for example, the data
   in one casereader or casewriter are often derived from data in
   another casereader or casewriter.

   Taint propagation is transitive: if A propagates to B and B
   propagates to C, then tainting A taints both B and C. Taint
   propagation is not symmetric: propagation from A to B does
   not imply propagation from B to A. However, taint propagation
   is robust against loops, so that if A propagates to B and vice
   versa, whether directly or indirectly, then tainting either A
   or B will cause the other to be tainted, without producing an
   infinite loop.
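
   The forward-propagation rules above can be illustrated with a
   small standalone sketch. It is not PSPP's implementation: the
   names toy_taint, toy_propagate, toy_set_taint, and
   TOY_MAX_EDGES are hypothetical, and the fixed-size successor
   array is an assumption made only to keep the example short.
   The early return on an already-tainted node is what keeps a
   loop from recursing forever in this model.

```c
#include <stdbool.h>
#include <stddef.h>

enum { TOY_MAX_EDGES = 8 };     // Assumed capacity, for brevity only.

// Hypothetical stand-in for struct taint.
struct toy_taint
  {
    bool tainted;
    struct toy_taint *successors[TOY_MAX_EDGES];
    size_t n_successors;
  };

// Records a propagation relationship from FROM to TO.
// Returns false if FROM has no room for another successor.
bool
toy_propagate (struct toy_taint *from, struct toy_taint *to)
{
  if (from->n_successors >= TOY_MAX_EDGES)
    return false;
  from->successors[from->n_successors++] = to;
  return true;
}

// Marks T tainted, then taints everything reachable from T.
// A node that is already tainted is skipped, so loops terminate.
void
toy_set_taint (struct toy_taint *t)
{
  size_t i;

  if (t->tainted)
    return;
  t->tainted = true;
  for (i = 0; i < t->n_successors; i++)
    toy_set_taint (t->successors[i]);
}
```

   In this sketch, tainting any member of a loop A -> B -> C -> A
   taints all three, whereas tainting only the target of a single
   relationship leaves its source untainted.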

   The implementation is robust against destruction of taints in
   propagation relationships. When a taint object is destroyed,
   taint propagation through it is preserved: if A taints B and B
   taints C, then destroying B will preserve the transitive
   relationship, so that tainting A will still taint C.

   Taint objects actually propagate two different types of taints
   across the taint graph. The first type of taint is the one
   already described, which indicates that an associated object
   has corrupted state. The second type of taint, called a
   "successor-taint", does not necessarily indicate that the
   associated object is corrupted. Rather, it indicates that some
   successor of the associated object is corrupted, or was
   corrupted some time in the past before it was destroyed. (A
   "successor" of a taint object X is any taint object that can
   be reached by following propagation relationships starting
   from X.) Stated another way, when a taint object is marked
   tainted, all the taint objects that are reachable by following
   propagation relationships *backward* are marked with a
   successor-taint. In addition, any object that is marked
   tainted is also marked successor-tainted.
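
   The two kinds of taint can be sketched side by side in a small
   standalone model. This is an illustration under stated
   assumptions, not PSPP's implementation: toy_node, toy_link,
   toy_mark_successor_taint, toy_mark_tainted, and TOY_MAX_LINKS
   are hypothetical names, and the fixed-size link arrays exist
   only to keep the example short.

```c
#include <stdbool.h>
#include <stddef.h>

enum { TOY_MAX_LINKS = 8 };     // Assumed capacity, for brevity only.

// Hypothetical stand-in for struct taint, with both kinds of taint.
struct toy_node
  {
    bool tainted;
    bool tainted_successor;
    struct toy_node *successors[TOY_MAX_LINKS];
    struct toy_node *predecessors[TOY_MAX_LINKS];
    size_t n_successors, n_predecessors;
  };

// Records a propagation relationship from FROM to TO in both nodes.
// Returns false if either node has no room for another link.
bool
toy_link (struct toy_node *from, struct toy_node *to)
{
  if (from->n_successors >= TOY_MAX_LINKS
      || to->n_predecessors >= TOY_MAX_LINKS)
    return false;
  from->successors[from->n_successors++] = to;
  to->predecessors[to->n_predecessors++] = from;
  return true;
}

// Flags T, and everything that can reach T, as successor-tainted.
void
toy_mark_successor_taint (struct toy_node *t)
{
  size_t i;

  if (t->tainted_successor)
    return;
  t->tainted_successor = true;
  for (i = 0; i < t->n_predecessors; i++)
    toy_mark_successor_taint (t->predecessors[i]);
}

// Taints T: ordinary taint flows forward, successor-taint backward.
void
toy_mark_tainted (struct toy_node *t)
{
  size_t i;

  if (t->tainted)
    return;
  t->tainted = true;
  toy_mark_successor_taint (t);
  for (i = 0; i < t->n_successors; i++)
    toy_mark_tainted (t->successors[i]);
}
```

   With relationships READER -> CLONE -> SORTED, tainting CLONE
   in this model leaves READER untainted but successor-tainted,
   which is the condition a caller holding only READER would
   check.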

   The value of a successor-taint is in summarizing the history
   of the taint objects derived from a common parent. For
   example, consider a casereader that represents the active
   file. A statistical procedure can clone this casereader any
   number of times and pass it to analysis functions, which may
   in turn clone it themselves, pass it to sort or merge
   functions, etc. Conventionally, all of these functions would
   have to carefully check for I/O errors and propagate them
   upward, which is error-prone and inconvenient. However,
   given the successor-taint feature, the statistical procedure
   may simply check the successor-taint on the top-level
   casereader after calling the analysis functions and, if a
   successor-taint is present, skip displaying the procedure's
   output. Thus, error checking is centralized, simplified, and
   made convenient. This feature is now used in a number of the
   PSPP statistical procedures; search the source tree for
   "taint_has_tainted_successor" for details. */

#include <stdbool.h>

struct taint *taint_create (void);
struct taint *taint_clone (const struct taint *);
bool taint_destroy (struct taint *);

void taint_propagate (const struct taint *from, const struct taint *to);

bool taint_is_tainted (const struct taint *);
void taint_set_taint (const struct taint *);

bool taint_has_tainted_successor (const struct taint *);
void taint_reset_successor_taint (const struct taint *);

#endif /* libpspp/taint.h */