Update TODO.

[pspp-builds.git] / TODO
diff --git a/TODO b/TODO
index c5a488fe8e54e193bcee97afe61380fe1479eb8d..302175eb347d08c7e24ed88fae47f4b2f683f3b1 100644
--- a/TODO
+++ b/TODO
@@ -1,20 +1,11 @@
-Time-stamp: <2004-03-24 19:52:48 blp>
+Time-stamp: <2004-11-30 22:59:24 blp>
  
  What Ben's working on now.
  --------------------------
  
-Expression parser/optimizer/evaluator revisions:
+Workspace exhaustion heuristics.
  
-       * Testing.
-
-       * Add random distributions.
-
-       * Get rid of Boolean/integer type mismatch errors.
-
-Procedures need to be able to make multiple passes.
-
-Write a better descriptive stats evaluator based on NR two-pass technique,
-revise all existing code to use it.
+Does SET work correctly?
  
  Update q2c input format description.
  
@@ -22,9 +13,29 @@ Rewrite output subsystem, break into multiple processes.
  
  CROSSTABS needs to be re-examined.
  
+RANK, which is needed for the Wilcoxon signed-rank statistic, Mann-Whitney U,
Kruskal-Wallis on NPAR TESTS and for Spearman and the Jonckheere trend test (in
+other procedures).
+
  TODO
  ----
  
+Make valgrind --leak-check=yes --show-reachable=yes work.
+
+Add Boolean type.
+
+Add NOT_REACHED() macro.
+
+Add compression to casefiles.
+
+Expressions need to be able to abbreviate function names.  XDATE.QUARTER
+abbreviates to XDA.QUA, etc.
+
+The expression tests need tests for XDATE and a few others, see
+tests/xforms/expressions.sh comments for details.
+
+Expressions need random distribution functions.
+
  There needs to be another layer onto the lexer, which should probably be
  entirely rewritten anyway.  The lexer needs to read entire *commands* at a
  time, not just a *line* at a time.  It also needs to support arbitrary putback,
@@ -41,16 +52,9 @@ strd.itl.nist.gov/div898/strd
  
  In debug mode hash table code should verify that collisions are reasonably low.
  
-Use posix_fadvise(POSIX_FADV_SEQUENTIAL) where available.
-
  Use AFM files instead of Groff font files, and include AFMs for our default
  fonts with the distribution.
  
-Add libplot output driver.  Suggested by Robert S. Maier
-<rsm@math.arizona.edu>: "it produces output in idraw-editable PS format, PCL5
-format, xfig-editable format, Illustrator format,..., and can draw vector
-graphics on X11 displays also".
-
  Storage of value labels on disk is inefficient.  Invent new data structure.
  
  Add an output flag which would cause a page break if a table segment could fit
@@ -86,16 +90,6 @@ directory (for data files).  (Yuck!)
  Fix line-too-long problems in PostScript code, instead of covering them up.
  setlinecap is *not* a proper solution.
  
-Need a better way than MAX_WORKSPACE to detect low-memory conditions.
-
-When malloc() returns 0, page to disk and free() unnecessary data.
-
-Remove ccase * argument from procfunc argument to procedure().
-
-See if process_active_file() has wider applicability.
-
-Eliminate private data in struct variable through use of pointers.
-
  Fix som_columns().
  
  Has glob.c been pared down enough?
@@ -103,9 +97,6 @@ Has glob.c been pared down enough?
  Improve interactivity of output by allowing a `commit' function for a page.
  This will also allow for infinite-length pages.
  
-All the tests need to be looked over.  Some of the SET calls don't make sense
-any more.
-
  Implement thin single lines, should be pretty easy now.
  
  SELECT IF should be moved before other transformations whenever possible.  It
@@ -120,21 +111,10 @@ it is used (in dfm.c:cmd_begin_data), not after-the-fact detection.
  Figure out a stylesheet for messages displayed by PSPP: i.e., what quotation
  marks around filenames, etc.
  
-Data input and data output are currently arranged in reciprocal pairs: input is
-done directly, with write_record() or whatever; output is done on a callback
-event-driven basis.  It would definitely be easier if both could be done on a
-direct basis, with read_record() and write_record() routines, with a coroutine
-implementation (see Knuth).  But I'm not sure that coroutines can be
-implemented in ANSI C.  This will require some thought.  Perhaps 0.4.0 can do
-this.
-
  New SET subcommand: OUTPUT.  i.e., SET OUTPUT="filename" to send output to that
  file; SET OUTPUT="filename"(APPEND) to append to that file; SET OUTPUT=DEFAULT
  to reset everything.  There might be a better approach, though--think about it.
  
-HDF export capabilities (http://hdf.ncsa.uiuc.edu).  Suggested by Marcus
-G. Daniels <mgd@santafe.edu>.
-
  From Zvi Grauer <z.grauer@csuohio.edu> and <zvi@mail.ohio.net>:
  
     1. design of experiments software, specifically Factorial, response surface
@@ -301,36 +281,9 @@ string, 1-char strings, and 255-char strings.
  
  g. Test the code.  Write some test syntax files.  Examine the output carefully.
  
-NOTES ON SEARCH ALGORITHMS
---------------------------
-
-1. Trees are nicer when you want a sorted table.  However, you can always
-sort a hash table after you're done adding values.
-
-2. Brent's variation of Algorithm D is best when the table is fixed: it's
-memory-efficient, having small, fixed overhead.  It's easier to use
-when you know in advance how many entries the table will contain.
-
-3. Algorithm L is rather slow for a hash algorithm, however it's easy.
-
-4. Chaining is best in terms of speed; ordered/self-ordering is even
-better.
-
-5. Rehashing is slow.
-
-6. Might want to decide on an algorithm empirically since there are no
-clear mathematical winners in some cases.
-
-7. gprof?  Hey, it works!
-
  MORE NOTES/IDEAS/BUGS
  ---------------------
  
-The behavior of converting a floating point to an integer when the value of the
-float is out of range of the integer type is UNDEFINED!  See ANSI 6.2.1.3.
-
-What should we do for *negative* times in expressions?
-
  Sometimes very wide (or very tall) columns can occur in tables.  What is a good
  way to truncate them?  It doesn't seem to cause problems for the ascii or
  postscript drivers, but it's not good in the general case.  Should they be
@@ -367,7 +320,7 @@ For each case we read from the input program:
  3. Write case to replacement active file.
  4. Execute temporary transformations.  If these drop the case, stop.
  5. Post-TEMPORARY N OF CASES.  If we have already analyzed N cases, stop.
-6. FILTER, PROCESS IF.  If these drop the case, go to 5.
+6. FILTER, PROCESS IF.  If these drop the case, stop.
  7. Pass case to procedure.
  
  Ugly cases:
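The per-case processing order in the last hunk can be sketched in C as below. This is a minimal illustration under stated assumptions, not PSPP's implementation. `struct ccase`, `struct pipeline`, `process_case()`, `keep_positive()` and the convention that an N OF CASES limit of -1 means "no limit" are all hypothetical names invented here. Steps 1 and 2 fall outside the diff context, so the sketch starts from a case that they have already produced. It follows the revised step 6, where a case dropped by FILTER or PROCESS IF stops instead of going back to step 5.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical case type; PSPP's real case structure is not shown here. */
struct ccase
  {
    double value;
  };

/* A stage returns false to drop the case. */
typedef bool case_filter_fn (struct ccase *);

/* Hypothetical per-case pipeline covering steps 3 to 7 of the TODO. */
struct pipeline
  {
    void (*write_case) (const struct ccase *); /* Step 3, or NULL. */
    case_filter_fn *temporary;                 /* Step 4, or NULL. */
    long n_of_cases;                           /* Step 5 limit; -1 = none. */
    long n_analyzed;                           /* Cases passed to step 7. */
    case_filter_fn *filter;                    /* Step 6, or NULL. */
    void (*procedure) (const struct ccase *);  /* Step 7, or NULL. */
  };

enum case_outcome
  {
    CASE_DROPPED,        /* Step 4 dropped the case. */
    CASE_LIMIT_REACHED,  /* Step 5: N cases already analyzed. */
    CASE_FILTERED,       /* Step 6 dropped the case. */
    CASE_PROCESSED       /* Step 7 received the case. */
  };

/* Runs steps 3 to 7 for one case produced by steps 1 and 2. */
enum case_outcome
process_case (struct pipeline *p, struct ccase *c)
{
  assert (p != NULL && c != NULL);

  /* 3. Write case to replacement active file. */
  if (p->write_case != NULL)
    p->write_case (c);

  /* 4. Execute temporary transformations; if they drop the case, stop. */
  if (p->temporary != NULL && !p->temporary (c))
    return CASE_DROPPED;

  /* 5. Post-TEMPORARY N OF CASES: stop once N cases have been analyzed. */
  if (p->n_of_cases >= 0 && p->n_analyzed >= p->n_of_cases)
    return CASE_LIMIT_REACHED;

  /* 6. FILTER, PROCESS IF: a dropped case stops here (the old text
     said "go to 5"). */
  if (p->filter != NULL && !p->filter (c))
    return CASE_FILTERED;

  /* 7. Pass case to procedure and count it as analyzed. */
  if (p->procedure != NULL)
    p->procedure (c);
  p->n_analyzed++;
  return CASE_PROCESSED;
}

/* Hypothetical example stage: keeps only cases with a positive value. */
bool
keep_positive (struct ccase *c)
{
  return c->value > 0.0;
}
```

Counting only cases that reach step 7 in `n_analyzed` means filtered cases do not use up the N OF CASES limit. That is one reading of "already analyzed N cases"; the TODO text does not settle it.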