X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=TODO;h=1048b6c0d32f9fdf2373d3bd0391e573e7185005;hb=97d6c6f6b1922621ca013668eba9a9a9f71d60fe;hp=fd4bd6891166bdce143e518cc6116a1d96c54c2a;hpb=5da3677581de0e41efa4dccb61a9bf82181e725d;p=pspp-builds.git

diff --git a/TODO b/TODO
index fd4bd689..1048b6c0 100644
--- a/TODO
+++ b/TODO
@@ -1,15 +1,48 @@
-Time-stamp: <2003-12-15 22:51:49 blp>
+Time-stamp: <2004-03-26 00:07:35 blp>
+
+What Ben's working on now.
+--------------------------
+
+Procedures need to be able to make multiple passes.
+
+Write a better descriptive stats evaluator based on NR two-pass technique,
+revise all existing code to use it.
+
+Update q2c input format description.
+
+Rewrite output subsystem, break into multiple processes.
+
+CROSSTABS needs to be re-examined.

 TODO
 ----

+The expression tests need tests for XDATE and a few others, see
+tests/xforms/expressions.sh comments for details.
+
+Expressions need random distribution functions.
+
+There needs to be another layer onto the lexer, which should probably be
+entirely rewritten anyway. The lexer needs to read entire *commands* at a
+time, not just a *line* at a time. It also needs to support arbitrary putback,
+probably by just backing up the "current position" in the command buffer.
+
+Scratch variables should not be available for use following TEMPORARY.
+
+Details of N OF CASES, SAMPLE, FILTER, PROCESS IF, TEMPORARY, etc., need to be
+checked against the documentation. See notes on these at end of file for a
+start.
+
+Check our results against the NIST StRD benchmark results at
+strd.itl.nist.gov/div898/strd
+
+In debug mode hash table code should verify that collisions are reasonably low.
+
+Use posix_fadvise(POSIX_FADV_SEQUENTIAL) where available.
+
 Use AFM files instead of Groff font files, and include AFMs for our default
 fonts with the distribution.

-The way that data-in.c and data-out.c deal with strings is wrong. Instead of
-the way it's done now, we should make it dynamically allocate a buffer and
-return a pointer to it. This is a much safer interface.
-
 Add libplot output driver. Suggested by Robert S. Maier
 : "it produces output in idraw-editable PS format, PCL5 format,
 xfig-editable format, Illustrator format,..., and can draw vector
@@ -58,19 +91,10 @@ Remove ccase * argument from procfunc argument to procedure().

 See if process_active_file() has wider applicability.

-Looks like there's a potential problem with value labels--we use free_val_lab
-from avl_destroy(), but free_val_lab doesn't decrement the reference count, it
-just frees the label. Check into this sometime soon.
-
 Eliminate private data in struct variable through use of pointers.

 Fix som_columns().

-There needs to be another layer onto the lexer, which should probably be
-entirely rewritten anyway. The lexer needs to read entire *commands* at a
-time, not just a *line* at a time. This would vastly simplify the
-(yet-to-be-implemented) logging mechanism and other stuff as well.
-
 Has glob.c been pared down enough?

 Improve interactivity of output by allowing a `commit' function for a page.
@@ -111,12 +135,12 @@ G. Daniels .
 From Zvi Grauer and :

 1. design of experiments software, specifically Factorial, response surface
- methodology and mixrture design.
+ methodology and mixrture design.

 These would be EXTREMELY USEFUL for chemists, engineeris, and anyone
 involved in the production of chemicals or formulations.

- 2. Multidimensional Scaling analysis (for market analysis) -
+ 2. Multidimensional Scaling analysis (for market analysis) -

 3. Preference mapping software for market analysis

@@ -330,6 +354,121 @@ whatever) for it. Then read the /FILE and use the index to match to
 each case. OTOH, if the /TABLE is too large, then do it the old way,
 complaining if either file is not sorted on key.

+----------------------------------------------------------------------
+Statistical procedures:
+
+For each case we read from the input program:
+
+1. Execute permanent transformations. If these drop the case, stop.
+2. N OF CASES. If we have already written N cases, stop.
+3. Write case to replacement active file.
+4. Execute temporary transformations. If these drop the case, stop.
+5. Post-TEMPORARY N OF CASES. If we have already analyzed N cases, stop.
+6. FILTER, PROCESS IF. If these drop the case, go to 5.
+7. Pass case to procedure.
+
+Ugly cases:
+
+LAG records cases in step 3.
+
+AGGREGATE: When output goes to an external file, this is just an ordinary
+procedure. When output goes to the active file, step 3 should be skipped,
+because AGGREGATE creates its own case sink and writes to it in step 7. Also,
+TEMPORARY has no effect and we just cancel it. Regardless of direction of
+output, we should not implement AGGREGATE through a transformation because that
+will fail to honor FILTER, PROCESS IF, N OF CASES.
+
+ADD FILES: Essentially an input program. It silently cancels unclosed LOOPs
+and DO IFs. If the active file is used for input, then runs EXECUTE (if there
+are any transformations) and then steals vfm_source and encapsulates it. If
+the active file is not used for input, then it cancels all the transformations
+and deletes the original active file.
+
+CASESTOVARS: ???
+
+FLIP:
+
+MATCH FILES: Similar to AGGREGATE. This is a procedure. When the active file
+is used for input, it reads the active file; otherwise, it just cancels all the
+transformations and deletes the original active file. Step 3 should be
+skipped, because MATCH FILES creates its own case sink and writes to it in step
+7. TEMPORARY is not allowed.
+
+MODIFY VARS:
+
+REPEATING DATA:
+
+SORT CASES:
+
+UPDATE: same as ADD FILES.
+
+VARSTOCASES: ???
+----------------------------------------------------------------------
+N OF CASES
+
+ * Before TEMPORARY, limits number of cases sent to the sink.
+
+ * After TEMPORARY, limits number of cases sent to the procedure.
+
+ * Without TEMPORARY, those are the same cases, so it limits both.
+
+SAMPLE
+
+ * Sample is just a transformation. It has no special properties.
+
+FILTER
+
+ * Always selects cases sent to the procedure.
+
+ * No effect on cases sent to sink.
+
+ * Before TEMPORARY, selection is permanent. After TEMPORARY,
+   selection stops after a procedure.
+
+PROCESS IF
+
+ * Always selects cases sent to the procedure.
+
+ * No effect on cases sent to sink.
+
+ * Always stops after a procedure.
+
+SPLIT FILE
+
+ * Ignored by AGGREGATE. Used when procedures write matrices.
+
+ * Always applies to the procedure.
+
+ * Before TEMPORARY, splitting is permanent. After TEMPORARY,
+   splitting stops after a procedure.
+
+TEMPORARY
+
+ * TEMPORARY has no effect on AGGREGATE when output goes to the active file.
+
+ * SORT CASES, ADD FILES, RENAME VARIABLES, CASESTOVARS, VARSTOCASES,
+   COMPUTE with a lag function cannot be used after TEMPORARY.
+
+ * Cannot be used in DO IF...END IF or LOOP...END LOOP.
+
+ * FLIP ignores TEMPORARY. All transformations become permanent.
+
+ * MATCH FILES and UPDATE cannot be used after TEMPORARY if active
+   file is an input source.
+
+ * RENAME VARIABLES is invalid after TEMPORARY.
+
+ * WEIGHT, SPLIT FILE, N OF CASES, FILTER, PROCESS IF apply only to
+   the next procedure when used after TEMPORARY.
+
+WEIGHT
+
+ * Always applies to the procedure.
+
+ * Before TEMPORARY, weighting is permanent. After TEMPORARY,
+   weighting stops after a procedure.
+
+
 -------------------------------------------------------------------------------
 Local Variables:
 mode: text
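
The seven-step per-case flow listed under "Statistical procedures" in the last hunk can be read as a small pipeline. The following is only an illustrative sketch in Python, not PSPP code: the class `CasePipeline`, its fields, and the helper `double_x` are hypothetical names invented here. It assumes that every "stop" means "stop handling this case", and it treats a case dropped in step 6 as simply skipped, so it never counts toward the post-TEMPORARY limit. The ugly cases (LAG, AGGREGATE, MATCH FILES, and so on) are not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Case = Dict[str, int]
# A transformation may modify the case in place; returning False drops it.
Transform = Callable[[Case], bool]


@dataclass
class CasePipeline:
    """Hypothetical model of the per-case steps 1-7 from the TODO notes."""
    permanent: List[Transform] = field(default_factory=list)
    temporary: List[Transform] = field(default_factory=list)
    case_filter: Optional[Transform] = None   # FILTER / PROCESS IF
    n_permanent: Optional[int] = None         # N OF CASES before TEMPORARY
    n_temporary: Optional[int] = None         # N OF CASES after TEMPORARY
    sink: List[Case] = field(default_factory=list)      # replacement active file
    analyzed: List[Case] = field(default_factory=list)  # cases seen by the procedure

    def process(self, case: Case) -> None:
        case = dict(case)  # work on a copy of the input case
        # 1. Permanent transformations; a False result drops the case.
        if not all(t(case) for t in self.permanent):
            return
        # 2. N OF CASES: stop once N cases have been written.
        if self.n_permanent is not None and len(self.sink) >= self.n_permanent:
            return
        # 3. Write the case to the replacement active file (a snapshot, so
        #    later temporary transformations do not leak into the sink).
        self.sink.append(dict(case))
        # 4. Temporary transformations.
        if not all(t(case) for t in self.temporary):
            return
        # 5. Post-TEMPORARY N OF CASES: stop once N cases have been analyzed.
        if self.n_temporary is not None and len(self.analyzed) >= self.n_temporary:
            return
        # 6. FILTER / PROCESS IF: a dropped case is skipped, not counted.
        if self.case_filter is not None and not self.case_filter(case):
            return
        # 7. Pass the case to the procedure.
        self.analyzed.append(case)


def double_x(case: Case) -> bool:
    """Temporary transformation: compute y = 2 * x and keep the case."""
    case["y"] = case["x"] * 2
    return True


pipeline = CasePipeline(
    permanent=[lambda c: c["x"] != 3],       # permanent selection drops x == 3
    temporary=[double_x],
    case_filter=lambda c: c["x"] % 2 == 0,   # only even x reaches the procedure
    n_permanent=6,
    n_temporary=3,
)
for x in range(1, 11):
    pipeline.process({"x": x})

print([c["x"] for c in pipeline.sink])       # cases written to the sink
print([c["x"] for c in pipeline.analyzed])   # cases passed to the procedure
```

In this sketch the sink receives cases before the temporary transformations run, which matches the note that N OF CASES before TEMPORARY limits the sink while FILTER and PROCESS IF have no effect on it.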