X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=TODO;h=73a47a709d7cc3ea2eb0daa5ae2a94a095acb18a;hb=b568df7ae9;hp=85bd51a0a73484bc5c9940c88fb705bfbbec40c6;hpb=3389cbb515f92b56550c41b1ca0bda9d4a6fbf9e;p=pintos-anon diff --git a/TODO b/TODO index 85bd51a..73a47a7 100644 --- a/TODO +++ b/TODO @@ -1,177 +1,163 @@ -*- text -*- -From: "Godmar Back" -Subject: thread_yield in irq handler -To: "Ben Pfaff" -Date: Wed, 22 Feb 2006 22:18:50 -0500 - -Ben, - -you write in your Tour of Pintos: - -"Second, an interrupt handler must not call any function that can -sleep, which rules out thread_yield(), lock_acquire(), and many -others. This is because external interrupts use space on the stack of -the kernel thread that was running at the time the interrupt occurred. -If the interrupt handler tried to sleep and that thread resumed, then -the two uses of the single stack would interfere, which cannot be -allowed." - -Is the last sentence really true? - -I thought the reason that you couldn't sleep is that you would put -effectively a random thread/process to sleep, but I don't think it -would cause problems with the kernel stack. After all, it doesn't -cause this problem if you call thread_yield at the end of -intr_handler(), so why would it cause this problem earlier. - -As for thread_yield(), my understanding is that the reason it's called -at the end is to ensure it's done after the interrupt is acknowledged, -which you can't do until the end because Pintos doesn't handle nested -interrupts. - - - Godmar - -From: "Godmar Back" - -For reasons I don't currently understand, some of our students seem -hesitant to include each thread in a second "all-threads" list and are -looking for ways to implement the advanced scheduler without one. - -Currently, I believe, all tests for the mlfqs are such that all -threads are either ready or sleeping in timer_sleep(). This allows for -an incorrect implementation in which recent-cpu and priorities are -updated only for those threads that are on the alarm list or the ready -list. - -The todo item would be a test where a thread is blocked on a -semaphore, lock or condition variable and have its recent_cpu decay to -zero, and check that it's scheduled right after the unlock/up/signal. - -From: "Godmar Back" -Subject: set_priority & donation - a TODO item -To: "Ben Pfaff" -Date: Mon, 20 Feb 2006 22:20:26 -0500 - -Ben, - -it seems that there are currently no tests that check the proper -behavior of thread_set_priority() when called by a thread that is -running under priority donation. The proper behavior, I assume, is to -temporarily drop the donation if the set priority is higher, and to -reassume the donation should the thread subsequently set its own -priority again to a level that's lower than a still active donation. - - - Godmar - -From: Godmar Back -Subject: project 4 question/comment regarding caching inode data -To: Ben Pfaff -Date: Sat, 14 Jan 2006 15:59:33 -0500 - -Ben, - -in section 6.3.3 in the P4 FAQ, you write: - -"You can store a pointer to inode data in struct inode, if you want," - -Should you point out that if they indeed do that, they likely wouldn't -be able to support more than 64 open inodes systemwide at any given -point in time. - -(This seems like a rather strong limitation; do your current tests -open more than 64 files? -It would also point to an obvious way to make the projects harder by -specifically disallowing that inode data be locked in memory during -the entire time an inode is kept open.) - - - Godmar - -From: Godmar Back -Subject: on caching in project 4 -To: Ben Pfaff -Date: Mon, 9 Jan 2006 20:58:01 -0500 - -here's an idea for future semesters. - -I'm in the middle of project 4, I've started by implementing a buffer -cache and plugging it into the existing filesystem. Along the way I -was wondering how we could test the cache. - -Maybe one could adopt a similar testing strategy as in project 1 for -the MLQFS scheduler: add a function that reads "get_cache_accesses()" -and a function "get_cache_hits()". Then create a version of pintos -that creates access traces for a to-be-determined workload. Run an -off-line analysis that would determine how many hits a perfect cache -would have (MAX), and how much say an LRU strategy would give (MIN). -Then add a fudge factor to account for different index strategies and -test that the reported number of cache hits/accesses is within (MIN, -MAX) +/- fudge factor. - -(As an aside - I am curious why you chose to use a clock-style -algorithm rather than the more straightforward LRU for your buffer -cache implementation in your sample solution. Is there a reason for -that? I was curious to see if it made a difference, so I implemented -LRU for your cache implementation and ran the test workload of project -4 and printed cache hits/accesses. -I found that for that workload, the clock-based algorithm performs -almost identical to LRU (within about 1%, but I ran nondeterministally -with QEMU). I then reduced the cache size to 32 blocks and found again -the same performance, which raises the suspicion that the test -workload might not force any cache replacement, so the eviction -strategy doesn't matter.) - -Godmar Back writes: - -> in your sample solution to P4, dir_reopen does not take any locks when -> changing a directory's open_cnt. This looks like a race condition to -> me, considering that dir_reopen is called from execute_process without -> any filesystem locks held. - -* Get rid of rox--causes more trouble than it's worth +* In grading scripts, warn when a fault is caused by an attempt to + write to the kernel text segment. (Among other things we need to + explain that "text" means "code".) * Reconsider command line arg style--confuses everyone. -* Finish writing tour. +* Internal tests. -* Introduce a "yield" system call to speed up the syn-* tests. +* Godmar: Extend memory leak robustness tests. + multi-oom should still pass in project 3/4 because kernel will run out + of kernel pool memory before running out of swap space. + +* Godmar: Another area is concurrency. I noticed that I had passed all + tests with bochs 2.2.1 (in reproducibility mode). Then I ran them + with qemu and hit two deadlocks (one of them in rox-*, + incidentally). After fixing those deadlocks, I upgraded to bochs + 2.2.5 and hit yet another deadlock in reproducibility mode that + didn't show up in 2.2.1. All in all, a standard grading run would + have missed 3 deadlocks in my code. I'm not sure how to exploit + that for grading - either run with qemu n times (n=2 or 3), or run + it with bochs and a set of -j parameters. Some of which could be + known to the students, some not, depending on preference. (I ported + the -j patch to bochs 2.2.5 - + http://people.cs.vt.edu/~gback/pintos/bochs-2.2.5.jitter.patch but I + have to admit I never tried it so I don't know if it would have + uncovered the deadlocks that qemu and the switch to 2.2.5 + uncovered.) + +* Godmar: There is also the option to require students to develop test + workloads themselves, for instance, to demonstrate the effectiveness + of a particular algorithm (page eviction & buffer cache replacement + come to mind.) This could involve a problem of the form: develop a + workload that you cover well, and develop a "worst-case" load where + you algorithm performs poorly, and show the results of your + quantitative evaluation in your report - this could then be part of + their test score. + +* Godmar: the spec says that illegal syscall arguments can be handled either by + terminating the process, or by returning an error code such as -1. + + Looking at http://gback.cs.vt.edu:8080/source/xref/tests/userprog/write-bad-ptr.c + and http://gback.cs.vt.edu:8080/source/xref/tests/userprog/write-bad-ptr.ck + I'm wondering if write-bad-ptr isn't forcing them to terminate the + process(?). Even though write-bad-ptr.ck has a provision to allow + continuation after returning -1, wouldn't it still fail since the test + executes: + fail ("should have exited with -1"); + ? + +* Godmar: mmap-inherit needs a IGNORE_USER_FAULTS since we say to "not output + any messages Pintos doesn't already print." - which technically puts + the onus on us to ignore the default page fault msg whenever a test is + expected to fault. + +* Godmar: add _end to user.lds script and construct some tests that fail + unless students check a region for validity rather than just the first + address of a region. Right now, unfortunately, they pass all p2 tests + with just checking the first address. [A possible problem is that the + tests may be unable to tell termination due to unintentional fault + from willful termination when address check fails. Should we require + they return -1/EINVAL on a bad address and disallow termination? Or + construct a test that they'll likely fail if they unintentionally + terminate, maybe while holding the filesystem lock? Or require that + the diagnostic message only be output when fault occurs in user mode? + Something to think about.] + +* Threads project: + + - Godmar: + + >> Describe a potential race in thread_set_priority() and explain how + >> your implementation avoids it. Can you use a lock to avoid this race? + + I'm not sure what you're getting at here: + If changing the priority of a thread involves accessing the ready + list, then of course there's a race with interrupt handlers and locks + can't be used to resolve it. + + Changing the priority however also involves a race with respect to + accessing a thread's "priority" field - this race is with respect to + other threads that attempt to donate priority to the thread that's + changing its priority. Since this is a thread-to-thread race, I would + tend to believe that locks could be used, although I'm not certain. [ + I should point out, though, that lock_acquire currently disables + interrupts - the purpose of which I had doubted in an earlier email, + since sema_down() sufficiently establishes mutual exclusion. Taking + priority donation into account, disabling interrupts prevents the race + for the priority field, assuming the priority field of each thread is + always updated with interrupts disabled. ] + + What answer are you looking for for this design document question? + + - Godmar: Another thing: one group passed all tests even though they + wake up all waiters on a lock_release(), rather than just + one. Since there's never more than one waiter in our tests, they + didn't fail anything. Another possible TODO item - this could be + part a series of "regression tests" that check that they didn't + break basic functionality in project 1. I don't think this would + be insulting to the students. -via Godmar Back: +* Userprog project: -* Project 3 solution needs FS lock. + - Get rid of rox--causes more trouble than it's worth -* Get rid of mmap syscall, add sbrk. + - Extra credit: specifics on how to implement sbrk, malloc. + Godmar: I have a sample solution and tests for that! Stay tuned. -* Make backtrace program accept multiple object file arguments, - e.g. add -u option to allow backtracing user program also. + - Godmar: We're missing tests that pass arguments to system calls + that span multiple pages, where some are mapped and some are not. + An implementation that only checks the first page, rather than all + pages that can be touched during a call to read()/write() passes + all tests. -* page-linear, page-shuffle VM tests do not use enough memory to force - eviction. Should increase memory consumption. + - Godmar: There does not appear to be a test that checks that they + close all fd's on exit. Idea: add statistics & self-diagnostics + code to palloc.c and malloc.c. Self-diagnostics code could be + used for debugging. The statistics code would report how much + kernel memory is free. Add a system call + "get_kernel_memory_information". User programs could engage in a + variety of activities and notice leaks by checking the kernel + memory statistics. + - note: multi-oom tests that now. -* Add FS persistence test(s). + - Godmar: In the wait() tests, there's currently no test that tests + that a process can only wait for its own children. There's only + one test that tests that wait() on an invalid pid returns -1 (or + kills the process), but no test where a valid pid is used that is + not a child of the current process. -* lock_acquire(), lock_release() don't need additional intr_dis/enable - calls, because the semaphore protects lock->holder. + The current tests also do not ensure that both scenarios (parent waits + first vs. child exits first) are exercised. In this context, I'm + wondering if we should add a sleep() system call that would export + timer_sleep() to user processes; this would allow the construction of + such a test. It would also make it easier to construct a test for the + valid-pid, but not-a-child scenario. + As in Project 4, the baseline implementation of timer_sleep() should + suffice, so this would not necessarily require basing Project 2 on + Project 1. [ A related thought: IMO it would not be entirely + unreasonable to require timer_sleep() and priority scheduling sans + donation from Project 1 working for subsequent projects. ] +* VM project: -* process_death test needs improvement + - Godmar: Get rid of mmap syscall, add sbrk. -* Internal tests. + - Godmar: page-linear, page-shuffle VM tests do not use enough + memory to force eviction. Should increase memory consumption. -* Improve automatic interpretation of exception messages. + - Godmar: fix the page* tests to require swapping -* Userprog project: - - - Mark read-only pages as actually read-only in the page table. Or, - since this was consistently rated as the easiest project by the - students, require them to do it. + - Godmar: make sure the filesystem fails if not properly + concurrency-protected in project 3. - - Don't provide per-process pagedir implementation but only - single-process implementation and require students to implement - the separation? This project was rated as the easiest after all. - Alternately we could just remove the synchronization on pid - selection and check that students fix it. + - Godmar: Another area in which tests could be created are for + project 3: tests that combine mmap with a paging workload to see + their kernel pages properly while mmapping pages - I don't think + the current tests test that, do they? * Filesys project: @@ -180,6 +166,58 @@ via Godmar Back: cache--likely, Bochs doesn't simulate a disk with a realistic speed. + (Perhaps we should count disk reads and writes, not time.) + + - Need lots more tests. + + - Detect implementations that represent the cwd as a string, by + removing a directory that is the cwd of another process, then + creating a new directory of the same name and putting some files + in it, then checking whether the process that had it as cwd sees + them. + + - dir-rm-cwd should have a related test that uses a separate process + to try to pin the directory as its cwd. + + - Godmar: I'm not sure if I mentioned that already, but I passed all + tests for the filesys project without having implemented inode + deallocation. A test is needed that checks that blocks are + reclaimed when files are deleted. + + - Godmar: I'm in the middle of project 4, I've started by + implementing a buffer cache and plugging it into the existing + filesystem. Along the way I was wondering how we could test the + cache. + + Maybe one could adopt a similar testing strategy as in project 1 + for the MLQFS scheduler: add a function that reads + "get_cache_accesses()" and a function "get_cache_hits()". Then + create a version of pintos that creates access traces for a + to-be-determined workload. Run an off-line analysis that would + determine how many hits a perfect cache would have (MAX), and how + much say an LRU strategy would give (MIN). Then add a fudge + factor to account for different index strategies and test that the + reported number of cache hits/accesses is within (MIN, MAX) +/- + fudge factor. + + (As an aside - I am curious why you chose to use a clock-style + algorithm rather than the more straightforward LRU for your buffer + cache implementation in your sample solution. Is there a reason + for that? I was curious to see if it made a difference, so I + implemented LRU for your cache implementation and ran the test + workload of project 4 and printed cache hits/accesses. I found + that for that workload, the clock-based algorithm performs almost + identical to LRU (within about 1%, but I ran nondeterministally + with QEMU). I then reduced the cache size to 32 blocks and found + again the same performance, which raises the suspicion that the + test workload might not force any cache replacement, so the + eviction strategy doesn't matter.) + + - Godmar: I haven't analyzed the tests for project 4 yet, but I'm + wondering if the fairness requirements your specification has for + readers/writers are covered in the tests or not. + + * Documentation: - Add "Digging Deeper" sections that describe the nitty-gritty x86 @@ -188,16 +226,81 @@ via Godmar Back: - Add explanations of what "real" OSes do to give students some perspective. -* Assignments: +* To add partition support: + + - Find four partition types that are more or less unused and choose + to use them for Pintos. (This is implemented.) + + - Bootloader reads partition tables of all BIOS devices to find the + first that has the "Pintos kernel" partition type. (This is + implemented.) Ideally the bootloader would make sure there is + exactly one such partition, but I didn't implement that yet. + + - Bootloader reads kernel into memory at 1 MB using BIOS calls. + (This is implemented.) + + - Kernel arguments have to go into a separate sector because the + bootloader is otherwise too big to fit now? (I don't recall if I + did anything about this.) + + - Kernel at boot also scans partition tables of all the disks it can + find to find the ones with the four Pintos partition types + (perhaps not all exist). After that, it makes them available to + the rest of the kernel (and doesn't allow access to other devices, + for safety). + + - "pintos" and "pintos-mkdisk" need to write a partition table to + the disks that they create. "pintos-mkdisk" will need to take a + new parameter specifying the type. (I might have partially + implemented this, don't remember.) + + - "pintos" should insist on finding a partition header on disks + handed to it, for safety. + + - Need some way for "pintos" to assemble multiple disks or + partitions into a single image that can be copied directly to a + USB block device. (I don't know whether I came up with a good + solution yet or not, or whether I implemented any of it.) + +* To add USB support: + + - Needs to be able to scan PCI bus for UHCI controller. (I + implemented this partially.) + + - May want to be able to initialize USB controllers over CardBus + bridges. I don't know whether this requires additional work or + if it's useful enough to warrant extra work. (It's of special + interest for me because I have a laptop that only has USB via + CardBus.) + + - There are many protocol layers involved: SCSI over USB-Mass + Storage over USB over UHCI over PCI. (I may be forgetting one.) + I don't know yet whether it's best to separate the layers or to + merge (some of) them. I think that a simple and clean + organization should be a priority. + + - VMware can likely be used for testing because it can expose host + USB devices as guest USB devices. This is safer and more + convenient than using real hardware for testing. + + - Should test with a variety of USB keychain devices because there + seems to be wide variation among them, especially in the SCSI + protocols they support. Should try to use a "lowest-common + denominator" SCSI protocol if any such thing really exists. - - Add extra credit: + - Might want to add a feature whereby kernel arguments can be + given interactively, rather than passed on-disk. Needs some + though. - . Low-level x86 stuff, like paged page tables. +========================================================================== +============================== COMPLETED TASKS =========================== +========================================================================== - . Specifics on how to implement sbrk, malloc. +* Godmar: Introduce memory leak robustness tests - both for the + well-behaved as well as the mis-behaved case - that tests that the + kernel handles low-mem conditions well. - . Other good ideas. + - handled by new multi-oom. - . opendir/readdir/closedir +* Godmar: improved priority inheritance tests (see priority-donate-chain) - . everything needed for getcwd()