-"Second, an interrupt handler must not call any function that can
-sleep, which rules out thread_yield(), lock_acquire(), and many
-others. This is because external interrupts use space on the stack of
-the kernel thread that was running at the time the interrupt occurred.
-If the interrupt handler tried to sleep and that thread resumed, then
-the two uses of the single stack would interfere, which cannot be
-allowed."
-
-Is the last sentence really true?
-
-I thought the reason that you couldn't sleep is that you would
-effectively put a random thread/process to sleep, but I don't think it
-would cause problems with the kernel stack. After all, it doesn't
-cause this problem if you call thread_yield at the end of
-intr_handler(), so why would it cause this problem earlier?
-
-As for thread_yield(), my understanding is that the reason it's called
-at the end is to ensure it's done after the interrupt is acknowledged,
-which you can't do until the end because Pintos doesn't handle nested
-interrupts.
-
- - Godmar
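-
-A sketch of the deferred-yield mechanism under discussion, modeled on
-threads/interrupt.c (details may differ from the current base):
-
- ---
- static bool yield_on_return;    /* Set by an interrupt handler that
-                                    wants to cause a thread switch. */
-
- void
- intr_yield_on_return (void)
- {
-   ASSERT (intr_context ());
-   yield_on_return = true;       /* Defer the switch until the handler
-                                    completes and the interrupt has
-                                    been acknowledged. */
- }
-
- /* At the tail of intr_handler(), for an external interrupt: */
- pic_end_of_interrupt (frame->vec_no);  /* Acknowledge the PIC. */
- if (yield_on_return)
-   thread_yield ();              /* Only now is switching safe. */
- ---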
-
-From: "Godmar Back" <godmar@gmail.com>
-
-For reasons I don't currently understand, some of our students seem
-hesitant to include each thread in a second "all-threads" list and are
-looking for ways to implement the advanced scheduler without one.
-
-Currently, I believe, all tests for the mlfqs are such that all
-threads are either ready or sleeping in timer_sleep(). This allows for
-an incorrect implementation in which recent_cpu and priorities are
-updated only for those threads that are on the alarm list or the ready
-list.
-
-The TODO item would be a test where a thread is blocked on a
-semaphore, lock, or condition variable, lets its recent_cpu decay to
-zero, and checks that the thread is scheduled right after the
-unlock/up/signal.
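-
-A sketch of such a test, in the style of the existing mlfqs tests
-(names and timing are illustrative):
-
- ---
- static void
- block_thread (void *lock_)
- {
-   struct lock *lock = lock_;
-
-   lock_acquire (lock);     /* Blocks; recent_cpu decays toward 0. */
-   msg ("Block thread acquired the lock.");
- }
-
- void
- test_mlfqs_block (void)
- {
-   static struct lock lock;
-   int64_t start;
-
-   lock_init (&lock);
-   lock_acquire (&lock);
-   thread_create ("block", PRI_DEFAULT, block_thread, &lock);
-
-   start = timer_ticks ();  /* Busy-wait so our own recent_cpu climbs
-                               and our priority drops below the
-                               blocked thread's. */
-   while (timer_elapsed (start) < 25 * TIMER_FREQ)
-     continue;
-
-   lock_release (&lock);    /* The decayed thread should preempt us. */
-   msg ("Block thread should already have run.");
- }
- ---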
-
-From: "Godmar Back" <godmar@gmail.com>
-Subject: set_priority & donation - a TODO item
-To: "Ben Pfaff" <blp@cs.stanford.edu>
-Date: Mon, 20 Feb 2006 22:20:26 -0500
-
-Ben,
-
-It seems that there are currently no tests that check the proper
-behavior of thread_set_priority() when called by a thread that is
-running under priority donation. The proper behavior, I assume, is to
-temporarily drop the donation if the newly set priority is higher, and
-to resume the donation should the thread subsequently set its own
-priority to a level that's lower than a still-active donation.
-
- - Godmar
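-
-A sketch of that behavior (the base_priority field and the
-donated_priority() helper are illustrative, not from the Pintos base):
-
- ---
- void
- thread_set_priority (int new_priority)
- {
-   struct thread *t = thread_current ();
-   enum intr_level old_level = intr_disable ();
-
-   t->base_priority = new_priority;   /* The thread's own priority. */
-
-   /* Effective priority is the maximum of the thread's own priority
-      and the highest still-active donation (PRI_MIN if none). */
-   if (donated_priority (t) > new_priority)
-     t->priority = donated_priority (t);
-   else
-     t->priority = new_priority;
-
-   intr_set_level (old_level);
-   thread_yield ();                   /* We may no longer be highest. */
- }
- ---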
-
-From: Godmar Back <godmar@gmail.com>
-Subject: project 4 question/comment regarding caching inode data
-To: Ben Pfaff <blp@cs.stanford.edu>
-Date: Sat, 14 Jan 2006 15:59:33 -0500
-
-Ben,
-
-In section 6.3.3 of the P4 FAQ, you write:
-
-"You can store a pointer to inode data in struct inode, if you want,"
-
-Shouldn't you point out that if they indeed do that, they likely won't
-be able to support more than 64 open inodes system-wide at any given
-point in time, since each open inode would pin one block of the
-64-block buffer cache?
-
-(This seems like a rather strong limitation; do your current tests
-open more than 64 files?
-It would also point to an obvious way to make the projects harder:
-specifically disallow inode data from being pinned in memory during
-the entire time an inode is kept open.)
-
- - Godmar
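-
-A sketch of why the limit arises (the surrounding fields follow my
-reading of the base; the pointer member is the addition in question):
-
- ---
- struct inode
-   {
-     struct list_elem elem;      /* Element in open-inode list. */
-     block_sector_t sector;      /* Sector number of disk location. */
-     int open_cnt;               /* Number of openers. */
-     struct inode_disk *data;    /* Points into a buffer-cache block,
-                                    so that block cannot be evicted
-                                    while this inode remains open.
-                                    With a 64-block cache, at most 64
-                                    inodes can be open at once. */
-   };
- ---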
-
-From: Godmar Back <godmar@gmail.com>
-Subject: on caching in project 4
-To: Ben Pfaff <blp@cs.stanford.edu>
-Date: Mon, 9 Jan 2006 20:58:01 -0500
-
-Here's an idea for future semesters.
-
-I'm in the middle of project 4; I've started by implementing a buffer
-cache and plugging it into the existing file system. Along the way I
-was wondering how we could test the cache.
-
-Maybe one could adopt a testing strategy similar to the one used in
-project 1 for the MLFQS scheduler: add functions
-"get_cache_accesses()" and "get_cache_hits()". Then create a version
-of pintos that records access traces for a to-be-determined workload.
-Run an off-line analysis that would determine how many hits a perfect
-cache would have (MAX), and how many, say, an LRU strategy would give
-(MIN). Then add a fudge factor to account for different index
-strategies and test that the reported number of cache hits/accesses is
-within (MIN, MAX) +/- fudge factor.
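-
-A sketch of the proposed counters (the lookup and fill functions are
-hypothetical placeholders for whatever the cache provides):
-
- ---
- static long long cache_access_cnt, cache_hit_cnt;
-
- struct cache_block *
- cache_get_block (block_sector_t sector)
- {
-   struct cache_block *b = cache_lookup (sector);  /* hypothetical */
-
-   cache_access_cnt++;
-   if (b != NULL)
-     cache_hit_cnt++;
-   else
-     b = cache_fill (sector);                      /* hypothetical */
-   return b;
- }
-
- long long get_cache_accesses (void) { return cache_access_cnt; }
- long long get_cache_hits (void) { return cache_hit_cnt; }
- ---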
-
-(As an aside - I am curious why you chose to use a clock-style
-algorithm rather than the more straightforward LRU for your buffer
-cache implementation in your sample solution. Is there a reason for
-that? I was curious to see if it made a difference, so I implemented
-LRU for your cache implementation and ran the test workload of project
-4 and printed cache hits/accesses.
-I found that for that workload, the clock-based algorithm performs
-almost identically to LRU (within about 1%, though my runs under
-QEMU were nondeterministic). I then reduced the cache size to 32
-blocks and found again
-the same performance, which raises the suspicion that the test
-workload might not force any cache replacement, so the eviction
-strategy doesn't matter.)
-
-Godmar Back <godmar@gmail.com> writes:
-
-> in your sample solution to P4, dir_reopen does not take any locks when
-> changing a directory's open_cnt. This looks like a race condition to
-> me, considering that dir_reopen is called from process_execute without
-> any filesystem locks held.
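-
-One way to close that race would be to serialize open_cnt updates, as
-sketched here (the lock is hypothetical; the rest follows the base's
-inode_reopen):
-
- ---
- static struct lock open_inodes_lock;   /* Also taken wherever else
-                                           open_cnt is read/written. */
-
- struct inode *
- inode_reopen (struct inode *inode)
- {
-   if (inode != NULL)
-     {
-       lock_acquire (&open_inodes_lock);
-       inode->open_cnt++;
-       lock_release (&open_inodes_lock);
-     }
-   return inode;
- }
- ---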
+* Godmar: Introduce memory-leak robustness tests - for both the
+ well-behaved and the misbehaved case - that check that the kernel
+ handles low-memory conditions well.
+
+* Godmar: Another area is concurrency. I noticed that I had passed all
+ tests with bochs 2.2.1 (in reproducibility mode). Then I ran them
+ with qemu and hit two deadlocks (one of them in rox-*,
+ incidentally). After fixing those deadlocks, I upgraded to bochs
+ 2.2.5 and hit yet another deadlock in reproducibility mode that
+ didn't show up in 2.2.1. All in all, a standard grading run would
+ have missed 3 deadlocks in my code. I'm not sure how to exploit
+ that for grading - either run with qemu n times (n=2 or 3), or run
+ it with bochs and a set of -j parameters, some of which could be
+ known to the students and some not, depending on preference. (I ported
+ the -j patch to bochs 2.2.5 -
+ http://people.cs.vt.edu/~gback/pintos/bochs-2.2.5.jitter.patch, but I
+ have to admit I never tried it, so I don't know if it would have
+ uncovered the deadlocks that qemu and the switch to 2.2.5
+ uncovered.)
+
+* Godmar: There is also the option to require students to develop test
+ workloads themselves, for instance, to demonstrate the effectiveness
+ of a particular algorithm (page eviction & buffer cache replacement
+ come to mind.) This could involve a problem of the form: develop a
+ workload that your algorithm covers well, develop a "worst-case" load
+ where your algorithm performs poorly, and show the results of your
+ quantitative evaluation in your report - this could then be part of
+ their test score.
+
+* Threads project:
+
+ - Godmar:
+
+ >> Describe a potential race in thread_set_priority() and explain how
+ >> your implementation avoids it. Can you use a lock to avoid this race?
+
+ I'm not sure what you're getting at here:
+ If changing the priority of a thread involves accessing the ready
+ list, then of course there's a race with interrupt handlers and locks
+ can't be used to resolve it.
+
+ Changing the priority however also involves a race with respect to
+ accessing a thread's "priority" field - this race is with respect to
+ other threads that attempt to donate priority to the thread that's
+ changing its priority. Since this is a thread-to-thread race, I would
+ tend to believe that locks could be used, although I'm not certain. [
+ I should point out, though, that lock_acquire currently disables
+ interrupts - the purpose of which I had doubted in an earlier email,
+ since sema_down() sufficiently establishes mutual exclusion. Taking
+ priority donation into account, disabling interrupts prevents the race
+ for the priority field, assuming the priority field of each thread is
+ always updated with interrupts disabled. ]
+
+ What answer are you looking for in this design document question?
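+
+ A sketch of that discipline - the priority field is only read and
+ written with interrupts disabled (the function and its use are
+ illustrative):
+
+ ---
+ /* Called on behalf of a donor that is blocking on a lock whose
+    holder has lower priority. */
+ static void
+ donate_priority (struct thread *donee, int priority)
+ {
+   enum intr_level old_level = intr_disable ();
+
+   if (priority > donee->priority)
+     donee->priority = priority;   /* No handler or other thread can
+                                      observe a partial update while
+                                      interrupts are off. */
+   intr_set_level (old_level);
+ }
+ ---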
+
+ - Godmar:
+
+ >> Did any ambiguities in the scheduler specification make values in the
+ >> table uncertain? If so, what rule did you use to resolve them? Does
+ >> this match the behavior of your scheduler?
+
+ My guess is that you're referring to the fact that the scheduler
+ specification does not prescribe any order in which the priorities of
+ all threads are updated, so if multiple threads end up with the same
+ priority, it doesn't say which one to pick. ("round-robin" order
+ would not apply here.)
+
+ Is that correct?
+
+ - Godmar:
+
+ One of my groups implemented priority donation with these data
+ structures in synch.c:
+ ---
+ struct value
+ {
+ struct list_elem elem; /* List element. */
+ int value; /* Item value. */
+ };
+
+ static struct value values[10];
+ static int start = 10;
+ static int numNest = 0;
+ ---
+ In their implementation, the "elem" field in their "struct value" is
+ not even used.
+
+ The sad part is that they've passed all tests that are currently in
+ the Pintos base with this implementation. (They do fail the additional
+ tests I added, priority-donate-sema & priority-donate-multiple2.)
+
+ Another group managed to pass all tests with this construct:
+ ---
+ struct lock
+ {
+ struct thread *holder; /* Thread holding lock (for debugging). */
+ struct semaphore semaphore; /* Binary semaphore controlling access. */
+ //*************************************
+ int pri_prev;
+ int pri_delta; //Used for Priority Donation
+ /**************************************************/
+ };
+ ---
+ where "pri_delta" keeps track of "priority deltas." They even pass
+ priority-donate-multiple2.
+
+ I think we'll need a test where a larger number of threads & locks
+ simultaneously exercise priority donation to weed out those
+ implementations.
+
+ It may also be a good idea to use non-constant deltas for the low,
+ medium, and high priority threads in the tests - otherwise, adding a
+ "priority delta" might give - by coincidence - the proper priority for
+ a thread.
+
+ - Godmar: Another thing: one group passed all tests even though they
+ wake up all waiters on a lock_release(), rather than just
+ one. Since there's never more than one waiter in our tests, they
+ didn't fail anything. Another possible TODO item - this could be
+ part of a series of "regression tests" that check that they didn't
+ break basic functionality in project 1. I don't think this would
+ be insulting to the students.
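+
+ For reference, the base semaphore wakes only a single waiter, so
+ such a regression test need only check that this behavior survives;
+ from threads/synch.c (modulo version differences):
+
+ ---
+ void
+ sema_up (struct semaphore *sema)
+ {
+   enum intr_level old_level;
+
+   ASSERT (sema != NULL);
+
+   old_level = intr_disable ();
+   if (!list_empty (&sema->waiters))
+     thread_unblock (list_entry (list_pop_front (&sema->waiters),
+                                 struct thread, elem));
+   sema->value++;
+   intr_set_level (old_level);
+ }
+ ---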