-*- text -*-

From: "Waqar Mohsin" <wmohsin@gmail.com>
Subject: 3 questions about switch_threads() in switch.S
To: blp@cs.stanford.edu, joshwise@stanford.edu
Date: Fri, 3 Mar 2006 17:09:21 -0800

QUESTION 1
 
In the section
 
  # Save current stack pointer to old thread's stack, if any.
  movl SWITCH_CUR(%esp), %eax
  test %eax, %eax
  jz 1f
  movl %esp, (%eax,%edx,1)
1:

  # Restore stack pointer from new thread's stack.
  movl SWITCH_NEXT(%esp), %ecx
  movl (%ecx,%edx,1), %esp

why are we saving the current stack pointer only if the "cur" thread pointer
is non-NULL? Isn't it guaranteed to be non-NULL because switch_threads() is
only called from schedule(), where we have

  struct thread *cur = running_thread ();

which should always be non-NULL (given the way kernel pool is laid out).

QUESTION 2

  # This stack frame must match the one set up by thread_create().
  pushl %ebx
  pushl %ebp
  pushl %esi
  pushl %edi

I find the comment confusing. thread_create() is a special case: the
registers popped from the switch_threads() stack frame for a newly created
thread are all zero, so their order shouldn't dictate the order above.

I think all that matters is that the order of pops at the end of
switch_threads() is the opposite of the pushes at the beginning (as shown
above).

QUESTION 3

Is it true that struct switch_threads_frame does NOT strictly require

    struct thread *cur;         /* 20: switch_threads()'s CUR argument. */
    struct thread *next;        /* 24: switch_threads()'s NEXT argument. */
at the end ?

When a newly created thread's stack pointer is installed in switch_threads(),
all we do is pop the saved registers and return to switch_entry() which pops
off and discards the above two simulated (and not used) arguments to
switch_threads().

If we remove these two from struct switch_threads_frame and don't do a

  # Discard switch_threads() arguments.
  addl $8, %esp
in switch_entry(), things should still work. Am I right ?

Thanks
Waqar
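
For reference when reading questions 2 and 3, here is a rough model of
the frame layout implied by the push order and the 20/24 offsets quoted
above.  It is only a sketch: the struct name, the use of uint32_t in
place of pointers (so the offsets come out right on a 64-bit host), and
the helper args_size() are assumptions made for illustration, not the
actual Pintos declarations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the 32-bit frame.  %ebx is pushed first and
   %edi last, so %edi ends up at the lowest address. */
struct switch_threads_frame_model
  {
    uint32_t edi;   /*  0: Saved %edi. */
    uint32_t esi;   /*  4: Saved %esi. */
    uint32_t ebp;   /*  8: Saved %ebp. */
    uint32_t ebx;   /* 12: Saved %ebx. */
    uint32_t eip;   /* 16: Return address. */
    uint32_t cur;   /* 20: switch_threads()'s CUR argument. */
    uint32_t next;  /* 24: switch_threads()'s NEXT argument. */
  };

/* Bytes occupied by the two arguments, i.e. the amount that the
   "addl $8, %esp" in switch_entry() discards. */
static size_t
args_size (void)
{
  return sizeof (struct switch_threads_frame_model)
         - offsetof (struct switch_threads_frame_model, cur);
}
```

In this model args_size() is 8, which matches the constant in the
"addl $8, %esp" quoted in question 3.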

From: "Godmar Back" <godmar@gmail.com>
Subject: thread_yield in irq handler
To: "Ben Pfaff" <blp@cs.stanford.edu>
Date: Wed, 22 Feb 2006 22:18:50 -0500

Ben,

you write in your Tour of Pintos:

"Second, an interrupt handler must not call any function that can
sleep, which rules out thread_yield(), lock_acquire(), and many
others. This is because external interrupts use space on the stack of
the kernel thread that was running at the time the interrupt occurred.
If the interrupt handler tried to sleep and that thread resumed, then
the two uses of the single stack would interfere, which cannot be
allowed."

Is the last sentence really true?

I thought the reason that you couldn't sleep is that you would
effectively put a random thread/process to sleep, but I don't think it
would cause problems with the kernel stack.  After all, it doesn't
cause this problem if you call thread_yield at the end of
intr_handler(), so why would it cause this problem earlier?

As for thread_yield(), my understanding is that the reason it's called
at the end is to ensure it's done after the interrupt is acknowledged,
which you can't do until the end because Pintos doesn't handle nested
interrupts.

 - Godmar

From: "Godmar Back" <godmar@gmail.com>

For reasons I don't currently understand, some of our students seem
hesitant to include each thread in a second "all-threads" list and are
looking for ways to implement the advanced scheduler without one.

Currently, I believe, all tests for the mlfqs are such that all
threads are either ready or sleeping in timer_sleep(). This allows for
an incorrect implementation in which recent_cpu and priorities are
updated only for those threads that are on the alarm list or the ready
list.

The todo item would be a test where a thread stays blocked on a
semaphore, lock or condition variable long enough for its recent_cpu
to decay to zero, and which checks that the thread is scheduled right
after the unlock/up/signal.
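
A minimal sketch of the point above: the once-per-second recent_cpu
update has to visit every thread, including blocked ones, not just
those on the ready or alarm lists.  The struct, the enum and the
function name are hypothetical, and double stands in for the
fixed-point arithmetic a real kernel implementation would need.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical thread states for this model. */
enum model_status { MODEL_RUNNING, MODEL_READY, MODEL_BLOCKED };

/* Hypothetical stand-in for the scheduler fields of a thread. */
struct model_thread
  {
    enum model_status status;
    int nice;
    double recent_cpu;
  };

/* Applies
     recent_cpu = (2*load_avg)/(2*load_avg + 1) * recent_cpu + nice
   to every entry of ALL, regardless of its status. */
static void
update_recent_cpu_all (struct model_thread *all, size_t n, double load_avg)
{
  double coeff;
  size_t i;

  assert (load_avg >= 0.0);
  coeff = (2.0 * load_avg) / (2.0 * load_avg + 1.0);
  for (i = 0; i < n; i++)
    all[i].recent_cpu = coeff * all[i].recent_cpu + all[i].nice;
}
```

A test along the lines proposed would block a thread for many seconds
and then check that its recent_cpu has decayed like that of any other
thread.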

From: "Godmar Back" <godmar@gmail.com>
Subject: set_priority & donation - a TODO item
To: "Ben Pfaff" <blp@cs.stanford.edu>
Date: Mon, 20 Feb 2006 22:20:26 -0500

Ben,

it seems that there are currently no tests that check the proper
behavior of thread_set_priority() when called by a thread that is
running under priority donation.  The proper behavior, I assume, is to
temporarily drop the donation if the set priority is higher, and to
reassume the donation should the thread subsequently set its own
priority again to a level that's lower than a still active donation.

 - Godmar
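
One way to get the behavior described above is to keep the thread's
own ("base") priority separate from the highest active donation and to
compute the effective priority as the larger of the two.  This is only
a sketch under that assumption; struct donee and both functions are
hypothetical names, not part of Pintos.

```c
#include <assert.h>

#define MODEL_PRI_MIN 0    /* Lowest priority in this model. */
#define MODEL_PRI_MAX 63   /* Highest priority in this model. */

/* Hypothetical stand-in for the priority fields of a thread. */
struct donee
  {
    int base_priority;     /* Priority the thread set for itself. */
    int donated_priority;  /* Highest active donation, or -1 if none. */
  };

/* Effective priority: the larger of base and donated priority. */
static int
donee_effective_priority (const struct donee *t)
{
  return t->donated_priority > t->base_priority
         ? t->donated_priority : t->base_priority;
}

/* Models thread_set_priority(): changes only the base priority, so an
   active donation is masked by a higher base and shows through again
   once the base drops below it. */
static void
donee_set_priority (struct donee *t, int new_priority)
{
  assert (new_priority >= MODEL_PRI_MIN && new_priority <= MODEL_PRI_MAX);
  t->base_priority = new_priority;
}
```

A test for this would have a thread holding a contended lock raise its
priority above the donation and then lower it below again, checking
thread_get_priority() after each step.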

From: Godmar Back <godmar@gmail.com>
Subject: project 4 question/comment regarding caching inode data
To: Ben Pfaff <blp@cs.stanford.edu>
Date: Sat, 14 Jan 2006 15:59:33 -0500

Ben,

in section 6.3.3 in the P4 FAQ, you write:

"You can store a pointer to inode data in struct inode, if you want,"

Should you point out that if they indeed do that, they likely wouldn't
be able to support more than 64 open inodes systemwide at any given
point in time?

(This seems like a rather strong limitation; do your current tests
open more than 64 files?
It would also point to an obvious way to make the projects harder by
specifically disallowing that inode data be locked in memory during
the entire time an inode is kept open.)

 - Godmar

From: Godmar Back <godmar@gmail.com>
Subject: on caching in project 4
To: Ben Pfaff <blp@cs.stanford.edu>
Date: Mon, 9 Jan 2006 20:58:01 -0500

here's an idea for future semesters.

I'm in the middle of project 4; I've started by implementing a buffer
cache and plugging it into the existing filesystem.  Along the way I
was wondering how we could test the cache.

Maybe one could adopt a similar testing strategy as in project 1 for
the MLFQS scheduler: add a function "get_cache_accesses()" and a
function "get_cache_hits()".  Then create a version of pintos
that creates access traces for a to-be-determined workload.  Run an
off-line analysis that would determine how many hits a perfect cache
would have (MAX), and how much say an LRU strategy would give (MIN).
Then add a fudge factor to account for different index strategies and
test that the reported number of cache hits/accesses is within (MIN,
MAX) +/- fudge factor.

(As an aside - I am curious why you chose to use a clock-style
algorithm rather than the more straightforward LRU for your buffer
cache implementation in your sample solution. Is there a reason for
that?  I was curious to see if it made a difference, so I implemented
LRU for your cache implementation and ran the test workload of project
4 and printed cache hits/accesses.
I found that for that workload, the clock-based algorithm performs
almost identically to LRU (within about 1%, but I ran nondeterministically
with QEMU). I then reduced the cache size to 32 blocks and found again
the same performance, which raises the suspicion that the test
workload might not force any cache replacement, so the eviction
strategy doesn't matter.)
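
The off-line analysis suggested above could start from something like
the following LRU simulator, which replays a trace of sector numbers
and keeps the two counters.  Everything here is a sketch: struct
sim_cache, the sim_* names and the 64-entry limit are assumptions for
illustration, and the linear search is chosen for brevity, not speed.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_CACHE_MAX 64   /* Largest cache this model supports. */

/* Hypothetical LRU cache model: sectors[] is ordered from most to
   least recently used. */
struct sim_cache
  {
    unsigned sectors[SIM_CACHE_MAX];
    size_t used;             /* Number of valid entries. */
    size_t capacity;         /* Cache size in blocks. */
    unsigned long accesses;  /* Total accesses replayed. */
    unsigned long hits;      /* Accesses found in the cache. */
  };

static void
sim_cache_init (struct sim_cache *c, size_t capacity)
{
  assert (capacity > 0 && capacity <= SIM_CACHE_MAX);
  c->used = 0;
  c->capacity = capacity;
  c->accesses = 0;
  c->hits = 0;
}

/* Replays one access to SECTOR and updates the counters. */
static void
sim_cache_access (struct sim_cache *c, unsigned sector)
{
  size_t i, pos;

  c->accesses++;

  /* Linear search for the sector. */
  for (pos = 0; pos < c->used; pos++)
    if (c->sectors[pos] == sector)
      break;

  if (pos < c->used)
    c->hits++;             /* Hit: entry at POS moves to the front. */
  else if (c->used < c->capacity)
    pos = c->used++;       /* Miss, free slot: grow by one entry. */
  else
    pos = c->used - 1;     /* Miss, full: overwrite the LRU entry. */

  /* Shift entries [0, POS) back by one and put SECTOR in front. */
  for (i = pos; i > 0; i--)
    c->sectors[i] = c->sectors[i - 1];
  c->sectors[0] = sector;
}

static unsigned long
sim_get_cache_accesses (const struct sim_cache *c)
{
  return c->accesses;
}

static unsigned long
sim_get_cache_hits (const struct sim_cache *c)
{
  return c->hits;
}
```

Replaying the same trace at two cache sizes and comparing the hit
counts is a quick way to tell whether a workload forces any
replacement at all: if the counts are equal, the eviction policy was
probably never exercised.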

Godmar Back <godmar@gmail.com> writes:

> in your sample solution to P4, dir_reopen does not take any locks when
> changing a directory's open_cnt. This looks like a race condition to
> me, considering that dir_reopen is called from execute_process without
> any filesystem locks held.

* Get rid of rox--causes more trouble than it's worth

* Reconsider command line arg style--confuses everyone.

* Finish writing tour.

* Introduce a "yield" system call to speed up the syn-* tests.

via Godmar Back:

* Project 3 solution needs FS lock.

* Get rid of mmap syscall, add sbrk.

* Make backtrace program accept multiple object file arguments,
  e.g. add -u option to allow backtracing user program also.

* page-linear, page-shuffle VM tests do not use enough memory to force
  eviction.  Should increase memory consumption.

* Add FS persistence test(s).

* lock_acquire(), lock_release() don't need additional intr_dis/enable
  calls, because the semaphore protects lock->holder.


* process_death test needs improvement

* Internal tests.

* Improve automatic interpretation of exception messages.

* Userprog project:

  - Mark read-only pages as actually read-only in the page table.  Or,
    since this was consistently rated as the easiest project by the
    students, require them to do it.

  - Don't provide per-process pagedir implementation but only
    single-process implementation and require students to implement
    the separation?  This project was rated as the easiest after all.
    Alternately we could just remove the synchronization on pid
    selection and check that students fix it.

* Filesys project:

  - Need a better way to measure performance improvement of buffer
    cache.  Some students reported that their system was slower with
    cache--likely, Bochs doesn't simulate a disk with a realistic
    speed.

* Documentation:

  - Add "Digging Deeper" sections that describe the nitty-gritty x86
    details for the benefit of those interested.

  - Add explanations of what "real" OSes do to give students some
    perspective.

* Assignments:

  - Add extra credit:

    . Low-level x86 stuff, like paged page tables.

    . Specifics on how to implement sbrk, malloc.

    . Other good ideas.

    . opendir/readdir/closedir

    . everything needed for getcwd()