Try to clarify synchronization.

diff --git a/doc/threads.texi b/doc/threads.texi
index 5e5335a1d66bda761f571278446b96907e661d5a..0b6bd701ff70acb6859c264959506acf8b14ae9a 100644
--- a/doc/threads.texi
+++ b/doc/threads.texi
@@ -77,10 +77,8 @@ single-step from there.@footnote{@command{gdb} might tell you that
  @func{schedule} doesn't exist, which is arguably a @command{gdb} bug.
  You can work around this by setting the breakpoint by filename and
  line number, e.g.@: @code{break thread.c:@var{ln}} where @var{ln} is
-the line number of the first declaration in @func{schedule}.
-Alternatively you can recompile with optimization turned off, by
-removing @samp{-O3} from the @code{CFLAGS} line in
-@file{Make.config}.}  Be sure to keep track of each thread's address
+the line number of the first declaration in @func{schedule}.}  Be sure
+to keep track of each thread's address
  and state, and what procedures are on the call stack for each thread.
  You will notice that when one thread calls @func{switch_threads},
  another thread starts running, and the first thing the new thread does
@@ -295,7 +293,8 @@ program twice and have it do exactly the same thing.  On second and
  later runs, you can make new observations without having to discard or
  verify your old observations.  This property is called
  ``reproducibility.''  The simulator we use, Bochs, can be set up for
-reproducibility, and that's the way that @command{pintos} invokes it.
+reproducibility, and that's the way that @command{pintos} invokes it
+by default.
  
  Of course, a simulation can only be reproducible from one run to the
  next if its input is the same each time.  For simulating an entire
@@ -313,50 +312,80 @@ thread switches.  That means that running the same test several times
  doesn't give you any greater confidence in your code's correctness
  than does running it only once.
  
-So, to make your code easier to test, we've added a feature to Bochs
-that makes timer interrupts come at random intervals, but in a
-perfectly predictable way.  In particular, if you invoke
-@command{pintos} with the option @option{-j @var{seed}}, timer
+So, to make your code easier to test, we've added a feature to Bochs,
+called ``jitter,'' that makes timer interrupts come at random
+intervals, but in a perfectly predictable way.  In particular, if you
+invoke @command{pintos} with the option @option{-j @var{seed}}, timer
  interrupts will come at irregularly spaced intervals.  Within a single
  @var{seed} value, execution will still be reproducible, but timer
  behavior will change as @var{seed} is varied.  Thus, for the highest
  degree of confidence you should test your code with many seed values.
  
+On the other hand, when Bochs runs in reproducible mode, timings are not
+realistic, meaning that a ``one-second'' delay may be much shorter or
+even much longer than one second.  You can invoke @command{pintos} with
+a different option, @option{-r}, to make it set up Bochs for realistic
+timings, in which a one-second delay should take approximately one
+second of real time.  Simulation in real-time mode is not reproducible,
+and options @option{-j} and @option{-r} are mutually exclusive.
+
  @node Tips
  @section Tips
  
-There should be no busy-waiting in any of your solutions to this
-assignment.  Furthermore, resist the temptation to directly disable
-interrupts in your solution by calling @func{intr_disable} or
-@func{intr_set_level}, although you may find doing so to be useful
-while debugging.  Instead, use semaphores, locks and condition
-variables to solve synchronization problems.  Hint: read the comments
-in @file{threads/synch.h} if you're unsure what synchronization
-primitives may be used in what situations.
-
-Given some designs of some problems, there may be one or two instances
-in which it is appropriate to directly change the interrupt levels
-instead of relying on the given synchroniztion primitives.  This must
-be justified in your @file{DESIGNDOC} file.  If you're not sure you're
-justified, ask!
-
-While all parts of this assignment are required if you intend to earn
-full credit on this project, keep in mind that Problem 1-2 (Join) will
-be needed for future assignments, so you'll want to get this one
-right.  We don't give out solutions, so you're stuck with your Join
-code for the whole quarter.  Problem 1-1 (Alarm Clock) could be very
-handy, but not strictly required in the future.  The upshot of all
-this is that you should focus heavily on making sure that your
-implementation of @func{thread_join} works correctly, since if it's
-broken, you will need to fix it for future assignments.  The other
-parts can be turned off in the future if you find you can't make them
-work quite right.
-
-Also keep in mind that Problem 1-4 (the MLFQS) builds on the features you
-implement in Problem 1-3, so to avoid unnecessary code duplication, it
+@itemize @bullet
+@item
+There should be no busy waiting in any of your solutions to this
+assignment.  We consider a tight loop that calls @func{thread_yield}
+to be one form of busy waiting.
+
+@item
+Proper synchronization is an important part of the solutions to these
+problems.  It is tempting to synchronize all your code by turning off
+interrupts with @func{intr_disable} or @func{intr_set_level}, because
+this eliminates concurrency and thus the possibility for race
+conditions, but @strong{don't}.  Instead, use semaphores, locks, and
+condition variables to solve the bulk of your synchronization
+problems.  Read the tour section on synchronization
+(@pxref{Synchronization}) or the comments in @file{threads/synch.c} if
+you're unsure what synchronization primitives may be used in what
+situations.
+
+You might run into a few situations where interrupt disabling is the
+best way to handle synchronization.  If so, you need to explain your
+rationale in your design documents.  If you're unsure whether a given
+situation justifies disabling interrupts, talk to the TAs, who can
+help you decide on the right thing to do.
+
+Disabling interrupts can be useful for debugging, if you want to make
+sure that a section of code is not interrupted.  You should remove
+debugging code before turning in your project.
+
+@item
+All parts of this assignment are required if you intend to earn full
+credit on this project.  However, some will be more important in
+future projects:
+
+@itemize @minus
+@item
+Problem 1-1 (Alarm Clock) could be handy for later projects, but it is
+not strictly required.
+
+@item
+Problem 1-2 (Join) will be needed for future projects.  We don't give
+out solutions, so to avoid extra work later you should make sure that
+your implementation of @func{thread_join} works correctly.
+
+@item
+Problems 1-3 and 1-4 won't be needed for later projects.
+@end itemize
+
+@item
+Problem 1-4 (MLFQS) builds on the features you
+implement in Problem 1-3.  To avoid unnecessary code duplication, it
  would be a good idea to divide up the work among your team members
  such that you have Problem 1-3 fully working before you begin to tackle
  Problem 1-4.
+@end itemize
  
  @node Problem 1-1 Alarm Clock
  @section Problem 1-1: Alarm Clock
@@ -378,11 +407,21 @@ advanced far enough.  This is undesirable because it wastes time that
  could potentially be used more profitably by another thread.  Your
  solution should not busy wait.
  
-The argument to @func{timer_sleep} is expressed in timer ticks, not
-in milliseconds or another unit.  There are @code{TIMER_FREQ} timer
+The argument to @func{timer_sleep} is expressed in timer ticks, not in
+milliseconds or any other unit.  There are @code{TIMER_FREQ} timer
  ticks per second, where @code{TIMER_FREQ} is a macro defined in
  @code{devices/timer.h}.
  
+Separate functions @func{timer_msleep}, @func{timer_usleep}, and
+@func{timer_nsleep} do exist for sleeping a specific number of
+milliseconds, microseconds, or nanoseconds, respectively, but these will
+call @func{timer_sleep} automatically when necessary.  You do not need
+to modify them.
+
+If your delays seem too short or too long, reread the explanation of the
+@option{-r} option to @command{pintos} (@pxref{Debugging versus
+Testing}).
+
  @node Problem 1-2 Join
  @section Problem 1-2: Join
  
@@ -397,8 +436,8 @@ Incidentally, we don't use @code{struct thread *} as
  @func{thread_join}'s parameter type because a thread pointer is not
  unique over time.  That is, when a thread dies, its memory may be,
  whether immediately or much later, reused for another thread.  If
-thread A over time had two children B and C that were stored at the
-same address, then @code{thread_join(@var{B})} and
+thread @var{A} over time had two children @var{B} and @var{C} that
+were stored at the same address, then @code{thread_join(@var{B})} and
  @code{thread_join(@var{C})} would be ambiguous.  Introducing a thread
  id or @dfn{tid}, represented by type @code{tid_t}, that is
  intentionally unique over time solves the problem.  The provided code
@@ -447,9 +486,9 @@ Implement priority scheduling in Pintos.  Priority scheduling is a key
  building block for real-time systems.  Implement functions
  @func{thread_set_priority} to set the priority of the running thread
  and @func{thread_get_priority} to get the running thread's priority.
-(A thread can examine and modify only its own priority.)  There are
-already prototypes for these functions in @file{threads/thread.h},
-which you should not change.
+(This API only allows a thread to examine and modify its own
+priority.)  There are already prototypes for these functions in
+@file{threads/thread.h}, which you should not change.
  
  Thread priority ranges from @code{PRI_MIN} (0) to @code{PRI_MAX} (59).
  The initial thread priority is passed as an argument to
@@ -476,7 +515,7 @@ A partial fix for this problem is to have the waiting thread
  the lock, then recall the donation once it has acquired the lock.
  Implement this fix.
  
-You will need to account for all different orders that priority
+You will need to account for all different orders in which priority
  donation and inversion can occur.  Be sure to handle multiple
  donations, in which multiple priorities are donated to a thread.  You
  must also handle nested donation: given high, medium, and low priority
@@ -487,8 +526,13 @@ that @var{L} holds, then both @var{M} and @var{L} should be boosted to
  
  You only need to implement priority donation when a thread is waiting
  for a lock held by a lower-priority thread.  You do not need to
-implement this fix for semaphores, condition variables or joins.
-However, you do need to implement priority scheduling in all cases.
+implement this fix for semaphores, condition variables, or joins,
+although you are welcome to do so.  However, you do need to implement
+priority scheduling in all cases.
+
+You may assume a static priority for priority donation; that is, it is
+not necessary to ``re-donate'' a thread's priority if it changes
+(although you are free to do so).
  
  @node Problem 1-4 Advanced Scheduler
  @section Problem 1-4: Advanced Scheduler
@@ -503,10 +547,6 @@ relative to the original Pintos scheduling algorithm (round robin) for
  at least one workload of your own design (i.e.@: in addition to the
  provided test).
  
-You may assume a static priority for this problem. It is not necessary
-to ``re-donate'' a thread's priority if it changes (although you are
-free to do so).
-
  You must write your code so that we can turn the MLFQS on and off at
  compile time.  By default, it must be off, but we must be able to turn
  it on by inserting the line @code{#define MLFQS 1} in
@@ -555,7 +595,7 @@ latency, which can make a machine feel sluggish if taken too far.
  Therefore, in general, setting the interrupt level should be used
  sparingly.  Also, any synchronization problem can be easily solved by
  turning interrupts off, since while interrupts are off, there is no
-concurrency, so there's no possibility for race condition.
+concurrency, so there's no possibility for race conditions.
  
  To make sure you understand concurrency well, we are discouraging you
  from taking this shortcut at all in your solution.  If you are unable
@@ -636,7 +676,8 @@ off interrupts.
  @item
  Examples of synchronization mechanisms have been presented in lecture.
  Going over these examples should help you understand when each type is
-useful or needed.
+useful or needed.  @xref{Synchronization}, for specific information
+about synchronization in Pintos.
  @end enumerate
  
  @item
@@ -650,10 +691,11 @@ second should be good for almost 2,924,712,087 years.
  @item
  @b{The test program mostly works but reports a few out-of-order
  wake ups.  I think it's a problem in the test program.  What gives?}
+@anchor{Out of Order 1-1}
  
  This test is inherently full of race conditions.  On a real system it
-wouldn't work perfectly all the time either.  However, you can help it
-work more reliably:
+wouldn't work perfectly all the time either.  There are a few ways you
+can help it work more reliably:
  
  @itemize @bullet
  @item
@@ -663,17 +705,16 @@ Make time slices longer by increasing @code{TIME_SLICE} in
  @item
  Make the timer tick more slowly by decreasing @code{TIMER_FREQ} in
  @file{timer.h} to its minimum value of 19.
+@end itemize
+
+These two changes are only desirable for testing problem 1-1 and
+possibly 1-3.  You should revert them before working on other parts
+of the project or turning in the project.
  
  @item
-Increase the serial output speed to the maximum of 115200 bps by
-modifying the call to @func{set_serial} in @func{serial_init_poll} in
-@file{devices/serial.c}.
-@end itemize
+@b{Should @file{p1-1.c} be expected to work with the MLFQS turned on?}
  
-The former two changes are only desirable for testing problem 1-1.  You
-should revert them before working on other parts of the project or turn
-in the project.  The latter is harmless, so you can retain it or revert
-it at your option.
+No.  The MLFQS will adjust priorities, changing thread ordering.
  @end enumerate
  
  @node Problem 1-2 Join FAQ
@@ -763,44 +804,32 @@ Yes.  Same scenario as above except L gets blocked waiting on a new
  lock when H restores its priority.
  
  @item
-@b{Why is pubtest3's FIFO test skipping some threads! I know my scheduler
-is round-robin'ing them like it's supposed to!  Our output is like this:}
-
-@example
-Thread 0 goes.
-Thread 2 goes.
-Thread 3 goes.
-Thread 4 goes.
-Thread 0 goes.
-Thread 1 goes.
-Thread 2 goes.
-Thread 3 goes.
-Thread 4 goes.
-@end example
-
-@noindent @b{which repeats 5 times and then}
-
-@example
-Thread 1 goes.
-Thread 1 goes.
-Thread 1 goes.
-Thread 1 goes.
-Thread 1 goes.
-@end example
-
-This happens because context switches are being invoked by the test
-when it explicitly calls @func{thread_yield}.  However, the time
-slice timer is still alive and so, every tick (by default), thread 1
-gets switched out (caused by @func{timer_interrupt} calling
-@func{intr_yield_on_return}) before it gets a chance to run its
-mainline.  It is by coincidence that Thread 1 is the one that gets
-skipped in our example.  If we use a different jitter value, the same
-behavior is seen where a thread gets started and switched out
-completely.
-
-Solution: Increase the value of @code{TIME_SLICE} in
-@file{devices/timer.c} to a very high value, such as 10000, to see
-that the threads will round-robin if they aren't interrupted.
+@b{Why is @file{p1-3.c}'s FIFO test skipping some threads?  I know my
+scheduler is round-robin'ing them like it's supposed to.  Our output
+starts out okay, but toward the end it starts getting out of order.}
+
+The usual problem is that the serial output buffer fills up.  This
+causes @func{serial_putc} to block in thread @var{A}, so that thread
+@var{B} is scheduled.  Thread @var{B} immediately tries to do output
+of its own and blocks on the serial lock (which is held by thread
+@var{A}).  Now that we've wasted some time in scheduling and locking,
+typically some characters have been drained out of the serial buffer
+by the interrupt handler, so thread @var{A} can continue its output.
+After it finishes, though, some other thread (not @var{B}) is
+scheduled, because thread @var{B} was already scheduled while we
+waited for the buffer to drain.
+
+There's at least one other possibility.  The test invokes context
+switches explicitly by calling @func{thread_yield}.  However, the time
+slice timer is still running, so on every tick (by default)
+@func{timer_interrupt} calls @func{intr_yield_on_return}, which can
+switch a thread out before it gets a chance to run @func{printf},
+effectively skipping it.  Changing the jitter value does not make the
+problem go away: some thread still gets started and then switched out
+before it prints anything.
+
+Normally you can fix these problems using the same techniques
+suggested for problem 1-1 (@pxref{Out of Order 1-1}).
  
  @item
  @b{What happens when a thread is added to the ready list which has
@@ -814,6 +843,20 @@ solution must act this way.
  its priority has been increased by a donation?}
  
  The higher (donated) priority.
+
+@item
+@b{Should @file{p1-3.c} be expected to work with the MLFQS turned on?}
+
+No.  The MLFQS will adjust priorities, changing thread ordering.
+
+@item
+@b{@func{printf} in @func{sema_up} or @func{sema_down} makes the
+system reboot!}
+
+Yes.  These functions are called before @func{printf} is ready to go.
+You could add a global flag initialized to false and set it to true
+just before the first @func{printf} in @func{main}.  Then modify
+@func{printf} itself to return immediately if the flag isn't set.
  @end enumerate
  
  @node Problem 1-4 Advanced Scheduler FAQ
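
Note on the last FAQ entry added in the final hunk (printf in sema_up or sema_down): the guard-flag workaround it describes could look roughly like the sketch below. This is hosted C for illustration only, not Pintos code. The names console_ready, mark_console_ready, and guarded_printf are hypothetical, and the wrapper merely stands in for the real change, which according to the text would be made inside Pintos's own printf.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical global flag: false until output is safe to use. */
static bool console_ready = false;

/* Hypothetical helper: call once, just before the first real
   print in main, to enable output from then on. */
void mark_console_ready (void)
{
  console_ready = true;
}

/* Hypothetical stand-in for a printf that has been modified to
   return immediately while the flag is still false. */
int guarded_printf (const char *format, ...)
{
  va_list args;
  int written;

  assert (format != NULL);

  /* Too early in startup: drop the message instead of printing. */
  if (!console_ready)
    return 0;

  va_start (args, format);
  written = vprintf (format, args);
  va_end (args);
  return written;
}
```

In this sketch, calls made before mark_console_ready() (for example, debugging output placed in sema_up or sema_down during startup) return 0 and print nothing; later calls behave like an ordinary printf.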