Initial projects.

author Ben Pfaff <blp@cs.stanford.edu>

Thu, 9 Sep 2004 22:37:50 +0000 (22:37 +0000)

committer Ben Pfaff <blp@cs.stanford.edu>

Thu, 9 Sep 2004 22:37:50 +0000 (22:37 +0000)
author Ben Pfaff <blp@cs.stanford.edu>
Thu, 9 Sep 2004 22:37:50 +0000 (22:37 +0000)
committer Ben Pfaff <blp@cs.stanford.edu>
Thu, 9 Sep 2004 22:37:50 +0000 (22:37 +0000)
diff --git a/doc/filesys.texi b/doc/filesys.texi

new file mode 100644 (file)

index 0000000..36d92e3
--- /dev/null
+++ b/doc/filesys.texi
@@ -0,0 +1,447 @@
+@node Project 4--File Systems, , Project 3--Virtual Memory, Top
+@chapter Project 4: File Systems
+
+In the previous two assignments, you made extensive use of a
+filesystem without actually worrying about how it was implemented
+underneath.  For this last assignment, you will fill in the
+implementation of the filesystem.  You will be working primarily in
+the @file{filesys} directory.
+
+You should build on the code you wrote for the previous assignments.
+However, if you wish, you may turn off your VM features, as they are
+not vital to making the filesystem work.  (You will need to edit
+@file{filesys/Makefile.vars} to fully disable VM.)  All of the
+functionality needed for project 2 (argument passing, syscalls and
+multiprogramming) must work in your filesys submission.
+
+On the other hand, one of the particular charms of working on
+operating systems is being able to use what you build, and building
+full-featured systems.  Therefore, you should strive to make all the
+parts work together so that you can run VM and your filesystem at the
+same time.  Plus, keeping VM is a great way to stress-test your
+filesystem implementation.
+
+FIXME FIXME FIXME
+The first step is to understand the default filesystem provided by the
+base code.  The first things you should look at are
+@file{threads/init.c} and @file{filesys/fsutil.c}: there are special
+command line arguments to Pintos which are used to format and
+manipulate the emulated disk.  Specifically, @option{-f} formats the
+file system disk for use with Pintos, and @option{-cp @var{src}
+@var{dst}} copies @var{src} on the Unix filesystem to @var{dst} on the
+file system disk.  With this information, you should be able to
+compile the base code and try out the command: @samp{pintos -f -cp
+./small small} to copy the file @file{small} from your working
+directory into the Pintos file system.
+
+FIXME FIXME FIXME
+One thing you should realize immediately is that, until you use the
+above operation to copy the shell (or whatever your initial program
+is) to the emulated disk, Pintos will be unable to do very much useful
+work (it'll try to open the shell and fail, thereby quitting out).  You
+will also find that you won't be able to do interesting things until
+you copy a variety of programs to the disk.  A useful technique, once
+your inode formats are finalized, is to create a clean reference disk
+and copy that over whenever you trash your disk beyond a useful state
+(which will probably happen quite often while debugging).
+
+@menu
+* File System New Code::        
+* Problem 4-1 Large Files::     
+* Problem 4-2 File Growth::     
+* Problem 4-3 Subdirectories::  
+* Problem 4-4 Buffer Cache::    
+* File System Design Document Requirements::  
+* File System FAQs::            
+@end menu
+
+@node File System New Code, Problem 4-1 Large Files, Project 4--File Systems, Project 4--File Systems
+@section New Code
+
+Here are some files that are probably new to you.  These are in the
+@file{filesys} directory except where indicated:
+
+@table @file
+@item fsutil.c
+Simple utilities for the filesystem that are accessible from the
+kernel command line.
+
+@item filesys.h
+@itemx filesys.c
+Top-level interface to the file system.
+
+@item directory.h
+@itemx directory.c
+Translates file names to disk file headers; the
+directory data structure is stored as a file.
+
+@item filehdr.h
+@itemx filehdr.c
+Manages the data structure representing the layout of a
+file's data on disk.
+
+@item file.h
+@itemx file.c
+Translates file reads and writes to disk sector reads
+and writes.
+
+@item devices/disk.h
+@itemx devices/disk.c
+Provides access to the physical disk, abstracting away the rather
+awful IDE interface.
+
+@item lib/kernel/bitmap.h
+@itemx lib/kernel/bitmap.c
+A bitmap data structure along with routines for reading and writing
+the bitmap to disk files.
+@end table
+
+Our file system has a Unix-like interface, so you may also wish to
+read the Unix man pages for @code{creat}, @code{open}, @code{close},
+@code{read}, @code{write}, @code{lseek}, and @code{unlink}.  Our file
+system has calls that are similar, but not identical, to these.  The
+file system translates these calls into physical disk operations.  
+
+All the basic functionality is there in the code above, so that the
+filesystem is usable right off the bat.  In fact, you've been using it
+in the previous two projects.  However, it has severe limitations
+which you will remove.
+
+While most of your work will be in @file{filesys}, you should be
+prepared for interactions with all previous parts (as usual).
+
+@node Problem 4-1 Large Files, Problem 4-2 File Growth, File System New Code, Project 4--File Systems
+@section Problem 4-1: Large Files
+
+Modify the file system to allow the maximum size of a file to be as
+large as the disk.  You can assume that the disk will not be larger
+than 8 MB.  In the basic file system, each file is limited to a file
+size of just under 64 kB.  Each file has a header (@code{struct
+filehdr}) that is a table of direct pointers to the disk blocks for
+that file.  Since the header is stored in one disk sector, the maximum
+size of a file is limited by the number of pointers that will fit in
+one disk sector.  Increasing the limit to 8 MB will require you to
+implement doubly-indirect blocks.
+
+@node Problem 4-2 File Growth, Problem 4-3 Subdirectories, Problem 4-1 Large Files, Project 4--File Systems
+@section Problem 4-2: File Growth
+
+Implement extensible files.  In the basic file system, the file size
+is specified when the file is created.  One advantage of this is that
+the FileHeader data structure, once created, never changes.  In UNIX
+and most other file systems, a file is initially created with size 0
+and is then expanded every time a write is made off the end of the
+file.  Modify the file system to allow this.  As one test case, allow
+the root directory file to expand beyond its current limit of ten
+files.  Make sure that concurrent accesses to the file header remain
+properly synchronized.
+
+@node Problem 4-3 Subdirectories, Problem 4-4 Buffer Cache, Problem 4-2 File Growth, Project 4--File Systems
+@section Problem 4-3: Subdirectories
+
+Implement a hierarchical name space.  In the basic file system, all
+files live in a single directory.  Modify this to allow directories to
+point to either files or other directories.  To do this, you will need
+to implement routines that parse path names into a sequence of
+directories, as well as routines that change the current working
+directory and that list the contents of the current directory.  For
+performance, allow concurrent updates to different directories, but
+use mutual exclusion to ensure that updates to the same directory are
+performed atomically (for example, to ensure that a file is deleted
+only once).
+
+Make sure that directories can expand beyond their original size just
+as any other file can.
+
+To take advantage of hierarchical name spaces in user programs,
+provide the following syscalls:
+
+@table @code
+@item SYS_chdir
+@itemx bool chdir (const char *@var{dir})
+Attempts to change the current working directory of the process to
+@var{dir}, which may be either relative or absolute.  Returns true if
+successful, false on failure.
+
+@item SYS_mkdir
+@itemx bool mkdir (const char *dir)
+Attempts to create the directory named @var{dir}, which may be either
+relative or absolute.  Returns true if successful, false on failure.
+
+@item SYS_lsdir
+@itemx void lsdir (void)
+Prints a list of files in the current directory to @code{stdout}, one
+per line.
+@end table
+
+Also write the @command{ls} and @command{mkdir} user programs.  This
+is straightforward once the above syscalls are implemented.  If Unix,
+these are programs rather than built-in shell commands, but
+@command{cd} is a shell command.  (Why?)
+
+@node Problem 4-4 Buffer Cache, File System Design Document Requirements, Problem 4-3 Subdirectories, Project 4--File Systems
+@section Problem 4-4: Buffer Cache
+
+Modify the file system to keep a cache of file blocks.  When a request
+is made to read or write a block, check to see if it is stored in the
+cache, and if so, fetch it immediately from the cache without going to
+disk.  (Otherwise, fetch the block from disk into cache, evicting an
+older entry if necessary.)  You are limited to a cache no greater than
+64 sectors in size.  Be sure to choose an intelligent cache
+replacement algorithm.  Experiment to see what combination of use,
+dirty, and other information results in the best performance, as
+measured by the number of disk accesses.  (For example, metadata is
+generally more valuable to cache than data.)  Document your
+replacement algoritm in your design document.
+
+In addition to the basic file caching scheme, your implementation
+should also include the following enhancements:
+
+@table @b
+@item write-behind:
+Instead of always immediately writing modified data to disk, dirty
+blocks can be kept in the cache and written out sometime later.  Your
+buffer cache should write behind whenever a block is evicted from the
+cache.
+
+@item read-ahead:
+Your buffer cache should automatically fetch the next block of a file
+into the cache when one block of a file is read, in case that block is
+about to be read.
+@end table
+
+For each of these three optimizations, design a file I/O workload that
+is likely to benefit from the enhancement, explain why you expect it
+to perform better than on the original file system implementation, and
+demonstrate the performance improvement.
+
+Note that write-behind makes your filesystem more fragile in the face
+of crashes.  Therefore, you should implement some manner to
+periodically write all cached blocks to disk.  If you have
+@code{timer_msleep()} from the first project working, this is an
+excellent application for it.
+
+Likewise, read-ahead is only really useful when done asynchronously.
+That is, if a process wants disk block 1 from the file, it needs to
+block until disk block 1 is read in, but once that read is complete,
+control should return to the process immediately while the read
+request for disk block 2 is handled asynchronously.  In other words,
+the process will block to wait for disk block 1, but should not block
+waiting for disk block 2.
+
+FIXME
+When you're implementing this, please make sure you have a scheme for
+making any read-ahead and write-behind threads halt when Pintos is
+``done'' (when the user program has completed, etc), so that Pintos
+will halt normally and print its various statistics.
+
+@node File System Design Document Requirements, File System FAQs, Problem 4-4 Buffer Cache, Project 4--File Systems
+@section Design Document Requirements
+
+As always, submit a design document file summarizing your design.  Be
+sure to cover the following points :
+
+@itemize @bullet
+@item
+How did you structure your inodes? How many blocks did you access
+directly, via single-indirection, and/or via double-indirection?  Why?
+
+@item
+How did you structure your buffer cache? How did you perform a lookup
+in the cache? How did you choose elements to evict from the cache?
+
+@item
+How and when did you flush the cache?
+@end itemize
+
+@node File System FAQs,  , File System Design Document Requirements, Project 4--File Systems
+@section FAQ
+
+@enumerate 1
+@item
+@b{What extra credit opportunities are available for this assignment?}
+
+@itemize @bullet
+@item
+We'll give out extra credit to groups that implement Unix-style
+support for @file{.} and @file{..} in relative paths in their projects.
+
+@item
+We'll give some extra credit if you submit with VM enabled.  If you do
+this, make sure you show us that you can run multiple programs
+concurrently.  A particularly good demonstration is running
+@file{capitalize} (with a reduced words file that fits comfortably on
+your disk, of course).  So submit a file system disk that contains a
+VM-heavy program like @file{capitalize}, so we can try it out.  And also
+include the results in your test case file.
+
+We feel that you will be much more satisfied with your cs140 ``final
+product'' if you can get your VM working with your file system.  It's
+also a great stress test for your FS, but obviously you have to be
+pretty confident with your VM if you're going to submit this extra
+credit, since you'll still lose points for failing FS-related tests,
+even if the problem is in your VM code.
+
+@item
+A point of extra credit can be assigned if a user can recursively
+remove directories from the shell command prompt.  Note that the
+typical semantic is to just fail if a directory is not empty.
+@end itemize
+
+Make sure that you discuss any extra credit in your @file{README}
+file.  We're likely to miss it if it gets buried in your design
+document.
+
+@item
+@b{What exec modes for running Pintos do I absolutely need to
+support?}
+
+FIXME FIXME
+The most standard mode is to run your Pintos with all the command
+flags on one command line, like this: @samp{pintos -f -cp shell
+shell -ex "shell"}.  However, you also need to support these flags
+individually---especially since that's how the grader tests your
+program.  Thus, you should be able to run the above instead as:
+
+FIXME
+@example
+pintos -f
+pintos -cp shell shell
+pintos -ex "shell"
+@end example
+
+Note that this also provides a way for you to validate that your disks
+are really persistent.  This is a common problem with a write behind
+cache: if you don't shut down properly it will leave the disk in an
+inconsistent state.
+
+@item
+@b{Will you test our file system with a different @code{DISK_SECTOR_SIZE}?}
+
+No, @code{DISK_SECTOR_SIZE} will not change.
+
+@item
+@b{Will the @code{struct filehdr} take up space on the disk too?}
+
+Yes.  Anything stored in @code{struct filehdr} takes up space on disk,
+so you must include this in your calculation of how many entires will
+fit in a single disk sector.
+
+@item
+File Growth FAQs
+
+@enumerate 1
+@item
+@b{What is the largest file size that we are supposed to support?}
+
+The disk we create will be 8 MB or smaller.  However, individual files
+will have to be smaller than the disk to accommodate the metadata.
+You'll need to consider this when deciding your @code{struct filehdr}
+(inode) organization.
+@end enumerate
+
+@item
+Subdirectory FAQs
+
+@enumerate 1
+@item
+@b{What's the answer to the question in the spec about why
+@command{ls} and @command{mkdir} are user programs, while @command{cd}
+is a shell command?}
+
+Each process maintains its own current working directory, so it's much
+easier to change the current working directory of the shell process if
+@command{cd} is implemented as a shell command rather than as another
+user process.  In fact, Unix-like systems don't provide any way for
+one process to change another process's current working directory.
+
+@item
+@b{When the spec states that directories should be able to grow beyond
+ten files, does this mean that there can still be a set maximum number
+of files per directory that is greater than ten, or should directories
+now support unlimited growth (bounded by the maximum supported file
+size)?}
+
+We're looking for directories that can support arbitrarily large
+numbers of files.  Now that directories can grow, we want you to
+remove the concept of a preset maximum file limit.
+
+@item
+@b{When should the @code{lsdir} system call return?}
+
+The @code{lsdir} system call should not return until after the
+directory has been printed.  Here's a code fragment, and the desired
+output:
+
+@example
+printf ("Start of directory\n");
+lsdir ();
+printf ("End of directory\n");
+@end example
+
+This code should create the following output:
+
+@example
+Start of directory
+...  directory contents ...
+End of directory
+@end example
+
+@item
+@b{Do we have to implement both absolute and relative pathnames?}
+
+Yes.  Implementing @file{.} and @file{..} is extra credit, though.
+
+@item
+@b{Should @code{remove()} also be able to remove directories?}
+
+Yes.  The @code{remove} system call should handle removal of both
+regular files and directories.  You may assume that directories can
+only be deleted if they are empty, as in Unix.
+@end enumerate
+
+@item
+Buffer Cache FAQs
+
+@enumerate 1
+@item
+@b{We're limited to a 64-block cache, but can we also keep a copy of
+each @code{struct filehdr} for an open file inside @code{struct file},
+the way the stub code does?}
+
+No, you shouldn't keep any disk sectors stored anywhere outside the
+cache.  That means you'll have to change the way the file
+implementation accesses its corresponding inode right now, since it
+currently just creates a new @code{struct filehdr} in its constructor
+and reads the corresponding sector in from disk when it's created.
+
+There are two reasons for not storing inodes in @code{struct file}.
+First, keeping extra copies of inodes would be cheating the 64-block
+limitation that we place on your cache.  Second, if two processes have
+the same file open, you will create a huge synchronization headache
+for yourself if each @code{struct file} has its own copy of the inode.
+
+Note that you can store pointers to inodes in @code{struct file} if
+you want, and you can store some other small amount of information to
+help you find the inode when you need it.
+
+Similarly, if you want to store one block of data plus some small
+amount of metadata for each of your 64 cache entries, that's fine.
+
+@item
+@b{But why can't we store copies of inodes in @code{struct file}? We
+don't understand the answer to the previous question.}
+
+The issue regarding storing @code{struct filehdr}s has to do with
+implementation of the buffer cache.  Basically, you can't store a
+@code{struct filehdr *} in @code{struct filehdr}.  Each time you need
+to read a @code{struct filehdr}, you'll have to get it either from the
+buffer cache or from disk.
+
+If you look at @code{file_read_at()}, it uses @code{hdr} directly
+without having first read in that sector from wherever it was in the
+storage hierarchy.  You are no longer allowed to do this.  You will
+need to change @code{file_read_at} (and similar functions) so that it
+reads @code{hdr} from the storage hierarchy before using it.
+@end enumerate
+@end enumerate
diff --git a/doc/projects.texi b/doc/projects.texi

new file mode 100644 (file)

index 0000000..bc5cd13
--- /dev/null
+++ b/doc/projects.texi
@@ -0,0 +1,14 @@
+@node Top, Project 1--Threads, (dir), (dir)
+@top Pintos Projects
+
+@menu
+* Project 1--Threads::          
+* Project 2--User Programs::    
+* Project 3--Virtual Memory::   
+* Project 4--File Systems::     
+@end menu
+
+@include threads.texi
+@include userprog.texi
+@include vm.texi
+@include filesys.texi
diff --git a/doc/threads.texi b/doc/threads.texi

new file mode 100644 (file)

index 0000000..c1fb41d
--- /dev/null
+++ b/doc/threads.texi
@@ -0,0 +1,574 @@
+@node Project 1--Threads, Project 2--User Programs, Top, Top
+@chapter Project 1: Threads
+
+In this assignment, we give you a minimally functional thread system.
+Your job is to extend the functionality of this system to gain a
+better understanding of synchronization problems. Additionally, you
+will use at least part of this increased functionality in future
+assignments.
+
+You will be working in the @file{threads} and @file{devices}
+directories for this assignment.  Compilation should be done in the
+@file{threads} directory.
+
+@menu
+* Understanding Threads::       
+* Debugging versus Testing::    
+* Tips::                        
+* Problem 1-1 Alarm Clock::     
+* Problem 1-2 Join::            
+* Problem 1-3 Priority Scheduling::  
+* Problem 1-4 Advanced Scheduler::  
+* Threads FAQ::                 
+@end menu
+
+@node Understanding Threads, Debugging versus Testing, Project 1--Threads, Project 1--Threads
+@section Understanding Threads
+
+The first step is to read and understand the initial thread system.
+Pintos, by default, implements thread creation and thread completion,
+a simple scheduler to switch between threads, and synchronization
+primitives (semaphores, locks, and condition variables). 
+@c FIXME: base system doesn't do anything. Debugger sucks.
+However, there's a lot of magic going on in some of this code, so you
+should compile and run the base system to see a simple test of the
+code.  You should trace the execution using your favorite debugger to
+get a sense for what's going on.
+
+When a thread is created, you are creating a new context to be
+scheduled. You provide a function to be run in this context as an
+argument to @code{thread_create()}. The first time the thread is
+scheduled and runs, it will start from the beginning of that function
+and execute it in the context. When that function returns, that thread
+completes. Each thread, therefore, acts like a mini-program running
+inside Pintos, with the function passed to @code{thread_create()}
+acting like @code{main()}.
+
+At any given time, Pintos is running exactly one thread, with the
+others switched out.  The scheduler decides which thread to run next
+when it needs to switch between them.  (If no thread is ready to run
+at any given time, then the special ``idle'' thread runs.)  The
+synchronization primitives are used to force context switches when one
+thread needs to wait for another thread to do something.
+
+The exact mechanics of a context switch are pretty gruesome and have
+been provided for you in @file{switch.S} (this is 80@var{x}86
+assembly; don't worry about understanding it). It involves saving the
+state of the currently running thread and restoring the state of the
+thread we're switching to.
+@c FIXME 
+Slowly trace through a context switch to see what happens.  Be sure to
+keep track of each thread's address and state, and what procedures are
+on the call stack for each thread. You will notice that when one
+thread calls @code{switch_threads()}, another thread starts running,
+and the first thing the new thread does is to return from
+@code{switch_threads()}. We realize this comment will seem cryptic to
+you at this point, but you will understand threads once you understand
+why the @code{switch_threads()} that gets called is different from the
+@code{switch_threads()} that returns.
+
+@strong{Warning}: In Pintos, each thread is assigned a small,
+fixed-size execution stack just under @w{4 kB} in size.  The kernel
+does try to detect stack overflow, but it cannot always succeed.  You
+ma cause bizarre problems, such as mysterious kernel panics, if you
+declare large data structures as non-static local variables,
+e.g. @samp{int buf[1000];}.  Alternatives to stack allocation include
+the page allocator in @file{threads/palloc.c} and the block allocator
+in @file{threads/malloc.c}.  Note that the page allocator doles out
+@w{4 kB} chunks and that @code{malloc()} has a @w{2 kB} block size
+limit.  If you need larger chunks, consider using a linked structure
+instead.
+
+@node Debugging versus Testing, Tips, Understanding Threads, Project 1--Threads
+@section Debugging versus Testing
+
+When you're debugging code, it's useful to be able to be able to run a
+program twice and have it do exactly the same thing.  On second and
+later runs, you can make new observations without having to discard or
+verify your old observations.  This property is called
+``reproducibility.''  The simulator we use, Bochs, can be set up for
+reproducibility.  If you use the Bochs configuration files we provide,
+which specify @samp{ips: @var{n}} where @var{n} is a number of
+simulated instructions per second, your simulations can be
+reproducible.
+
+Of course, a simulation can only be reproducible from one run to the
+next if its input is the same each time.  For simulating an entire
+computer, as we do, this means that every part of the computer must be
+the same.  For example, you must use the same disks, the same version
+of Bochs, and you must not hit any keys on the keyboard (because you
+could not be sure to hit them at exactly the same point each time)
+during the runs.
+
+While reproducibility is useful for debugging, it is a problem for
+testing thread synchronization, an important part of this project.  In
+particular, when Bochs is set up for reproducibility, timer interrupts
+will come at perfectly reproducible points, and therefore so will
+thread switches.  That means that running the same test several times
+doesn't give you any greater confidence in your code's correctness
+than does running it only once.
+
+So, to make your code easier to test, we've added a feature to Bochs
+that makes timer interrupts come at random intervals, but in a
+perfectly predictable way.  In particular, if you put a line
+@samp{ips-jitter: @var{seed}}, where @var{seed} is an integer, into
+your Bochs configuration file, then timer interrupts will come at
+irregularly spaced intervals.  Within a single @var{seed} value,
+execution will still be reproducible, but timer behavior will change
+as @var{seed} is varied.  Thus, for the highest degree of confidence
+you should test your code with many seed values.
+
+@node Tips, Problem 1-1 Alarm Clock, Debugging versus Testing, Project 1--Threads
+@section Tips
+
+There should be no busy-waiting in any of your solutions to this
+assignment.  Furthermore, resist the temptation to directly disable
+interrupts in your solution by calling @code{intr_disable()} or
+@code{intr_set_level()}, although you may find doing so to be useful
+while debugging.  Instead, use semaphores, locks and condition
+variables to solve synchronization problems.  Hint: read the comments
+in @file{threads/synch.h} if you're unsure what synchronization
+primitives may be used in what situations.
+
+Given some designs of some problems, there may be one or two instances
+in which it is appropriate to directly change the interrupt levels
+instead of relying on the given synchroniztion primitives.  This must
+be justified in your @file{DESIGNDOC} file.  If you're not sure you're
+justified, ask!
+
+While all parts of this assignment are required if you intend to earn
+full credit on this project, keep in mind that Problem 2 (Join) will
+be needed for future assignments, so you'll want to get this one
+right.  We don't give out solutions, so you're stuck with your Join
+code for the whole quarter.  Problem 1 (Alarm Clock) could be very
+handy, but not strictly required in the future.  The upshot of all
+this is that you should focus heavily on making sure that your
+implementation of Join works correctly, since if it's broken, you will
+need to fix it for future assignments.  The other parts can be turned
+off in the future if you find you can't make them work quite right.
+
+Also keep in mind that Problem 4 (the MFQS) builds on the features you
+implement in Problem 3, so to avoid unnecessary code duplication, it
+would be a good idea to divide up the work among your team members
+such that you have Problem 3 fully working before you begin to tackle
+Problem 4.
+
+@node Problem 1-1 Alarm Clock, Problem 1-2 Join, Tips, Project 1--Threads
+@section Problem 1-2: Alarm Clock
+
+Improve the implementation of the timer device defined in
+@file{devices/timer.c} by reimplementing @code{timer_msleep(0}.
+Threads call @code{timer_msleep(@var{x})} to suspend execution until
+time has advanced by at least @w{@var{x} milliseconds}.  This is
+useful for threads that operate in real-time, for example, for
+blinking the cursor once per second.  There is no requirement that
+threads start running immediately after waking up; just put them on
+the ready queue after they have waited for approximately the right
+amount of time.
+
+A working implementation of this function is provided.  However, the
+version provided is poor, because it ``busy waits,'' that is, it spins
+in a tight loop checking the current time until the current time has
+advanced far enough.  This is undesirable because it wastes time that
+could potentially be used more profitably by another thread.  Your
+solution should not busy wait.
+
+The argument to @code{timer_msleep()} is expressed in milliseconds.
+You must convert it into timer ticks, rounding up.  The code provided
+does this acceptably; there is no need to change it.
+
+@node Problem 1-2 Join, Problem 1-3 Priority Scheduling, Problem 1-1 Alarm Clock, Project 1--Threads
+@section Problem 1-2: Join
+
+Implement @code{thread_join(struct thread *)} in
+@file{threads/thread.c}.  There is already a prototype for it in
+@file{threads/thread.h}, which you should not change.  This function
+causes the currently running thread to block until thread passed as an
+argument exits.  If A is the running thread and B is the argument,
+then we say that ``A joins B'' in this case.
+
+The model for @code{thread_join()} is the @command{wait} system call
+in Unix-like systems.  (Try reading the manpages.)  That system call
+can only be used by a parent process to wait for a child's death.  You
+should implement @code{thread_join()} to have the same restriction.
+That is, a thread may only join on its immediate children.
+
+A thread need not ever be joined.  Your solution should properly free
+all of a thread's resources, including its @code{struct thread},
+whether it is ever joined or not, and regardless of whether the child
+exits before or after its parent.  That is, a thread should be freed
+exactly once in all cases.
+
+Joining a given thread is idempotent.  That is, joining a thread T
+multiple times is equivalent to joining it once, because T has already
+exited at the time of the later joins.  Thus, joins on T after the
+first should return immediately.
+
+The behavior of calling @code{thread_join()} on an thread that is not
+the caller's child is undefined.  You need not handle this case
+gracefully.
+
+Consider all the ways a join can occur: nested joins (A joins B when B
+is joined on C), multiple joins (A joins B, then A joins C), and so
+on.  Does your join work if @code{thread_join()} is called on a thread
+that has not yet been scheduled for the first time?  You should handle
+all of these cases.  Write test code that demonstrates the cases your
+join works for.  Don't overdo the output volume, please!
+
+Be careful to program this function correctly.  You will need its
+functionality for project 2.
+
+@node Problem 1-3 Priority Scheduling, Problem 1-4 Advanced Scheduler, Problem 1-2 Join, Project 1--Threads
+@section Problem 1-3 Priority Scheduling
+
+Implement priority scheduling in Pintos.  Priority
+scheduling is a key building block for real-time systems.  Implement functions
+@code{thread_set_priority()} to set the priority of a thread and
+@code{thread_get_priority()} to get the priority of a thread.  There
+are already prototypes for these functions in @file{threads/thread.h},
+which you should not change.
+
+When a thread is added to the ready list that has a higher priority
+than the currently running thread, the current thread should
+immediately yield the processor to the new thread.  Similarly, when
+threads are waiting for a lock, semaphore or condition variable, the
+highest priority waiting thread should be woken up first.  A thread's
+priority may be set at any time, including while the thread is waiting
+on a lock, semaphore, or condition variable.
+
+One issue with priority scheduling is ``priority inversion'': if a
+high priority thread needs to wait for a low priority thread (for
+instance, for a lock held by a low priority thread, or in
+@code{thread_join()} for a thread to complete), and a middle priority
+thread is on the ready list, then the high priority thread will never
+get the CPU because the low priority thread will not get any CPU time.
+A partial fix for this problem is to have the waiting thread
+``donate'' its priority to the low priority thread while it is holding
+the lock, then recall the donation once it has acquired the lock.
+Implement this fix.
+
+You will need to account for all different orders that priority
+donation and inversion can occur.  Be sure to handle multiple
+donations, in which multiple priorities are donated to a thread.  You
+must also handle nested donation: given high, medium, and low priority
+threads H, M, and L, respectively, and supposing H is waiting on a
+lock that M holds and M is waiting on a lock that L holds, both M and
+L should be boosted to H's priority.
+
+You only need to implement priority donation when a thread is waiting
+for a lock held by a lower-priority thread. You do not need to
+implement this fix for semaphores, condition variables or joins.
+However, you do need to implement priority scheduling in all cases.
+
+@node Problem 1-4 Advanced Scheduler, Threads FAQ, Problem 1-3 Priority Scheduling, Project 1--Threads
+@section Problem 1-4 Advanced Scheduler
+
+Implement Solaris's multilevel feedback queue scheduler (MFQS), as
+explained below, to reduce the average response time for running jobs
+on your system. 
+@c FIXME need link
+
+Demonstrate that your scheduling algorithm reduces response time
+relative to the original Pintos scheduling algorithm (round robin) for
+at least one workload of your own design (i.e. in addition to the
+provided test).
+
+You may assume a static priority for this problem. It is not necessary
+to ``re-donate'' a thread's priority if it changes (although you are
+free to do so).
+
+@node Threads FAQ,  , Problem 1-4 Advanced Scheduler, Project 1--Threads
+@section FAQ
+
+@enumerate 1
+@item General FAQs
+
+@enumerate 1
+@item
+@b{I am adding a new @file{.h} or @file{.c} file.  How do I fix the
+@file{Makefile}s?}
+
+To add a @file{.c} file, edit the top-level @file{Makefile.build}.
+You'll want to add your file to variable @samp{@var{dir}_SRC}, where
+@var{dir} is the directory where you added the file.  For this
+project, that means you should add it to @code{threads_SRC}, or
+possibly @code{devices_SRC} if you put in the @file{devices}
+directory.  Then run @code{make}.  If your new file doesn't get
+compiled, run @code{make clean} and then try again.
+
+There is no need to edit the @file{Makefile}s to add a @file{.h} file.
+
+@item
+@b{If a thread finishes, should its children be terminated immediately,
+or should they finish normally?}
+
+You should feel free to decide what semantics you think this
+should have. You need only provide justification for your
+decision.
+
+@item
+@b{Why can't I disable interrupts?}
+
+Turning off interrupts should only be done for short amounts of time,
+or else you end up losing important things such as disk or input
+events.  Turning off interrupts also increases the interrupt handling
+latency, which can make a machine feel sluggish if taken too far.
+Therefore, in general, setting the interrupt level should be used
+sparingly.  Also, any synchronization problem can be easily solved by
+turning interrupts off, since while interrupts are off, there is no
+concurrency, so there's no possibility for race condition.
+
+To make sure you understand concurrency well, we are discouraging you
+from taking this shortcut at all in your solution.  If you are unable
+to solve a particular synchronization problem with semaphores, locks,
+or conditions, or think that they are inadequate for a particular
+reason, you may turn to disabling interrupts.  If you want to do this,
+we require in your design document a complete justification and
+scenario (i.e.@: exact sequence of events) to show why interrupt
+manipulation is the best solution.  If you are unsure, the TAs can
+help you determine if you are using interrupts too haphazardly.  We
+want to emphasize that there are only limited cases where this is
+appropriate.
+
+@item
+@b{Where might interrupt-level manipuation be appropriate?}
+
+You might find it necessary in some solutions to the Alarm problem.
+
+You might want it at one small point for the priority scheduling
+problem.  Note that it is not required to use interrupts for these
+problems.  There are other, equally correct solutions that do not
+require interrupt manipulation.  However, if you do manipulate
+interrupts and @strong{correctly and fully document it} in your design
+document, we will allow limited use of interrupt disabling.
+@end enumerate
+
+@item Alarm Clock FAQs
+
+@enumerate 1
+@item
+@b{Why can't I use most synchronization primitives in an interrupt
+handler?}
+
+As you've discovered, you cannot sleep in an external interrupt
+handler.  Since many lock, semaphore, and condition variable functions
+attempt to sleep, you won't be able to call those in
+@code{timer_interrupt()}.  You may still use those that never sleep.
+
+Having said that, you need to make sure that global data does not get
+updated by multiple threads simultaneously executing
+@code{timer_msleep()}.  Here are some pieces of information to think
+about:
+
+@enumerate a
+@item
+Interrupts are turned off while @code{timer_interrupt()} runs.  This
+means that @code{timer_interrupt()} will not be interrupted by a
+thread running in @code{timer_msleep()}.
+
+@item
+A thread in @code{timer_msleep()}, however, can be interrupted by a
+call to @code{timer_interrupt()}, except when that thread has turned
+off interrupts.
+
+@item
+Examples of synchronization mechanisms have been presented in lecture.
+Going over these examples should help you understand when each type is
+useful or needed.
+@end enumerate
+
+@item
+@b{What about timer overflow due to the fact that times are defined as
+integers? Do I need to check for that?}
+
+Don't worry about the possibility of timer values overflowing.  Timer
+values are expressed as signed 63-bit numbers, which at 100 ticks per
+second should be good for almost 2,924,712,087 years.
+@end enumerate
+
+@item Join FAQs
+
+@enumerate 1
+@item
+@b{Am I correct to assume that once a thread is deleted, it is no
+longer accessible by the parent (i.e.@: the parent can't call
+@code{thread_join(child)})?}
+
+A parent joining a child that has completed should be handled
+gracefully and should act as a no-op.
+@end enumerate
+
+@item Priority Scheduling FAQs
+
+@enumerate 1
+@item
+@b{Doesn't the priority scheduling lead to starvation? Or do I have to
+implement some sort of aging?}
+
+
+It is true that strict priority scheduling can lead to starvation
+because thread may not run if a higher-priority thread is runnable.
+In this problem, don't worry about starvation or any sort of aging
+technique.  Problem 4 will introduce a mechanism for dynamically
+changing thread priorities.
+
+This sort of scheduling is valuable in real-time systems because it
+offers the programmer more control over which jobs get processing
+time.  High priorities are generally reserved for time-critical
+tasks. It's not ``fair,'' but it addresses other concerns not
+applicable to a general-purpose operating system.
+
+@item
+@b{After a lock has been released, does the program need to switch to
+the highest priority thread that needs the lock (assuming that its
+priority is higher than that of the current thread)?}
+
+When a lock is released, the highest priority thread waiting for that
+lock should be unblocked and put on the ready to run list.  The
+scheduler should then run the highest priority thread on the ready
+list.
+
+@item
+@b{If a thread calls @code{thread_yield()} and then it turns out that
+it has higher priority than any other threads, does the high-priority
+thread continue running?}
+
+Yes.  If there is a single highest-priority thread, it continues
+running until it blocks or finishes, even if it calls
+@code{thread_yield()}.
+
+@item
+@b{If the highest priority thread is added to the ready to run list it
+should start execution immediately.  Is it immediate enough if I
+wait until next timer interrupt occurs?}
+
+The highest priority thread should run as soon as it is runnable,
+preempting whatever thread is currently running.
+
+@item
+@b{What happens to the priority of the donating thread?  Do the priorities
+get swapped?}
+
+No.  Priority donation only changes the priority of the low-priority
+thread.  The donating thread's priority stays unchanged.  Also note
+that priorities aren't additive: if thread A (with priority 5) donates
+to thread B (with priority 3), then B's new priority is 5, not 8.
+
+@item 
+@b{Can a thread's priority be changed while it is sitting on the ready
+queue?}
+
+Yes.  Consider this case: low-priority thread L currently has a lock
+that high-priority thread H wants.  H donates its priority to L (the
+lock holder).  L finishes with the lock, and then loses the CPU and is
+moved to the ready queue.  Now L's old priority is restored while it
+is in the ready queue.
+
+@item
+@b{Can a thread's priority change while it is sitting on the queue of a
+semaphore?}
+
+Yes.  Same scenario as above except L gets blocked waiting on a new
+lock when H restores its priority.
+
+@item
+@b{Why is pubtest3's FIFO test skipping some threads! I know my scheduler
+is round-robin'ing them like it's supposed to!  Our output is like this:}
+
+@example
+Thread 0 goes.
+Thread 2 goes.
+Thread 3 goes.
+Thread 4 goes.
+Thread 0 goes.
+Thread 1 goes.
+Thread 2 goes.
+Thread 3 goes.
+Thread 4 goes.
+@end example
+
+@noindent @b{which repeats 5 times and then}
+
+@example
+Thread 1 goes.
+Thread 1 goes.
+Thread 1 goes.
+Thread 1 goes.
+Thread 1 goes.
+@end example
+
+This happens because context switches are being invoked by the test
+when it explicitly calls @code{thread_yield()}.  However, the time
+slice timer is still alive and so, every tick (by default), thread 1
+gets switched out (caused by @code{timer_interrupt()} calling
+@code{intr_yield_on_return()}) before it gets a chance to run its
+mainline.  It is by coincidence that Thread 1 is the one that gets
+skipped in our example.  If we use a different jitter value, the same
+behavior is seen where a thread gets started and switched out
+completely.
+
+Solution: Increase the value of @code{TIME_SLICE} in
+@file{devices/timer.c} to a very high value, such as 10000, to see
+that the threads will round-robin if they aren't interrupted.
+
+@item
+@b{What happens when a thread is added to the ready list which has
+higher priority than the currently running thread?}
+
+The correct behavior is to immediately yield the processor.  Your
+solution must act this way.
+
+@item
+@b{What range of priorities should be supported and what should the
+default priority of a thread be?}
+
+Your implementation should support priorities from 0 through 59 and
+the default priority of a thread should be 29.
+@end enumerate
+
+@item Advanced Scheduler FAQs
+
+@enumerate 1
+@item
+@b{What is the interval between timer interrupts?}
+
+Timer interrupts occur @code{TIMER_FREQ} times per second.  You can
+adjust this value by editing @file{devices/timer.h}.  The default is
+100 Hz.
+
+@item
+@b{Do I have to modify the dispatch table?}
+
+No, although you are allowed to. It is possible to complete
+this problem (i.e.@: demonstrate response time improvement)
+without doing so.
+
+@item
+@b{When the scheduler changes the priority of a thread, how does this
+affect priority donation?}
+
+Short (official) answer: Don't worry about it. Your priority donation
+code may assume static priority assignment.
+
+Longer (unofficial) opinion: If you wish to take this into account,
+however, your design may end up being ``cleaner.''  You have
+considerable freedom in what actually takes place. I believe what
+makes the most sense is for scheduler changes to affect the
+``original'' (non-donated) priority.  This change may actually be
+masked by the donated priority.  Priority changes should only
+propagate with donations, not ``backwards'' from donees to donors.
+
+@item
+@b{What is meant by ``static priority''?}
+
+Once thread A has donated its priority to thread B, if thread A's
+priority changes (due to the scheduler) while the donation still
+exists, you do not have to change thread B's donated priority.
+However, you are free to do so.
+
+@item
+@b{Do I have to make my dispatch table user-configurable?}
+
+No.  Hard-coding the dispatch table values is fine.
+@end enumerate
+@end enumerate
diff --git a/doc/userprog.texi b/doc/userprog.texi

new file mode 100644 (file)

index 0000000..2b7edb4
--- /dev/null
+++ b/doc/userprog.texi
@@ -0,0 +1,689 @@
+@node Project 2--User Programs, Project 3--Virtual Memory, Project 1--Threads, Top
+@chapter Project 2: User Programs
+
+Now that you've worked with Pintos and are familiar with its
+infrastructure and thread package, it's time to start working on the
+parts of the system that will allow users to run programs on top of
+your operating system.  The base code already supports loading and
+running a single user program at a time with little interactivity
+possible.  You will allow multiple programs to be loaded in at once,
+and to interact with the OS via system calls.
+
+You will be working out of the @file{userprog} directory for this
+assignment.  However, you will also be interacting with almost every
+other part of the code for this assignment. We will describe the
+relevant parts below. If you are confident in your HW1 code, you can
+build on top of it.  However, if you wish you can start with a fresh
+copy of the code and re-implement @code{thread_join()}, which is the
+only part of project #1 required for this assignment.
+
+Up to now, all of the code you have written for Pintos has been part
+of the operating system kernel.  This means, for example, that all the
+test code from the last assignment ran as part of the kernel, with
+full access to privileged parts of the system.  Once we start running
+user programs on top of the operating system, this is no longer true.
+This project deals with consequences of the change.
+
+We allow more than one user program to run at a time.  Because user
+programs are written and compiled to work under the illusion that they
+have the entire machine, when you load into memory and run more than
+one process at a time, you must manage things correctly to maintain
+this illusion.
+
+Before we delve into the details of the new code that you'll be
+working with, you should probably undo the test cases from project 1.
+All you need to do is make sure the original
+@file{threads/pintostest.c} is in place.  This will stop the tests
+from being run.
+
+@menu
+* Project 2 Code to Hack::      
+* How User Programs Work::      
+* Global Requirements::         
+* Problem 2-1 Argument Passing::  
+* Problem 2-2 System Calls::    
+* User Programs FAQ::           
+* 80x86 Calling Convention::    
+* System Calls::                
+@end menu
+
+@node Project 2 Code to Hack, How User Programs Work, Project 2--User Programs, Project 2--User Programs
+@section Code to Hack
+
+The easiest way to get an overview of the programming you will be
+doing is to simply go over each part you'll be working with.  In
+@file{userprog}, you'll find a small number of files, but here is
+where the bulk of your work will be:
+
+@table @file
+@item addrspace.c
+@itemx addrspace.h
+An address space keeps track of all the data necessary to execute a
+user program.  Address space data is stored in @code{struct thread},
+but manipulated only by @file{addrspace.c}.  Address spaces need to
+keep track of things like paging information for the process (so that
+it knows which memory the process is using).  Address spaces also
+handle loading the program into memory and starting up the process's
+execution.
+
+@item syscall.c
+@itemx syscall.h
+Whenever a user process wants to access some kernel functionality, it
+needs to do so via a system call.  This is a skeleton system call
+handler.  Currently, it just prints a message and terminates the user
+process.  In part 2 of this project you will add code to do everything
+else needed by system calls.
+
+@item exception.c
+@itemx exception.h
+When a user process performs a privileged or prohibited operation, it
+traps into the kernel as an ``exception'' or ``fault.''@footnote{We
+will treat these terms as synonymous.  There is no standard
+distinction between them, although the Intel processor manuals define
+them slightly differently on 80@var{x}86.}  These files handle
+exceptions.  Currently all exceptions simply print a message and
+terminate the process.  @strong{You should not need to modify this
+file for project 2.}
+
+@item gdt.c
+@itemx gdt.c
+The 80@var{x}86 is a segmented architecture.  The Global Descriptor
+Table (GDT) is a table that describes the segments in use.  These
+files set up the GDT.  @strong{You should not need to modify these
+files for any of the projects.}  However, you can read the code if
+you're interested in how the GDT works.
+
+@item tss.c
+@itemx tss.c
+The Task-State Segment (TSS) is used for 80@var{x}86 architectural
+task switching.  Pintos uses the TSS only for switching stacks when a
+user process enters an interrupt handler, as does Linux.  @strong{You
+should not need to modify these files for any of the projects.}
+However, you can read the code if you're interested in how the GDT
+works.
+@end table
+
+Elsewhere in the kernel, you will need to use some file system code.
+You will not actually write a file system until the end of the
+quarter, but since user programs need files to do anything
+interesting, we have provided a simple file system in the
+@file{filesys} directory.  You will want to look over the
+@file{filesys.h} and @file{file.h} interfaces to understand how to use
+the file system.  However, @strong{you should not modify the file
+system code for this project}.  Proper use of the file system routines
+now will make life much easier for project 4, when you improve the
+file system implementation.
+
+Finally, in @file{lib/kernel}, you might want to use
+@file{bitmap.[ch]}.  A bitmap is basically an array of bits, each of
+which can be true or false.  Bitmaps are typically used to keep track
+of the usage of a large array of (identical) resources: if resource
+@var{n} is in use, then bit @var{n} of the bitmap is true.  You might
+find it useful for tracking memory pages, for example.
+
+@node How User Programs Work, Global Requirements, Project 2 Code to Hack, Project 2--User Programs
+@section How User Programs Work
+
+Pintos can run normal C programs.  In fact, it can run any program you
+want, provided it's compiled into the proper file format, and uses
+only the system calls you implement.  (For example, @code{malloc()}
+makes use of functionality that isn't provided by any of the syscalls
+we require you to support.)  The only other limitation is that Pintos
+can't run programs using floating point operations, since it doesn't
+include the necessary kernel functionality to save and restore the
+processor's floating-point unit when switching threads.  You can look
+in @file{test} directory for some examples.
+
+Pintos loads ELF executables, where ELF is an executable format used
+by Linux, Solaris, and many other Unix and Unix-like systems.
+Therefore, you can use any compiler and linker that produce
+80@var{x}86 ELF executables to produce programs for Pintos.  We
+recommend using the tools we provide in the @file{test} directory.  By
+default, the @file{Makefile} in this directory will compile the test
+programs we provide.  You can edit the @file{Makefile} to compile your
+own test programs as well.
+
+@node Global Requirements, Problem 2-1 Argument Passing, How User Programs Work, Project 2--User Programs
+@section Global Requirements
+
+For testing and grading purposes, we have some simple requirements for
+your output.  The kernel should print out the program's name and exit
+status whenever a process exits.  Aside from this, it should print out
+no other messages.  You may understand all those debug messages, but
+we won't, and it just clutters our ability to see the stuff we care
+about.  Additionally, while it may be useful to hard-code which
+process will run at startup while debugging, before you submit your
+code you must make sure that it takes the start-up process name and
+arguments from the @samp{-ex} argument.  The infrastructure for this
+is already there---you just need to make sure you enable it!  For
+example, running @code{pintos -ex "testprogram 1 2 3 4"} will spawn
+@samp{testprogram 1 2 3 4} as the first process.
+
+@node Problem 2-1 Argument Passing, Problem 2-2 System Calls, Global Requirements, Project 2--User Programs
+@section Problem 2-1: Argument Passing
+
+Currently, @code{thread_execute()} does not support passing arguments
+to new processes.  UNIX and other operating systems do allow passing
+command line arguments to a program, which accesses them via the argc,
+argv arguments to main.  You must implement this functionality by
+extending @code{thread_execute()} so that instead of simply taking a
+program file name, it can take a program name with arguments as a
+single string.  That is, @code{thread_execute("grep foo *.c")} should
+be a legal call.  @xref{80x86 Calling Convention}, for information on
+exactly how this works.
+
+@strong{This functionality is extremely important.}  Almost all our
+test cases rely on being able to pass arguments, so if you don't get
+this right, a lot of things will not appear to work correctly with our
+tests.  If the tests fail, so do you.  Fortunately, this part
+shouldn't be too hard.
+
+@node Problem 2-2 System Calls, User Programs FAQ, Problem 2-1 Argument Passing, Project 2--User Programs
+@section Problem 2-2: System Calls
+
+Implement the system call handler in @file{userprog/syscall.c} to
+properly deal with all the system calls described below.  Currently,
+it ``handles'' system calls by terminating the process.  You will need
+to decipher system call arguments and take the appropriate action for
+each.
+
+In addition, implement system calls and system call handling.  You are
+required to support the following system calls, whose syscall numbers
+are defined in @file{lib/syscall-nr.h} and whose C functions called by
+user programs are prototyped in @file{lib/user/syscall.h}:
+
+@table @code
+@item SYS_halt
+@itemx void halt (void)
+Stops Pintos and prints out performance statistics.  Note that this
+should be seldom used, since then you lose some information about
+possible deadlock situations, etc.
+
+@item SYS_exit
+@itemx void exit (int @var{status})
+Terminates the current user program, returning @var{status} to the
+kernel.  A @var{status} of 0 indicates a successful exit.  Other
+values may be used to indicate user-defined error conditions.
+
+@item SYS_exec
+@itemx pid_t exec (const char *@var{file})
+Run the executable in @var{file} and return the new process's program
+id (pid).  If there is an error loading this program, returns pid -1,
+which otherwise should not be a valid id number.
+
+@item SYS_join
+@itemx int join (pid_t @var{pid})
+Joins the process @var{pid}, using the join rules from the last
+assignment, and returns the process's exit status.  If the process was
+terminated by the kernel (i.e.@: killed due to an exception), the exit
+status should be -1.  If the process was not a child process, the
+return value is undefined (but kernel operation must not be
+disrupted).
+
+@item SYS_create
+@itemx bool create (const char *@var{file})
+Create a new file called @var{file}.  Returns -1 if failed, 0 if OK.
+
+@item SYS_remove
+@itemx bool remove (const char *@var{file})
+Delete the file called @var{file}.  Returns -1 if failed, 0 if OK.
+
+@item SYS_open
+@itemx int open (const char *@var{file})
+Open the file called @var{file}.  Returns a nonnegative integer handle
+called a ``file descriptor'' (fd), or -1 if the file could not be
+opened.  File descriptors numbered 0 and 1 are reserved for the
+console.  All open files associated with a process should be closed
+when the process exits or is terminated.
+
+@item SYS_filesize
+@itemx int filesize (int @var{fd})
+Returns the size, in bytes, of the file open as @var{fd}, or -1 if the
+file is invalid.
+
+@item SYS_read
+@itemx int read (int @var{fd}, void *@var{buffer}, unsigned @var{size})
+Read @var{size} bytes from the file open as @var{fd} into
+@var{buffer}.  Returns the number of bytes actually read, or -1 if the
+file could not be read.
+
+@item SYS_write
+@itemx int write (int @var{fd}, const void *@var{buffer}, unsigned @var{size})
+Write @var{size} bytes from @var{buffer} to the open file @var{fd}.
+Returns the number of bytes actually written, or -1 if the file could
+not be written.
+
+@item SYS_close
+@itemx void close (int @var{fd})
+Close file descriptor @var{fd}.
+@end table
+
+The file defines other syscalls.  Ignore them for now.  You will
+implement some of them in project 3 and the rest in project 4, so be
+sure to design your system with extensibility in mind.
+
+To implement syscalls, you will need to provide a way of copying data
+from the user's virtual address space into the kernel and vice versa.
+This can be a bit tricky: what if the user provides an invalid
+pointer, a pointer into kernel memory, or points to a block that is
+partially in one of those regions?  You should handle these cases by
+terminating the user process.  You will need this code before you can
+even obtain the system call number, because the system call number is
+on the user's stack in the user's virtual address space.  We recommend
+writing and testing this code before implementing any other system
+call functionality.
+
+You must make sure that system calls are properly synchronized so that
+any number of user processes can make them at once.  In particular, it
+is not safe to call into the filesystem code provided in the
+@file{filesys} directory from multiple threads at once.  For now, we
+recommend adding a single lock that controls access to the filesystem
+code.  You should acquire this lock before calling any functions in
+the @file{filesys} directory, and release it afterward.  Because it
+calls into @file{filesys} functions, you will have to modify
+@file{addrspace_load()} in the same way.  @strong{For now, we
+recommend against modifying code in the @file{filesys} directory.}
+
+We have provided you a function for each system call in
+@file{lib/user/syscall.c}.  These provide a way for user processes to
+invoke each system call from a C program.  Each of them calls an
+assembly language routine in @file{lib/user/syscall-stub.S}, which in
+turn invokes the system call interrupt and returns.
+
+When you're done with this part, and forevermore, Pintos should be
+bulletproof.  Nothing that a user program can do should ever cause the
+OS to crash, halt, assert fail, or otherwise stop running.  The sole
+exception is a call to the @code{halt} system call.
+
+@xref{System Calls}, for more information on how syscalls work.
+
+@node User Programs FAQ, 80x86 Calling Convention, Problem 2-2 System Calls, Project 2--User Programs
+@section FAQ
+
+@enumerate 1
+@item General FAQs
+
+@enumerate 1
+@item
+@b{Do we need a working project 1 to implement project 2?}
+
+You may find the code for @code{thread_join()} to be useful in
+implementing the join syscall, but besides that, you can use
+the original code provided for project 1.
+
+@item
+@b{Is there a way I can disassemble user programs?}
+
+@c FIXME
+The @command{objdump} utility can disassemble entire user programs or
+object files.  Invoke it as @code{objdump -d @var{file}}.  You can
+also use @code{gdb}'s @command{disassemble} command to disassemble
+individual functions in object files compiled with debug information.
+
+@item
+@b{Why can't I use many C include files in my Pintos programs?}
+
+The C library we provide is very limited.  It does not include many of
+the features that are expected of a real operating system's C library.
+The C library must be built specifically for the operating system (and
+architecture), since it must make system calls for I/O and memory
+allocation.  (Not all functions do, of course, but usually the library
+is compiled as a unit.)  If you wish to port libraries to Pintos, feel
+free.
+
+@item
+@b{How do I compile new user programs? How do I make 'echo' compile?}
+
+You need to modify @file{tests/Makefile}.
+
+@item
+@b{Help, Solaris only allows 128 open files at once!}
+
+Solaris limits the number of file descriptors a process may keep open
+at any given time.  The default limit is 128 open file descriptors.
+
+To see the current limit for all new processes type @samp{limit} at
+the shell prompt and look at the line titled ``descriptors''. To
+increase this limit to the maximum allowed type @code{ulimit
+descriptors} in a @command{csh} derived shell or @code{unlimit
+descriptors} in a @command{sh} derived shell.  This will increase the
+number of open file descriptors your Pintos process can use, but it
+will still be limited.
+
+Refer to the @command{limit(1)} man page for more information.
+
+@item
+@b{I can't seem to figure out how to read from and write to
+memory. What should I do?}
+
+Here are some pointers:
+
+FIXME
+
+@item
+@b{I'm also confused about reading from and writing to the stack. Can
+you help?}
+
+FIXME: relevant?
+
+@itemize @bullet
+@item
+Only non-@samp{char} values will have issues when writing them to
+memory.  If a digit is in a string, it is considered a character.
+However, the value of @code{argc} would be a non-char.
+
+@item
+You will need to write characters and non-characters into main memory.
+
+@item
+When you add items to the stack, you will be decrementing the stack
+pointer.  You'll need to decrement the stack pointer before writing to
+the location.
+
+@item
+Each character is 1 byte.
+@end itemize
+@end enumerate
+
+@item Argument Passing FAQs
+
+@enumerate 1
+@item
+@b{What will be the format of command line arguments?}
+
+You should assume that command line arguments are delimited by white
+space.
+
+@item
+@b{How do I parse all these argument strings?}
+
+We recommend you look at @code{strtok_r()}, prototyped in
+@file{lib/string.h} and implemented with thorough comments in
+@file{lib/string.c}.  You can find more about it by looking at the man
+page (run @code{man strtok_r} at the prompt).
+
+@item
+@b{Why is the top of the stack at @t{0xc0000000}?  Isn't that off the
+top of user virtual memory?  Shouldn't it be @t{0xbfffffff}?}
+
+When the processor pushes data on the stack, it decrements the stack
+pointer first.  Thus, the first (4-byte) value pushed on the stack
+will be at address @t{0xbffffffc}.
+
+Also, the stack should always be aligned to a 4-byte boundary, but
+@t{0xbfffffff} isn't.
+@end enumerate
+
+@item System Calls FAQs
+
+@enumerate 1
+@item
+@b{What should I do with the parameter passed to @code{exit()}?}
+
+This value, the exit status of the process, must be returned to the
+thread's parent when @code{join()} is called.
+
+@item
+@b{Can I just cast a pointer to a @code{struct file} object to get a
+unique file descriptor?  Can I just cast a @code{struct thread *} to a
+@code{pid_t}?  It's so much simpler that way!}
+
+This is a design decision you will have to make for yourself.
+However, note that most operating systems do distinguish between file
+descriptors (or pids) and the addresses of their kernel data
+structures.  You might want to give some thought as to why they do so
+before committing yourself.
+
+@item
+@b{Can I set a maximum number of open files per process?}
+
+From a design standpoint, it would be better not to set an arbitrary
+maximum.  That said, if your design calls for it, you may impose a
+limit of 128 open files per process (as the Solaris machines here do).
+
+@item
+@b{What happens when two (or more) processes have a file open and one of
+them removes it?}
+
+FIXME FIXME FIXME
+
+You should copy the standard Unix semantics for files.  That is, when
+a file is removed an process which has a file descriptor for that file
+may continue to do operations on that descriptor.  This means that
+they can read and write from the file.  The file will not have a name,
+and no other processes will be able to open it, but it will continue
+to exist until all file descriptors referring to the file are closed
+or the machine shuts down.
+
+@item
+@b{What happens if a system call is passed an invalid argument, such
+as Open being called with an invalid filename?}
+
+Pintos should not crash.  You should have your system calls check for
+invalid arguments and return error codes.
+
+@item
+@b{I've discovered that some of my user programs need more than one 4
+kB page of stack space.  What should I do?}
+
+You may modify the stack setup code to allocate more than one page of
+stack space for each process.
+
+@item
+@b{What do I need to print on thread completion?}
+
+You should print the complete thread name (as specified in the
+@code{SYS_exec} call) followed by the exit status code,
+e.g.@: @samp{example 1 2 3 4: 0}.
+@end enumerate
+@end enumerate
+
+@node 80x86 Calling Convention, System Calls, User Programs FAQ, Project 2--User Programs
+@appendixsec 80@var{x}86 Calling Convention
+
+What follows is a quick and dirty discussion of the 80@var{x}86
+calling convention.  Some of the basics should be familiar from CS
+107, and if you've already taken CS 143 or EE 182, then you should
+have seen even more of it.  I've omitted some of the complexity, since
+this isn't a class in how function calls work, so don't expect this to
+be exactly correct in full, gory detail.  If you do want all the
+details, you can refer to @cite{[SysV-i386]}.
+
+Whenever a function call happens, you need to put the arguments on the
+call stack for that function, before the code for that function
+executes, so that the callee has access to those values.  The caller
+has to be responsible for this (be sure you understand why).
+Therefore, when you compile a program, the assembly code emitted will
+have in it, before every function call, a bunch of instructions that
+prepares for the call in whatever manner is conventional for the
+machine you're working on.  This includes saving registers as needed,
+putting stuff on the stack, saving the location to return to somewhere
+(so that when the callee finishes, it knows where the caller code is),
+and some other bookkeeping stuff.  Then you do the jump to the
+callee's code, and it goes along, assuming that the stack and
+registers are prepared in the appropriate manner.  When the callee is
+done, it looks at the return location as saved earlier, and jumps back
+to that location.  The caller may then have to do some cleanup:
+clearing arguments and the return value off the stack, restoring
+registers that were saved before the call, and so on.
+
+If you think about it, some of these things should remind you of
+context switching.
+
+As an aside, in general, function calls are not cheap.  You have to do
+a bunch of memory writes to prepare the stack, you need to save and
+restore registers before and after a function call, you need to write
+the stack pointer, you have a couple of jumps which probably wrecks
+some of your caches.  This is why inlining code can be much faster.
+
+@node Argument Passing to main
+@subsection Argument Passing to @code{main()}
+
+In @code{main()}'s case, there is no caller to prepare the stack
+before it runs.  Therefore, the kernel needs to do it.  Fortunately,
+since there's no caller, there are no registers to save, no return
+address to deal with, etc.  The only difficult detail to take care of,
+after loading the code, is putting the arguments to @code{main()} on
+the stack.
+
+(The above is a small lie: most compilers will emit code where main
+isn't strictly speaking the first function.  This isn't an important
+detail.  If you want to look into it more, try disassembling a program
+and looking around a bit.  However, you can just act as if
+@code{main()} is the very first function called.)
+
+Pintos is written for the 80@var{x}86 architecture.  Therefore, we
+need to adhere to the 80@var{x}86 calling convention, which is
+detailed in the FAQ.  Basically, you put all the arguments on the
+stack and move the stack pointer appropriately.  The program will
+assume that this has been done when it begins running.
+
+So, what are the arguments to @code{main()}? Just two: an @samp{int}
+(@code{argc}) and a @samp{char **} (@code{argv}).  @code{argv} is an
+array of strings, and @code{argc} is the number of strings in that
+array.  However, the hard part isn't these two things.  The hard part is
+getting all the individual strings in the right place.  As we go
+through the procedure, let us consider the following example command:
+@samp{/bin/ls -l *.h *.c}.
+
+The first thing to do is to break the command line into individual
+strings: @samp{/bin/ls}, @samp{-l}, @samp{*.h}, and @samp{*.c}.  These
+constitute the arguments of the command, including the program name
+itself (which belongs in @code{argv[0]}).
+
+These individual, null-terminated strings should be placed on the user
+stack.  They may be placed in any order, as you'll see shortly,
+without affecting how main works, but for simplicity let's assume they
+are in reverse order (keeping in mind that the stack grows downward on
+an 80@var{x}86 machine).  As we copy the strings onto the stack, we
+record their (virtual) stack addresses.  These addresses will become
+important when we write the argument vector (two paragraphs down).
+
+After we push all of the strings onto the stack, we adjust the stack
+pointer so that it is word-aligned: that is, we move it down to the
+next 4-byte boundary.  This is required because we will next be
+placing several words of data on the stack, and they must be aligned
+in order to be read correctly.  In our example, as you'll see below,
+the strings start at address @t{0xffed}.  One word below that would be
+at @t{0xffe9}, so we could in theory put the next word on the stack
+there.  However, since the stack pointer should always be
+word-aligned, we instead leave the stack pointer at @t{0xffe8}.
+
+Once we align the stack pointer, we then push the elements of the
+argument vector (that is, the addresses of the strings @samp{/bin/ls},
+@samp{-l}, @samp{*.h}, and @samp{*.c}) onto the stack.  This must be
+done in reverse order, such that @code{argv[0]} is at the lowest
+virtual address (again, because the stack is growing downward).  This
+is because we are now writing the actual array of strings; if we write
+them in the wrong order, then the strings will be in the wrong order
+in the array.  This is also why, strictly speaking, it doesn't matter
+what order the strings themselves are placed on the stack: as long as
+the pointers are in the right order, the strings themselves can really
+be anywhere.  After we finish, we note the stack address of the first
+element of the argument vector, which is @code{argv} itself.
+
+Finally, we push @code{argv} (that is, the address of the first
+element of the @code{argv} array) onto the stack, along with the
+length of the argument vector (@code{argc}, 4 in this example).  This
+must also be done in this order, since @code{argc} is the first
+argument to main and therefore is on first (smaller address) on the
+stack.  We leave the stack pointer to point to the location where
+@code{argc} is, because it is at the top of the stack, the location
+directly below @code{argc}.
+
+All of which may sound very confusing, so here's a picture which will
+hopefully clarify what's going on. This represents the state of the
+stack and the relevant registers right before the beginning of the
+user program (assuming for this example a 16-bit virtual address space
+with addresses from @t{0x0000} to @t{0xffff}):
+
+@html
+<CENTER>
+@end html
+@multitable {@t{0xffff}} {word-align} {@t{/bin/ls\0}}
+@item Address @tab Name @tab Data
+@item @t{0xfffc} @tab @code{*argv[3]} @tab @samp{*.c\0}
+@item @t{0xfff8} @tab @code{*argv[2]} @tab @samp{*.h\0}
+@item @t{0xfff5} @tab @code{*argv[1]} @tab @samp{-l\0}
+@item @t{0xffed} @tab @code{*argv[0]} @tab @samp{/bin/ls\0}
+@item @t{0xffec} @tab word-align @tab @samp{\0}
+@item @t{0xffe8} @tab @code{argv[3]} @tab @t{0xfffc}
+@item @t{0xffe4} @tab @code{argv[2]} @tab @t{0xfff8}
+@item @t{0xffe0} @tab @code{argv[1]} @tab @t{0xfff5}
+@item @t{0xffdc} @tab @code{argv[0]} @tab @t{0xffed}
+@item @t{0xffd8} @tab @code{argv} @tab @t{0xffdc}
+@item @t{0xffd4} @tab @code{argc} @tab 4
+@end multitable
+@html
+</CENTER>
+@end html
+
+In this example, the stack pointer would be initialized to @t{0xffd4}.
+
+Your code should start the stack at the very top of the user virtual
+address space, in the page just below virtual address @code{PHYS_BASE}
+(defined in @file{threads/mmu.h}).
+
+@node System Calls,  , 80x86 Calling Convention, Project 2--User Programs
+@appendixsec System Calls
+
+We have already been dealing with one way that the operating system
+can regain control from a user program: interrupts from timers and I/O
+devices.  These are ``external'' interrupts, because they are caused
+by entities outside the CPU.
+
+The operating system is also called to deal with software exceptions,
+which are events generated in response to the code.  These can be
+errors such as a page fault or division by zero.  However, exceptions
+are also the means by which a user program can request services
+(``system calls'') from the operating system.
+
+Some exceptions are ``restartable'': the condition that caused the
+exception can be fixed and the instruction retried. For example, page
+faults call the operating system, but the user code should re-start on
+the load or store that caused the exception (not the next one) so that
+the memory access actually occurs.  On the 80@var{x}86, restartable
+exceptions are called ``faults,'' whereas most non-restartable
+exceptions are classed as ``traps.''  Other architectures may define
+these terms differently.
+
+In the 80@var{x}86 architecture, the @samp{int} instruction is the
+most commonly used means for invoking system calls.  This instruction
+is handled in the same way that other software exceptions.  In Pintos,
+user program invoke @samp{int $0x30} to make a system call.  The
+system call number and any additional arguments are expected to be
+pushed on the stack in the normal fashion before invoking the
+interrupt.
+
+The normal calling convention pushes function arguments on the stack
+from right to left and the stack grows downward.  Thus, when the
+system call handler @code{syscall_handler()} gets control, the system
+call number is in the 32-bit word at the caller's stack pointer, the
+first argument is in the 32-bit word at the next higher address, and
+so on.  The caller's stack pointer is accessible to
+@code{syscall_handler()} as the @samp{esp} member of the @code{struct
+intr_frame} passed to it.
+
+Here's an example stack frame for calling a system call numbered 10
+with three arguments passed as 1, 2, and 3.  The stack addresses are
+arbitrary:
+
+@html
+<CENTER>
+@end html
+@multitable {Address} {Value}
+@item Address @tab Value
+@item @t{0xfe7c} @tab 3
+@item @t{0xfe78} @tab 2
+@item @t{0xfe74} @tab 1
+@item @t{0xfe70} @tab 10
+@end multitable
+@html
+</CENTER>
+@end html
+
+In this example, the caller's stack pointer would be at @t{0xfe70}.
+
+The 80@var{x}86 convention for function return values is to place them
+in the @samp{EAX} register.  System calls that return a value can do
+so by modifying the @samp{eax} member of @code{struct intr_frame}.
diff --git a/doc/vm.texi b/doc/vm.texi

new file mode 100644 (file)

index 0000000..812915e
--- /dev/null
+++ b/doc/vm.texi
@@ -0,0 +1,657 @@
+@node Project 3--Virtual Memory, Project 4--File Systems, Project 2--User Programs, Top
+@chapter Project 3: Virtual Memory
+
+By now you should be familiar with the inner workings of Pintos.
+You've already come a long way: your OS can properly handle multiple
+threads of execution with proper synchronization, and can load
+multiple user programs at once.  However, when loading user programs,
+your OS is limited by how much main memory the simulated machine has.
+In this assignment, you will remove that limitation.
+
+You will be using the @file{vm} directory for this project.  There is
+no new code to get acquainted with for this assignment.  The @file{vm}
+directory contains only the @file{Makefile}s.  The only change from
+@file{userprog} is that this new @file{Makefile} turns on the setting
+@option{-DVM}, which you will need for this assignment.  All code you
+write will either be newly generated files (e.g.@: if you choose to
+implement your paging code in their own source files), or will be
+modifications to pre-existing code (e.g.@: you will change the
+behavior of @file{addrspace.c} significantly).
+
+You will be building this assignment on the last one.  It will benefit
+you to get your project 2 in good working order before this assignment
+so those bugs don't keep haunting you.
+
+@menu
+* VM Design::                   
+* Page Faults::                 
+* Disk as Backing Store::       
+* Memory Mapped Files::         
+* Stack::                       
+* Problem 3-1 Page Table Management::  
+* Problem 3-2 Paging To and From Disk::  
+* Problem 3-3 Memory Mapped Files::  
+* Virtual Memory FAQ::          
+@end menu
+
+@node VM Design, Page Faults, Project 3--Virtual Memory, Project 3--Virtual Memory
+@section A Word about Design
+
+It is important for you to note that in addition to getting virtual
+memory working, this assignment is also meant to be an open-ended
+design problem.  We will expect you to come up with a design that
+makes sense.  You will have the freedom to choose how to do software
+translation on TLB misses, how to represent the swap partition, how to
+implement paging, etc.  In each case, we will expect you to provide a
+defensible justification in your design documentation as to why your
+choices are reasonable.  You should evaluate your design on all the
+available criteria: speed of handling a page fault, space overhead in
+memory, minimizing the number of page faults, simplicity, etc.
+
+In keeping with this, you will find that we are going to say as little
+as possible about how to do things.  Instead we will focus on what end
+functionality we require your OS to support.
+
+@node Page Faults, Disk as Backing Store, VM Design, Project 3--Virtual Memory
+@section Page Faults
+
+For the last assignment, whenever a context switch occurred, the new
+process would install its own page table into the machine.  The page
+table contained all the virtual-to-physical translations for the
+process.  Whenever the processor needed to look up a translation, it
+consulted the page table.  As long as the process only accessed
+memory that it didn't own, all was well.  If the process accessed
+memory it didn't own, it ``page faulted'' and @code{page_fault()}
+terminated the process.
+
+When we implement virtual memory, the rules have to change.  A page
+fault is no longer necessarily an error, since it might only indicate
+that the page must be brought in from a disk file or from swap.  You
+will have to implement a more sophisticated page fault handler to
+handle these cases.
+
+On the 80@var{x}86, the page table format is fixed by hardware.  The
+top-level data structure is a 4 kB page called the ``page directory''
+(PD) arranged as an array of 1,024 32-bit page directory entries
+(PDEs), each of which represents 4 MB of virtual memory.  Each PDE may
+point to the physical address of another 4 kB page called a ``page
+table'' (PT) arranged in the same fashion as an array of 1,024 32-bit
+page table entries (PTEs), each of which translates a single 4 kB
+virtual page into physical memory.
+
+Thus, translation of a virtual address into a physical address follows
+the three-step process illustrated in the diagram
+below:@footnote{Actually, virtual to physical translation on the
+80@var{x}86 architecture happens via an intermediate ``linear
+address,'' but Pintos (and most other 80@var{x}86 OSes) set up the CPU
+so that linear and virtual addresses are one and the same, so that you
+can effectively ignore this CPU feature.}
+
+@enumerate 1
+@item
+The top 10 bits of the virtual address (bits 22:31) are used to index
+into the page directory.  If the PDE is marked ``present,'' the
+physical address of a page table is read from the PDE thus obtained.
+If the PDE is marked ``not present'' then a page fault occurs.
+
+@item
+The next 10 bits of the virtual address (bits 12:21) are used to index
+into the page table.  If the PTE is marked ``present,'' the physical
+address of a data page is read from the PTE thus obtained.  If the PTE
+is marked ``not present'' then a page fault occurs.
+
+
+@item
+The bottom 12 bits of the virtual address (bits 0:11) are added to the
+data page's physical base address, producing the final physical
+address.
+@end enumerate
+
+@example
+32                    22                     12                      0
++--------------------------------------------------------------------+
+| Page Directory Index |   Page Table Index   |    Page Offset       |
++--------------------------------------------------------------------+
+             |                    |                     |
+     _______/             _______/                _____/
+    /                    /                       /
+   /    Page Directory  /      Page Table       /    Data Page
+  /     .____________. /     .____________.    /   .____________.
+  |1,023|____________| |1,023|____________|    |   |____________|
+  |1,022|____________| |1,022|____________|    |   |____________|
+  |1,021|____________| |1,021|____________|    \__\|____________|
+  |1,020|____________| |1,020|____________|       /|____________|
+  |     |            | |     |            |        |            |
+  |     |            | \____\|            |_       |            |
+  |     |      .     |      /|      .     | \      |      .     |
+  \____\|      .     |_      |      .     |  |     |      .     |
+       /|      .     | \     |      .     |  |     |      .     |
+        |      .     |  |    |      .     |  |     |      .     |
+        |            |  |    |            |  |     |            |
+        |____________|  |    |____________|  |     |____________|
+       4|____________|  |   4|____________|  |     |____________|
+       3|____________|  |   3|____________|  |     |____________|
+       2|____________|  |   2|____________|  |     |____________|
+       1|____________|  |   1|____________|  |     |____________|
+       0|____________|  \__\0|____________|  \____\|____________|
+                           /                      /
+@end example
+
+
+FIXME need to explain virtual and physical memory layout - probably
+back in userprog project
+
+FIXME need to mention that there are many possible implementations and
+that the above is just an outline
+
+@node Disk as Backing Store, Memory Mapped Files, Page Faults, Project 3--Virtual Memory
+@section Disk as Backing Store
+
+In VM systems, since memory is less plentiful than disk, you will
+effectively use memory as a cache for disk.  Looking at it from
+another angle, you will use disk as a backing store for memory.  This
+provides the abstraction of an (almost) unlimited virtual memory size.
+Part of your task in this project is to do this, with the additional
+constraint that your performance should be close to that provided by
+physical memory.  You will use the page tables' ``dirty'' bits to
+denote whether pages need to be written back to disk when they're
+evicted from main memory and the ``accessed'' bit for page replacement
+algorithms.  Whenever the hardware writes memory, it sets the dirty
+bit, and if it reads or writes to the page, it sets the accessed bit.
+
+As with any caching system, performance depends on the policy used to
+decide which things are kept in memory and which are only stored on
+disk.  On a page fault, the kernel must decide which page to replace.
+Ideally, it will throw out a page that will not be referenced for a
+long time, keeping in memory those pages that are soon to be
+referenced.  Another consideration is that if the replaced page has
+been modified, the page must be first saved to disk before the needed
+page can be brought in.  Many virtual memory systems avoid this extra
+overhead by writing modified pages to disk in advance, so that later
+page faults can be completed more quickly.
+
+@node Memory Mapped Files, Stack, Disk as Backing Store, Project 3--Virtual Memory
+@section Memory Mapped Files
+
+The traditional way to access the file system is via @code{read} and
+@code{write} system calls, but that requires an extra level of copying
+between the kernel and the user level.  A secondary interface is
+simply to ``map'' the file into the virtual address space.  The
+program can then use load and store instructions directly on the file
+data.  (An alternative way of viewing the file system is as ``durable
+memory.''  Files just store data structures.  If you access data
+structures in memory using load and store instructions, why not access
+data structures in files the same way?)
+
+Memory mapped files are typically implemented using system calls.  One
+system call maps the file to a particular part of the address space.
+For example, one might map the file @file{foo}, which is 1000 bytes
+long, starting at address 5000.  Assuming that nothing else is already
+at virtual addresses 5000@dots{}6000, any memory accesses to these
+locations will access the corresponding bytes of @file{foo}.
+
+A consequence of memory mapped files is that address spaces are
+sparsely populated with lots of segments, one for each memory mapped
+file (plus one each for code, data, and stack).  You will implement
+memory mapped files for problem 3 of this assignment, but you should
+design your solutions to problems 1 and 2 to account for this.
+
+@node Stack, Problem 3-1 Page Table Management, Memory Mapped Files, Project 3--Virtual Memory
+@section Stack
+
+In project 2, the stack was a single page at the top of the user
+virtual address space.  The stack's location does not change in this
+project, but your kernel should allocate additional pages to the stack
+on demand.  That is, if the stack grows past its current bottom, the
+system should allocate additional pages for the stack as necessary,
+unless those pages are unavailable because they are in use by another
+segment, in which case some sort of fault should occur.
+
+@node Problem 3-1 Page Table Management, Problem 3-2 Paging To and From Disk, Stack, Project 3--Virtual Memory
+@section Problem 3-1: Page Table Management
+
+Implement page directory and page table management to support virtual
+memory.  You will need data structures to accomplish the following
+tasks:
+
+@itemize @bullet
+@item
+Some way of translating in software from virtual page frames to
+physical page frames (consider using a hash table---note
+that we provide one in @file{lib/kernel}).
+
+@item
+Some way of translating from physical page frames back to virtual
+page frames, so that when you replace a page, you can invalidate
+its translation(s).
+
+@item
+Some way of finding a page on disk if it is not in memory.  You won't
+need this data structure until part 2, but planning ahead is a good
+idea.
+@end itemize
+
+You need to do the roughly the following to handle a page fault:
+
+@enumerate 1
+@item
+Determine the location of the physical page backing the virtual
+address that faulted.  It might be in the file system, in swap,
+already be in physical memory and just not set up in the page table,
+or it might be an invalid virtual address.
+
+If the virtual address is invalid, that is, if there's no physical
+page backing it, or if the virtual address is above @code{PHYS_BASE},
+meaning that it belongs to the kernel instead of the user, then the
+process's memory access must be disallowed.  You should terminate the
+process at this point, being sure to free all of its resources.
+
+@item
+If the physical page is not in physical memory, bring it into memory.
+If necessary to make room, first evict some other page from memory.
+(When you do that you need to first remove references to the page from
+any page table that refers to it.)
+
+@item
+Each user process's @code{struct thread} has a @samp{pagedir} member
+that points to its own per-process page directory.  Read the PDE for
+the faulting virtual address.
+
+@item 
+If the PDE is marked ``not present'' then allocate a new page table
+page and initialize the PDE to point to the new page table.  As when
+you allocated a data page, you might have to first evict some other
+page from memory.
+
+@item
+Follow the PDE to the page table.  Point the PTE for the faulting
+virtual address to the physical page found in step 2.
+@end enumerate
+
+You'll need to modify the ELF loader in @file{userprog/addrspace.c} to
+do page table management according to your new design.  As supplied,
+it reads all the process's pages from disk and initializes the page
+tables for them at the same time.  For testing purposes, you'll
+probably want to leave the code that reads the pages from disk, but
+use your new page table management code to construct the page tables
+only as page faults occur for them.
+
+@node Problem 3-2 Paging To and From Disk, Problem 3-3 Memory Mapped Files, Problem 3-1 Page Table Management, Project 3--Virtual Memory
+@section Problem 3-2: Paging To and From Disk
+
+Implement paging to and from disk.
+
+You will need routines to move a page from memory to disk and from
+disk to memory.  You may use the Pintos file system for swap space, or
+you may use the disk on interface @code{hd1:1}, which is otherwise
+unused.  A swap disk can theoretically be faster than using the file
+system, because it avoid file system overhead and because the swap
+disk and file system disk will be on separate hard disk controllers.
+You will definitely need to be able to retrieve pages from files in
+any case, so to avoid special cases it may be easier to use a file for
+swap.  You will still be using the basic file system provided with
+Pintos.  If you do everything correctly, your VM should still work
+when you implement your own file system for the next assignment.
+
+You will need a way to track pages which are used by a process but
+which are not in physical memory, to fully handle page faults.  Pages
+that you store on disk should not be constrained to be in sequential
+order, and consequently your swap file (or swap disk) should not
+require unused empty space.  You will also need a way to track all of
+the physical memory pages, in order to find an unused one when needed,
+or to evict a page when memory is needed but no empty pages are
+available.  The data structures that you designed in part 1 should do
+most of the work for you.
+
+You will need a page replacement algorithm.  The hardware sets the
+accessed and dirty bits when it accesses memory.  Therefore, you
+should be able to take advantage of this information to implement some
+algorithm which attempts to achieve LRU-type behavior.  We expect that
+your algorithm perform at least as well as a reasonable implementation
+of the second-chance (clock) algorithm.  You will need to show in your
+test cases the value of your page replacement algorithm by
+demonstrating for some workload that it pages less frequently using
+your algorithm than using some inferior page replacement policy.  The
+canonical example of a poor page replacement policy is random
+replacement.
+
+Since you will already be paging from disk, you should implement a
+``lazy'' loading scheme for new processes.  When a process is created,
+it will not run immediately.  Therefore, it doesn't make sense to load
+all its code, data, and stack into memory when the process is created,
+since it might incur additional disk accesses to do so (if it gets
+paged out before it runs).  When loading a new process, you should
+leave most pages on disk, and bring them in as demanded when the
+program begins running.  Your VM system should also use the executable
+file itself as backing store for read-only segments, since these
+segments won't change.
+
+There are a few special cases.  Look at the loop in
+@code{load_segment()} in @file{userprog/addrspace.c}.  Each time
+around the loop, @code{read_bytes} represents the number of bytes to
+read from the executable file and @code{zero_bytes} represents the number
+of bytes to initialize to zero following the bytes read.  The two
+always sum to @code{PGSIZE}.  The page handling depends on these
+variables' values:
+
+@itemize @bullet
+@item
+If @code{read_bytes} equals @code{PGSIZE}, the page should be demand
+paged from disk on its first access.
+
+@item 
+If @code{zero_bytes} equals @code{PGSIZE}, the page does not need to
+be read from disk at all because it is all zeroes.  You should handle
+such pages by creating a new page consisting of all zeroes at the
+first page fault.
+
+@item
+If neither @code{read_bytes} nor @code{zero_bytes} equals
+@code{PGSIZE}, then part of the page is to be read from disk and the
+remainder zeroed.  This is a special case, which you should handle by
+reading the partial page from disk at executable load time and zeroing
+the rest of the page.  It is the only case in which loading should not
+be ``lazy''; even real OSes such as Linux do not load partial pages
+lazily.
+@end itemize
+
+FIXME mention that you can test with these special cases eliminated
+
+You may optionally implement sharing: when multiple processes are
+created that use the same executable file, share read-only pages among
+those processes instead of creating separate copies of read-only
+segments for each process.  If you carefully designed your data
+structures in part 1, sharing of read-only pages should not make this
+part significantly harder.
+
+@node Problem 3-3 Memory Mapped Files, Virtual Memory FAQ, Problem 3-2 Paging To and From Disk, Project 3--Virtual Memory
+@section Problem 3-3: Memory Mapped Files
+
+Implement memory mapped files.
+
+You will need to implement the following system calls:
+
+@table @asis
+@item SYS_mmap
+@itemx bool mmap (int @var{fd}, void *@var{addr}, unsigned @var{length})
+
+Maps the file open as @var{fd} into the process's address space
+starting at @var{addr} for @var{length} bytes.  Returns true if
+successful, false on failure.  
+
+@item SYS_munmap
+@itemx bool munmap (void *addr, unsigned length)
+
+Unmaps the segment specified by id.  This cannot be used to unmap
+segments mapped by the executable loader.  Returns 0 on success, -1 on
+failure.  When a file is unmapped, all outstanding changes are written
+to the file, and the segment's pages are removed from the process's
+list of used virtual pages.
+@end table
+
+Calls to @code{mmap} must fail if the address is not page-aligned, if
+the length is not positive and a multiple of @var{PGSIZE}.  You also
+must error check to make sure that the new segment does not overlap
+already existing segments, and fail if it isn't.  If the length passed
+to @code{mmap} is less than the file's length, you should only map the
+first part of the file.  If the length passed to @code{mmap} is longer
+than the file, the file should grow to the requested length.  Similar
+to the code segment, your VM system should be able to use the
+@code{mmap}'d file itself as backing store for the mmap segment, since
+the changes to the @code{mmap} segment will eventually be written to
+the file.  (In fact, you may choose to implement executable mappings
+as a special case of file mappings.)
+
+@node Virtual Memory FAQ,  , Problem 3-3 Memory Mapped Files, Project 3--Virtual Memory
+@section FAQ
+
+@enumerate 1
+@item
+@b{Do we need a working HW 2 to implement HW 3?}
+
+Yes.
+
+@item
+@b{How do I use the hash table provided in @file{lib/hash.c}?}
+
+FIXME
+
+There are two things you need to use this hashtable:
+
+1. You need to decide on a key type. The key should be something
+that is unique for each object as inserting two objects with
+the same key will cause the second to overwrite the first.
+(The keys are compared with ==, so you should stick to
+integers and pointers unless you know how to do operator
+overloading.) You also need to write a hash function that
+converts key values to integers, which you will pass into the
+hash table constructor.
+
+2. Your key needs to be a field of your object type, and you
+will need to supply a 'get' function that given an object
+returns the key.
+
+Here's a quick example of how to construct a hash table. In
+this table the keys are Thread pointers and the objects are
+integers (you will be using different key/value pairs I'm
+sure). In addition, this hash function is pretty puny. You
+should probably use a better one.
+
+@example
+FIXME
+@end example
+
+and to construct the hash table:
+
+HashTable<Thread *, HashObject *> *htable;
+
+htable = new HashTable<Thread *, HashObject *>(ExtractKeyFromHashObject,
+                                            MyKeyToHashValue);
+
+If you have any other questions about hash tables, the CS109
+and CS161 textbooks have good chapters on them, or you can come
+to any of the TA's office hours for further clarification.
+
+@item
+@b{The current implementation of the hash table does not do something
+that we need it to do. What gives?}
+
+You are welcome to modify it.  It is not used by any of the code we
+provided, so modifying it won't affect any code but yours.  Do
+whatever it takes to make it work like you want it to.
+
+@item
+@b{Is the data segment page-aligned?}
+
+No.
+
+@item
+@b{What controls the layout of user programs?}
+
+The linker is responsible for the layout of a user program in
+memory. The linker is directed by a ``linker script'' which tells it
+the names and locations of the various program segments. The
+test/script and testvm/script files are the linker scripts for the
+multiprogramming and virtual memory assignments respectively. You can
+learn more about linker scripts by reading the ``Scripts'' chapter in
+the linker manual, accessible via @samp{info ld}.
+
+@item Page Table Management FAQs
+@enumerate 1
+@item
+@b{How do we manage allocation of pages used for page tables?}
+
+You can use any reasonable algorithm to do so.  However, you should
+make sure that memory used for page tables doesn't grow so much that
+it encroaches deeply on the memory used for data pages.
+
+Here is one reasonable algorithm.  At OS boot time, reserve some fixed
+number of pages for page tables.  Then, each time a new page table
+page is needed, select one of these pages in ``round robin'' fashion.
+If the page in use, clean up any pointers to it.  Then use it for the
+new page table page.
+
+@item
+@b{Our code handles the PageFault exceptions. However, the number of
+page faults handled does not show up in the final stats output. Is
+there a counter that we must increment to correct this problem?}
+
+FIXME 
+
+Yes, you'll need to update kernel->stats->numPageFaults when
+you handle a page fault in your code.
+@end enumerate
+
+@item Paging FAQs
+
+@enumerate 1
+@item
+@b{Can we assume (and enforce) that the user's stack will
+never increase beyond one page?}
+
+No.  This value was useful for project 2, but for this assignment, you
+need to implement an extensible stack segment.
+
+@item
+@b{Does the virtual memory system need to support growth of the data
+segment?}
+
+No.  The size of the data segment is determined by the linker.  We
+still have no dynamic allocation in Pintos (although it is possible to
+``fake'' it at the user level by using memory-mapped files).
+Implementing @code{sbrk()} has been an extra-credit assignment in
+previous years, but adds little additional complexity to a
+well-designed system.
+
+@item
+@b{Does the virtual memory system need to support growth of the stack
+segment?}
+
+Yes. If a page fault appears just below the last stack segment page,
+you must add a new page to the bottom of the stack. It is impossible
+to predict how large the stack will grow at compile time, so we must
+allocate pages as necessary. You should only allocate additional pages
+if they ``appear'' to be stack accesses.
+
+@item
+@b{But what do you mean by ``appear'' to be stack accesses? How big can a
+stack growth be?  Under what circumstances do we grow the stack?}
+
+If it looks like a stack request, then you grow the stack. Yes, that's
+ambiguous. You need to make a reasonable decision about what looks
+like a stack request. For example, you could decide a page, or two
+pages, or ten pages, or more@enddots{}  Or, you could use some other
+heuristic to figure this out.
+
+Make a reasonable decision and document it in your code and in
+your design document.  Please make sure to justify your decision.
+
+@item
+@b{How big should the file(s) we're using as a backing store for memory
+be?}
+
+These files will need to be able to grow based on the number of pages
+you're committed to storing on behalf of the processes currently in
+memory.  They should be able to grow to the full size of the disk.
+@end enumerate
+
+@item Memory Mapped File FAQs
+
+@enumerate 1
+@item
+@b{How do we interact with memory-mapped files?}
+
+Let's say you want to map a file called @file{foo} into your address
+space at address @t{0x10000000}. You open the file, determine its
+length, and then use Mmap:
+
+@example
+#include <stdio.h>
+#include <syscall.h>
+
+int main (void)
+{
+    void *addr = (void *) 0x10000000;
+    int fd = open ("foo");
+    int length = filesize (fd);
+    if (mmap (fd, addr, length))
+        printf ("success!\n");
+}
+@end example
+
+Suppose @file{foo} is a text file and you want to print the first 64
+bytes on the screen (assuming, of course, that the length of the file
+is at least 64).  Without @code{mmap}, you'd need to allocate a
+buffer, use @code{read} to get the data from the file into the buffer,
+and finally use @code{write} to put the buffer out to the display. But
+with the file mapped into your address space, you can directly address
+it like so:
+
+@example
+write (addr, 64, STDOUT_FILENO);
+@end example
+
+Similarly, if you wanted to replace the first byte of the file,
+all you need to do is:
+
+@example
+addr[0] = 'b';
+@end example
+
+When you're done using the memory-mapped file, you simply unmap
+it:
+
+@example
+munmap (addr);
+@end example
+
+@item
+@b{What if two processes memory-map the same file?}
+
+There is no requirement in Pintos that the two processes see
+consistent data.  Unix handles this by making the processes share the
+same physical page, but the @code{mmap} system call also has an
+argument allowing the client to specify whether the page is shared or
+private (i.e.@: copy-on-write).
+
+@item
+@b{What happens if a user removes a @code{mmap}'d file?}
+
+@item
+You should follow the Unix convention and the mapping should still be
+valid.  This is similar to the question in the User Programs FAQ about
+a process with a file descriptor to a file that has been removed.
+
+@item
+@b{What if a process writes to a page that is memory-mapped, but the
+location written to in the memory-mapped page is past the end
+of the memory-mapped file?}
+
+Can't happen.  @code{mmap} extends the file to the requested length,
+and Pintos provides no way to shorten a file.  You can remove a file,
+but the mapping remains valid (see the previous question).
+
+@item
+@b{Do we have to handle memory mapping @code{stdin} or @code{stdout}?}
+
+No.  Memory mapping implies that a file has a length and that a user
+can seek to any location in the file.  Since the console device has
+neither of these properties, @code{mmap} should return false when the
+user attempts to memory map a file descriptor for the console device.
+
+@item
+@b{What happens when a process exits with mmap'd files?}
+
+When a process finishes each of its @code{mmap}'d files is implicitly
+unmapped.  When a process @code{mmap}s a file and then writes into the
+area for the file it is making the assumption the changes will be
+written to the file.
+
+@item
+@b{If a user closes a mmaped file, should be automatically unmap it
+for them?}
+
+No, once created the mapping is valid until @code{munmap} is called
+or the process exits.
+@end enumerate
+@end enumerate
author	Ben Pfaff <blp@cs.stanford.edu>
	Thu, 9 Sep 2004 22:37:50 +0000 (22:37 +0000)
committer	Ben Pfaff <blp@cs.stanford.edu>
	Thu, 9 Sep 2004 22:37:50 +0000 (22:37 +0000)
doc/filesys.texi	[new file with mode: 0644]	patch \| blob
doc/projects.texi	[new file with mode: 0644]	patch \| blob
doc/threads.texi	[new file with mode: 0644]	patch \| blob
doc/userprog.texi	[new file with mode: 0644]	patch \| blob
doc/vm.texi	[new file with mode: 0644]	patch \| blob