From: Ben Pfaff Date: Thu, 9 Sep 2004 22:37:50 +0000 (+0000) Subject: Initial projects. X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?p=pintos-anon;a=commitdiff_plain;h=932e369696f59f4aaf3a2d6e5ece37475548d68b Initial projects. --- diff --git a/doc/filesys.texi b/doc/filesys.texi new file mode 100644 index 0000000..36d92e3 --- /dev/null +++ b/doc/filesys.texi @@ -0,0 +1,447 @@ +@node Project 4--File Systems, , Project 3--Virtual Memory, Top +@chapter Project 4: File Systems + +In the previous two assignments, you made extensive use of a +filesystem without actually worrying about how it was implemented +underneath. For this last assignment, you will fill in the +implementation of the filesystem. You will be working primarily in +the @file{filesys} directory. + +You should build on the code you wrote for the previous assignments. +However, if you wish, you may turn off your VM features, as they are +not vital to making the filesystem work. (You will need to edit +@file{filesys/Makefile.vars} to fully disable VM.) All of the +functionality needed for project 2 (argument passing, syscalls and +multiprogramming) must work in your filesys submission. + +On the other hand, one of the particular charms of working on +operating systems is being able to use what you build, and building +full-featured systems. Therefore, you should strive to make all the +parts work together so that you can run VM and your filesystem at the +same time. Plus, keeping VM is a great way to stress-test your +filesystem implementation. + +FIXME FIXME FIXME +The first step is to understand the default filesystem provided by the +base code. The first things you should look at are +@file{threads/init.c} and @file{filesys/fsutil.c}: there are special +command line arguments to Pintos which are used to format and +manipulate the emulated disk. Specifically, @option{-f} formats the +file system disk for use with Pintos, and @option{-cp @var{src} +@var{dst}} copies @var{src} on the Unix filesystem to @var{dst} on the +file system disk. With this information, you should be able to +compile the base code and try out the command: @samp{pintos -f -cp +./small small} to copy the file @file{small} from your working +directory into the Pintos file system. + +FIXME FIXME FIXME +One thing you should realize immediately is that, until you use the +above operation to copy the shell (or whatever your initial program +is) to the emulated disk, Pintos will be unable to do very much useful +work (it'll try to open the shell and fail, thereby quitting out). You +will also find that you won't be able to do interesting things until +you copy a variety of programs to the disk. A useful technique, once +your inode formats are finalized, is to create a clean reference disk +and copy that over whenever you trash your disk beyond a useful state +(which will probably happen quite often while debugging). + +@menu +* File System New Code:: +* Problem 4-1 Large Files:: +* Problem 4-2 File Growth:: +* Problem 4-3 Subdirectories:: +* Problem 4-4 Buffer Cache:: +* File System Design Document Requirements:: +* File System FAQs:: +@end menu + +@node File System New Code, Problem 4-1 Large Files, Project 4--File Systems, Project 4--File Systems +@section New Code + +Here are some files that are probably new to you. These are in the +@file{filesys} directory except where indicated: + +@table @file +@item fsutil.c +Simple utilities for the filesystem that are accessible from the +kernel command line. + +@item filesys.h +@itemx filesys.c +Top-level interface to the file system. + +@item directory.h +@itemx directory.c +Translates file names to disk file headers; the +directory data structure is stored as a file. + +@item filehdr.h +@itemx filehdr.c +Manages the data structure representing the layout of a +file's data on disk. + +@item file.h +@itemx file.c +Translates file reads and writes to disk sector reads +and writes. + +@item devices/disk.h +@itemx devices/disk.c +Provides access to the physical disk, abstracting away the rather +awful IDE interface. + +@item lib/kernel/bitmap.h +@itemx lib/kernel/bitmap.c +A bitmap data structure along with routines for reading and writing +the bitmap to disk files. +@end table + +Our file system has a Unix-like interface, so you may also wish to +read the Unix man pages for @code{creat}, @code{open}, @code{close}, +@code{read}, @code{write}, @code{lseek}, and @code{unlink}. Our file +system has calls that are similar, but not identical, to these. The +file system translates these calls into physical disk operations. + +All the basic functionality is there in the code above, so that the +filesystem is usable right off the bat. In fact, you've been using it +in the previous two projects. However, it has severe limitations +which you will remove. + +While most of your work will be in @file{filesys}, you should be +prepared for interactions with all previous parts (as usual). + +@node Problem 4-1 Large Files, Problem 4-2 File Growth, File System New Code, Project 4--File Systems +@section Problem 4-1: Large Files + +Modify the file system to allow the maximum size of a file to be as +large as the disk. You can assume that the disk will not be larger +than 8 MB. In the basic file system, each file is limited to a file +size of just under 64 kB. Each file has a header (@code{struct +filehdr}) that is a table of direct pointers to the disk blocks for +that file. Since the header is stored in one disk sector, the maximum +size of a file is limited by the number of pointers that will fit in +one disk sector. Increasing the limit to 8 MB will require you to +implement doubly-indirect blocks. + +@node Problem 4-2 File Growth, Problem 4-3 Subdirectories, Problem 4-1 Large Files, Project 4--File Systems +@section Problem 4-2: File Growth + +Implement extensible files. In the basic file system, the file size +is specified when the file is created. One advantage of this is that +the FileHeader data structure, once created, never changes. In UNIX +and most other file systems, a file is initially created with size 0 +and is then expanded every time a write is made off the end of the +file. Modify the file system to allow this. As one test case, allow +the root directory file to expand beyond its current limit of ten +files. Make sure that concurrent accesses to the file header remain +properly synchronized. + +@node Problem 4-3 Subdirectories, Problem 4-4 Buffer Cache, Problem 4-2 File Growth, Project 4--File Systems +@section Problem 4-3: Subdirectories + +Implement a hierarchical name space. In the basic file system, all +files live in a single directory. Modify this to allow directories to +point to either files or other directories. To do this, you will need +to implement routines that parse path names into a sequence of +directories, as well as routines that change the current working +directory and that list the contents of the current directory. For +performance, allow concurrent updates to different directories, but +use mutual exclusion to ensure that updates to the same directory are +performed atomically (for example, to ensure that a file is deleted +only once). + +Make sure that directories can expand beyond their original size just +as any other file can. + +To take advantage of hierarchical name spaces in user programs, +provide the following syscalls: + +@table @code +@item SYS_chdir +@itemx bool chdir (const char *@var{dir}) +Attempts to change the current working directory of the process to +@var{dir}, which may be either relative or absolute. Returns true if +successful, false on failure. + +@item SYS_mkdir +@itemx bool mkdir (const char *dir) +Attempts to create the directory named @var{dir}, which may be either +relative or absolute. Returns true if successful, false on failure. + +@item SYS_lsdir +@itemx void lsdir (void) +Prints a list of files in the current directory to @code{stdout}, one +per line. +@end table + +Also write the @command{ls} and @command{mkdir} user programs. This +is straightforward once the above syscalls are implemented. If Unix, +these are programs rather than built-in shell commands, but +@command{cd} is a shell command. (Why?) + +@node Problem 4-4 Buffer Cache, File System Design Document Requirements, Problem 4-3 Subdirectories, Project 4--File Systems +@section Problem 4-4: Buffer Cache + +Modify the file system to keep a cache of file blocks. When a request +is made to read or write a block, check to see if it is stored in the +cache, and if so, fetch it immediately from the cache without going to +disk. (Otherwise, fetch the block from disk into cache, evicting an +older entry if necessary.) You are limited to a cache no greater than +64 sectors in size. Be sure to choose an intelligent cache +replacement algorithm. Experiment to see what combination of use, +dirty, and other information results in the best performance, as +measured by the number of disk accesses. (For example, metadata is +generally more valuable to cache than data.) Document your +replacement algoritm in your design document. + +In addition to the basic file caching scheme, your implementation +should also include the following enhancements: + +@table @b +@item write-behind: +Instead of always immediately writing modified data to disk, dirty +blocks can be kept in the cache and written out sometime later. Your +buffer cache should write behind whenever a block is evicted from the +cache. + +@item read-ahead: +Your buffer cache should automatically fetch the next block of a file +into the cache when one block of a file is read, in case that block is +about to be read. +@end table + +For each of these three optimizations, design a file I/O workload that +is likely to benefit from the enhancement, explain why you expect it +to perform better than on the original file system implementation, and +demonstrate the performance improvement. + +Note that write-behind makes your filesystem more fragile in the face +of crashes. Therefore, you should implement some manner to +periodically write all cached blocks to disk. If you have +@code{timer_msleep()} from the first project working, this is an +excellent application for it. + +Likewise, read-ahead is only really useful when done asynchronously. +That is, if a process wants disk block 1 from the file, it needs to +block until disk block 1 is read in, but once that read is complete, +control should return to the process immediately while the read +request for disk block 2 is handled asynchronously. In other words, +the process will block to wait for disk block 1, but should not block +waiting for disk block 2. + +FIXME +When you're implementing this, please make sure you have a scheme for +making any read-ahead and write-behind threads halt when Pintos is +``done'' (when the user program has completed, etc), so that Pintos +will halt normally and print its various statistics. + +@node File System Design Document Requirements, File System FAQs, Problem 4-4 Buffer Cache, Project 4--File Systems +@section Design Document Requirements + +As always, submit a design document file summarizing your design. Be +sure to cover the following points : + +@itemize @bullet +@item +How did you structure your inodes? How many blocks did you access +directly, via single-indirection, and/or via double-indirection? Why? + +@item +How did you structure your buffer cache? How did you perform a lookup +in the cache? How did you choose elements to evict from the cache? + +@item +How and when did you flush the cache? +@end itemize + +@node File System FAQs, , File System Design Document Requirements, Project 4--File Systems +@section FAQ + +@enumerate 1 +@item +@b{What extra credit opportunities are available for this assignment?} + +@itemize @bullet +@item +We'll give out extra credit to groups that implement Unix-style +support for @file{.} and @file{..} in relative paths in their projects. + +@item +We'll give some extra credit if you submit with VM enabled. If you do +this, make sure you show us that you can run multiple programs +concurrently. A particularly good demonstration is running +@file{capitalize} (with a reduced words file that fits comfortably on +your disk, of course). So submit a file system disk that contains a +VM-heavy program like @file{capitalize}, so we can try it out. And also +include the results in your test case file. + +We feel that you will be much more satisfied with your cs140 ``final +product'' if you can get your VM working with your file system. It's +also a great stress test for your FS, but obviously you have to be +pretty confident with your VM if you're going to submit this extra +credit, since you'll still lose points for failing FS-related tests, +even if the problem is in your VM code. + +@item +A point of extra credit can be assigned if a user can recursively +remove directories from the shell command prompt. Note that the +typical semantic is to just fail if a directory is not empty. +@end itemize + +Make sure that you discuss any extra credit in your @file{README} +file. We're likely to miss it if it gets buried in your design +document. + +@item +@b{What exec modes for running Pintos do I absolutely need to +support?} + +FIXME FIXME +The most standard mode is to run your Pintos with all the command +flags on one command line, like this: @samp{pintos -f -cp shell +shell -ex "shell"}. However, you also need to support these flags +individually---especially since that's how the grader tests your +program. Thus, you should be able to run the above instead as: + +FIXME +@example +pintos -f +pintos -cp shell shell +pintos -ex "shell" +@end example + +Note that this also provides a way for you to validate that your disks +are really persistent. This is a common problem with a write behind +cache: if you don't shut down properly it will leave the disk in an +inconsistent state. + +@item +@b{Will you test our file system with a different @code{DISK_SECTOR_SIZE}?} + +No, @code{DISK_SECTOR_SIZE} will not change. + +@item +@b{Will the @code{struct filehdr} take up space on the disk too?} + +Yes. Anything stored in @code{struct filehdr} takes up space on disk, +so you must include this in your calculation of how many entires will +fit in a single disk sector. + +@item +File Growth FAQs + +@enumerate 1 +@item +@b{What is the largest file size that we are supposed to support?} + +The disk we create will be 8 MB or smaller. However, individual files +will have to be smaller than the disk to accommodate the metadata. +You'll need to consider this when deciding your @code{struct filehdr} +(inode) organization. +@end enumerate + +@item +Subdirectory FAQs + +@enumerate 1 +@item +@b{What's the answer to the question in the spec about why +@command{ls} and @command{mkdir} are user programs, while @command{cd} +is a shell command?} + +Each process maintains its own current working directory, so it's much +easier to change the current working directory of the shell process if +@command{cd} is implemented as a shell command rather than as another +user process. In fact, Unix-like systems don't provide any way for +one process to change another process's current working directory. + +@item +@b{When the spec states that directories should be able to grow beyond +ten files, does this mean that there can still be a set maximum number +of files per directory that is greater than ten, or should directories +now support unlimited growth (bounded by the maximum supported file +size)?} + +We're looking for directories that can support arbitrarily large +numbers of files. Now that directories can grow, we want you to +remove the concept of a preset maximum file limit. + +@item +@b{When should the @code{lsdir} system call return?} + +The @code{lsdir} system call should not return until after the +directory has been printed. Here's a code fragment, and the desired +output: + +@example +printf ("Start of directory\n"); +lsdir (); +printf ("End of directory\n"); +@end example + +This code should create the following output: + +@example +Start of directory +... directory contents ... +End of directory +@end example + +@item +@b{Do we have to implement both absolute and relative pathnames?} + +Yes. Implementing @file{.} and @file{..} is extra credit, though. + +@item +@b{Should @code{remove()} also be able to remove directories?} + +Yes. The @code{remove} system call should handle removal of both +regular files and directories. You may assume that directories can +only be deleted if they are empty, as in Unix. +@end enumerate + +@item +Buffer Cache FAQs + +@enumerate 1 +@item +@b{We're limited to a 64-block cache, but can we also keep a copy of +each @code{struct filehdr} for an open file inside @code{struct file}, +the way the stub code does?} + +No, you shouldn't keep any disk sectors stored anywhere outside the +cache. That means you'll have to change the way the file +implementation accesses its corresponding inode right now, since it +currently just creates a new @code{struct filehdr} in its constructor +and reads the corresponding sector in from disk when it's created. + +There are two reasons for not storing inodes in @code{struct file}. +First, keeping extra copies of inodes would be cheating the 64-block +limitation that we place on your cache. Second, if two processes have +the same file open, you will create a huge synchronization headache +for yourself if each @code{struct file} has its own copy of the inode. + +Note that you can store pointers to inodes in @code{struct file} if +you want, and you can store some other small amount of information to +help you find the inode when you need it. + +Similarly, if you want to store one block of data plus some small +amount of metadata for each of your 64 cache entries, that's fine. + +@item +@b{But why can't we store copies of inodes in @code{struct file}? We +don't understand the answer to the previous question.} + +The issue regarding storing @code{struct filehdr}s has to do with +implementation of the buffer cache. Basically, you can't store a +@code{struct filehdr *} in @code{struct filehdr}. Each time you need +to read a @code{struct filehdr}, you'll have to get it either from the +buffer cache or from disk. + +If you look at @code{file_read_at()}, it uses @code{hdr} directly +without having first read in that sector from wherever it was in the +storage hierarchy. You are no longer allowed to do this. You will +need to change @code{file_read_at} (and similar functions) so that it +reads @code{hdr} from the storage hierarchy before using it. +@end enumerate +@end enumerate diff --git a/doc/projects.texi b/doc/projects.texi new file mode 100644 index 0000000..bc5cd13 --- /dev/null +++ b/doc/projects.texi @@ -0,0 +1,14 @@ +@node Top, Project 1--Threads, (dir), (dir) +@top Pintos Projects + +@menu +* Project 1--Threads:: +* Project 2--User Programs:: +* Project 3--Virtual Memory:: +* Project 4--File Systems:: +@end menu + +@include threads.texi +@include userprog.texi +@include vm.texi +@include filesys.texi diff --git a/doc/threads.texi b/doc/threads.texi new file mode 100644 index 0000000..c1fb41d --- /dev/null +++ b/doc/threads.texi @@ -0,0 +1,574 @@ +@node Project 1--Threads, Project 2--User Programs, Top, Top +@chapter Project 1: Threads + +In this assignment, we give you a minimally functional thread system. +Your job is to extend the functionality of this system to gain a +better understanding of synchronization problems. Additionally, you +will use at least part of this increased functionality in future +assignments. + +You will be working in the @file{threads} and @file{devices} +directories for this assignment. Compilation should be done in the +@file{threads} directory. + +@menu +* Understanding Threads:: +* Debugging versus Testing:: +* Tips:: +* Problem 1-1 Alarm Clock:: +* Problem 1-2 Join:: +* Problem 1-3 Priority Scheduling:: +* Problem 1-4 Advanced Scheduler:: +* Threads FAQ:: +@end menu + +@node Understanding Threads, Debugging versus Testing, Project 1--Threads, Project 1--Threads +@section Understanding Threads + +The first step is to read and understand the initial thread system. +Pintos, by default, implements thread creation and thread completion, +a simple scheduler to switch between threads, and synchronization +primitives (semaphores, locks, and condition variables). +@c FIXME: base system doesn't do anything. Debugger sucks. +However, there's a lot of magic going on in some of this code, so you +should compile and run the base system to see a simple test of the +code. You should trace the execution using your favorite debugger to +get a sense for what's going on. + +When a thread is created, you are creating a new context to be +scheduled. You provide a function to be run in this context as an +argument to @code{thread_create()}. The first time the thread is +scheduled and runs, it will start from the beginning of that function +and execute it in the context. When that function returns, that thread +completes. Each thread, therefore, acts like a mini-program running +inside Pintos, with the function passed to @code{thread_create()} +acting like @code{main()}. + +At any given time, Pintos is running exactly one thread, with the +others switched out. The scheduler decides which thread to run next +when it needs to switch between them. (If no thread is ready to run +at any given time, then the special ``idle'' thread runs.) The +synchronization primitives are used to force context switches when one +thread needs to wait for another thread to do something. + +The exact mechanics of a context switch are pretty gruesome and have +been provided for you in @file{switch.S} (this is 80@var{x}86 +assembly; don't worry about understanding it). It involves saving the +state of the currently running thread and restoring the state of the +thread we're switching to. +@c FIXME +Slowly trace through a context switch to see what happens. Be sure to +keep track of each thread's address and state, and what procedures are +on the call stack for each thread. You will notice that when one +thread calls @code{switch_threads()}, another thread starts running, +and the first thing the new thread does is to return from +@code{switch_threads()}. We realize this comment will seem cryptic to +you at this point, but you will understand threads once you understand +why the @code{switch_threads()} that gets called is different from the +@code{switch_threads()} that returns. + +@strong{Warning}: In Pintos, each thread is assigned a small, +fixed-size execution stack just under @w{4 kB} in size. The kernel +does try to detect stack overflow, but it cannot always succeed. You +ma cause bizarre problems, such as mysterious kernel panics, if you +declare large data structures as non-static local variables, +e.g. @samp{int buf[1000];}. Alternatives to stack allocation include +the page allocator in @file{threads/palloc.c} and the block allocator +in @file{threads/malloc.c}. Note that the page allocator doles out +@w{4 kB} chunks and that @code{malloc()} has a @w{2 kB} block size +limit. If you need larger chunks, consider using a linked structure +instead. + +@node Debugging versus Testing, Tips, Understanding Threads, Project 1--Threads +@section Debugging versus Testing + +When you're debugging code, it's useful to be able to be able to run a +program twice and have it do exactly the same thing. On second and +later runs, you can make new observations without having to discard or +verify your old observations. This property is called +``reproducibility.'' The simulator we use, Bochs, can be set up for +reproducibility. If you use the Bochs configuration files we provide, +which specify @samp{ips: @var{n}} where @var{n} is a number of +simulated instructions per second, your simulations can be +reproducible. + +Of course, a simulation can only be reproducible from one run to the +next if its input is the same each time. For simulating an entire +computer, as we do, this means that every part of the computer must be +the same. For example, you must use the same disks, the same version +of Bochs, and you must not hit any keys on the keyboard (because you +could not be sure to hit them at exactly the same point each time) +during the runs. + +While reproducibility is useful for debugging, it is a problem for +testing thread synchronization, an important part of this project. In +particular, when Bochs is set up for reproducibility, timer interrupts +will come at perfectly reproducible points, and therefore so will +thread switches. That means that running the same test several times +doesn't give you any greater confidence in your code's correctness +than does running it only once. + +So, to make your code easier to test, we've added a feature to Bochs +that makes timer interrupts come at random intervals, but in a +perfectly predictable way. In particular, if you put a line +@samp{ips-jitter: @var{seed}}, where @var{seed} is an integer, into +your Bochs configuration file, then timer interrupts will come at +irregularly spaced intervals. Within a single @var{seed} value, +execution will still be reproducible, but timer behavior will change +as @var{seed} is varied. Thus, for the highest degree of confidence +you should test your code with many seed values. + +@node Tips, Problem 1-1 Alarm Clock, Debugging versus Testing, Project 1--Threads +@section Tips + +There should be no busy-waiting in any of your solutions to this +assignment. Furthermore, resist the temptation to directly disable +interrupts in your solution by calling @code{intr_disable()} or +@code{intr_set_level()}, although you may find doing so to be useful +while debugging. Instead, use semaphores, locks and condition +variables to solve synchronization problems. Hint: read the comments +in @file{threads/synch.h} if you're unsure what synchronization +primitives may be used in what situations. + +Given some designs of some problems, there may be one or two instances +in which it is appropriate to directly change the interrupt levels +instead of relying on the given synchroniztion primitives. This must +be justified in your @file{DESIGNDOC} file. If you're not sure you're +justified, ask! + +While all parts of this assignment are required if you intend to earn +full credit on this project, keep in mind that Problem 2 (Join) will +be needed for future assignments, so you'll want to get this one +right. We don't give out solutions, so you're stuck with your Join +code for the whole quarter. Problem 1 (Alarm Clock) could be very +handy, but not strictly required in the future. The upshot of all +this is that you should focus heavily on making sure that your +implementation of Join works correctly, since if it's broken, you will +need to fix it for future assignments. The other parts can be turned +off in the future if you find you can't make them work quite right. + +Also keep in mind that Problem 4 (the MFQS) builds on the features you +implement in Problem 3, so to avoid unnecessary code duplication, it +would be a good idea to divide up the work among your team members +such that you have Problem 3 fully working before you begin to tackle +Problem 4. + +@node Problem 1-1 Alarm Clock, Problem 1-2 Join, Tips, Project 1--Threads +@section Problem 1-2: Alarm Clock + +Improve the implementation of the timer device defined in +@file{devices/timer.c} by reimplementing @code{timer_msleep(0}. +Threads call @code{timer_msleep(@var{x})} to suspend execution until +time has advanced by at least @w{@var{x} milliseconds}. This is +useful for threads that operate in real-time, for example, for +blinking the cursor once per second. There is no requirement that +threads start running immediately after waking up; just put them on +the ready queue after they have waited for approximately the right +amount of time. + +A working implementation of this function is provided. However, the +version provided is poor, because it ``busy waits,'' that is, it spins +in a tight loop checking the current time until the current time has +advanced far enough. This is undesirable because it wastes time that +could potentially be used more profitably by another thread. Your +solution should not busy wait. + +The argument to @code{timer_msleep()} is expressed in milliseconds. +You must convert it into timer ticks, rounding up. The code provided +does this acceptably; there is no need to change it. + +@node Problem 1-2 Join, Problem 1-3 Priority Scheduling, Problem 1-1 Alarm Clock, Project 1--Threads +@section Problem 1-2: Join + +Implement @code{thread_join(struct thread *)} in +@file{threads/thread.c}. There is already a prototype for it in +@file{threads/thread.h}, which you should not change. This function +causes the currently running thread to block until thread passed as an +argument exits. If A is the running thread and B is the argument, +then we say that ``A joins B'' in this case. + +The model for @code{thread_join()} is the @command{wait} system call +in Unix-like systems. (Try reading the manpages.) That system call +can only be used by a parent process to wait for a child's death. You +should implement @code{thread_join()} to have the same restriction. +That is, a thread may only join on its immediate children. + +A thread need not ever be joined. Your solution should properly free +all of a thread's resources, including its @code{struct thread}, +whether it is ever joined or not, and regardless of whether the child +exits before or after its parent. That is, a thread should be freed +exactly once in all cases. + +Joining a given thread is idempotent. That is, joining a thread T +multiple times is equivalent to joining it once, because T has already +exited at the time of the later joins. Thus, joins on T after the +first should return immediately. + +The behavior of calling @code{thread_join()} on an thread that is not +the caller's child is undefined. You need not handle this case +gracefully. + +Consider all the ways a join can occur: nested joins (A joins B when B +is joined on C), multiple joins (A joins B, then A joins C), and so +on. Does your join work if @code{thread_join()} is called on a thread +that has not yet been scheduled for the first time? You should handle +all of these cases. Write test code that demonstrates the cases your +join works for. Don't overdo the output volume, please! + +Be careful to program this function correctly. You will need its +functionality for project 2. + +@node Problem 1-3 Priority Scheduling, Problem 1-4 Advanced Scheduler, Problem 1-2 Join, Project 1--Threads +@section Problem 1-3 Priority Scheduling + +Implement priority scheduling in Pintos. Priority +scheduling is a key building block for real-time systems. Implement functions +@code{thread_set_priority()} to set the priority of a thread and +@code{thread_get_priority()} to get the priority of a thread. There +are already prototypes for these functions in @file{threads/thread.h}, +which you should not change. + +When a thread is added to the ready list that has a higher priority +than the currently running thread, the current thread should +immediately yield the processor to the new thread. Similarly, when +threads are waiting for a lock, semaphore or condition variable, the +highest priority waiting thread should be woken up first. A thread's +priority may be set at any time, including while the thread is waiting +on a lock, semaphore, or condition variable. + +One issue with priority scheduling is ``priority inversion'': if a +high priority thread needs to wait for a low priority thread (for +instance, for a lock held by a low priority thread, or in +@code{thread_join()} for a thread to complete), and a middle priority +thread is on the ready list, then the high priority thread will never +get the CPU because the low priority thread will not get any CPU time. +A partial fix for this problem is to have the waiting thread +``donate'' its priority to the low priority thread while it is holding +the lock, then recall the donation once it has acquired the lock. +Implement this fix. + +You will need to account for all different orders that priority +donation and inversion can occur. Be sure to handle multiple +donations, in which multiple priorities are donated to a thread. You +must also handle nested donation: given high, medium, and low priority +threads H, M, and L, respectively, and supposing H is waiting on a +lock that M holds and M is waiting on a lock that L holds, both M and +L should be boosted to H's priority. + +You only need to implement priority donation when a thread is waiting +for a lock held by a lower-priority thread. You do not need to +implement this fix for semaphores, condition variables or joins. +However, you do need to implement priority scheduling in all cases. + +@node Problem 1-4 Advanced Scheduler, Threads FAQ, Problem 1-3 Priority Scheduling, Project 1--Threads +@section Problem 1-4 Advanced Scheduler + +Implement Solaris's multilevel feedback queue scheduler (MFQS), as +explained below, to reduce the average response time for running jobs +on your system. +@c FIXME need link + +Demonstrate that your scheduling algorithm reduces response time +relative to the original Pintos scheduling algorithm (round robin) for +at least one workload of your own design (i.e. in addition to the +provided test). + +You may assume a static priority for this problem. It is not necessary +to ``re-donate'' a thread's priority if it changes (although you are +free to do so). + +@node Threads FAQ, , Problem 1-4 Advanced Scheduler, Project 1--Threads +@section FAQ + +@enumerate 1 +@item General FAQs + +@enumerate 1 +@item +@b{I am adding a new @file{.h} or @file{.c} file. How do I fix the +@file{Makefile}s?} + +To add a @file{.c} file, edit the top-level @file{Makefile.build}. +You'll want to add your file to variable @samp{@var{dir}_SRC}, where +@var{dir} is the directory where you added the file. For this +project, that means you should add it to @code{threads_SRC}, or +possibly @code{devices_SRC} if you put in the @file{devices} +directory. Then run @code{make}. If your new file doesn't get +compiled, run @code{make clean} and then try again. + +There is no need to edit the @file{Makefile}s to add a @file{.h} file. + +@item +@b{If a thread finishes, should its children be terminated immediately, +or should they finish normally?} + +You should feel free to decide what semantics you think this +should have. You need only provide justification for your +decision. + +@item +@b{Why can't I disable interrupts?} + +Turning off interrupts should only be done for short amounts of time, +or else you end up losing important things such as disk or input +events. Turning off interrupts also increases the interrupt handling +latency, which can make a machine feel sluggish if taken too far. +Therefore, in general, setting the interrupt level should be used +sparingly. Also, any synchronization problem can be easily solved by +turning interrupts off, since while interrupts are off, there is no +concurrency, so there's no possibility for race condition. + +To make sure you understand concurrency well, we are discouraging you +from taking this shortcut at all in your solution. If you are unable +to solve a particular synchronization problem with semaphores, locks, +or conditions, or think that they are inadequate for a particular +reason, you may turn to disabling interrupts. If you want to do this, +we require in your design document a complete justification and +scenario (i.e.@: exact sequence of events) to show why interrupt +manipulation is the best solution. If you are unsure, the TAs can +help you determine if you are using interrupts too haphazardly. We +want to emphasize that there are only limited cases where this is +appropriate. + +@item +@b{Where might interrupt-level manipuation be appropriate?} + +You might find it necessary in some solutions to the Alarm problem. + +You might want it at one small point for the priority scheduling +problem. Note that it is not required to use interrupts for these +problems. There are other, equally correct solutions that do not +require interrupt manipulation. However, if you do manipulate +interrupts and @strong{correctly and fully document it} in your design +document, we will allow limited use of interrupt disabling. +@end enumerate + +@item Alarm Clock FAQs + +@enumerate 1 +@item +@b{Why can't I use most synchronization primitives in an interrupt +handler?} + +As you've discovered, you cannot sleep in an external interrupt +handler. Since many lock, semaphore, and condition variable functions +attempt to sleep, you won't be able to call those in +@code{timer_interrupt()}. You may still use those that never sleep. + +Having said that, you need to make sure that global data does not get +updated by multiple threads simultaneously executing +@code{timer_msleep()}. Here are some pieces of information to think +about: + +@enumerate a +@item +Interrupts are turned off while @code{timer_interrupt()} runs. This +means that @code{timer_interrupt()} will not be interrupted by a +thread running in @code{timer_msleep()}. + +@item +A thread in @code{timer_msleep()}, however, can be interrupted by a +call to @code{timer_interrupt()}, except when that thread has turned +off interrupts. + +@item +Examples of synchronization mechanisms have been presented in lecture. +Going over these examples should help you understand when each type is +useful or needed. +@end enumerate + +@item +@b{What about timer overflow due to the fact that times are defined as +integers? Do I need to check for that?} + +Don't worry about the possibility of timer values overflowing. Timer +values are expressed as signed 63-bit numbers, which at 100 ticks per +second should be good for almost 2,924,712,087 years. +@end enumerate + +@item Join FAQs + +@enumerate 1 +@item +@b{Am I correct to assume that once a thread is deleted, it is no +longer accessible by the parent (i.e.@: the parent can't call +@code{thread_join(child)})?} + +A parent joining a child that has completed should be handled +gracefully and should act as a no-op. +@end enumerate + +@item Priority Scheduling FAQs + +@enumerate 1 +@item +@b{Doesn't the priority scheduling lead to starvation? Or do I have to +implement some sort of aging?} + + +It is true that strict priority scheduling can lead to starvation +because thread may not run if a higher-priority thread is runnable. +In this problem, don't worry about starvation or any sort of aging +technique. Problem 4 will introduce a mechanism for dynamically +changing thread priorities. + +This sort of scheduling is valuable in real-time systems because it +offers the programmer more control over which jobs get processing +time. High priorities are generally reserved for time-critical +tasks. It's not ``fair,'' but it addresses other concerns not +applicable to a general-purpose operating system. + +@item +@b{After a lock has been released, does the program need to switch to +the highest priority thread that needs the lock (assuming that its +priority is higher than that of the current thread)?} + +When a lock is released, the highest priority thread waiting for that +lock should be unblocked and put on the ready to run list. The +scheduler should then run the highest priority thread on the ready +list. + +@item +@b{If a thread calls @code{thread_yield()} and then it turns out that +it has higher priority than any other threads, does the high-priority +thread continue running?} + +Yes. If there is a single highest-priority thread, it continues +running until it blocks or finishes, even if it calls +@code{thread_yield()}. + +@item +@b{If the highest priority thread is added to the ready to run list it +should start execution immediately. Is it immediate enough if I +wait until next timer interrupt occurs?} + +The highest priority thread should run as soon as it is runnable, +preempting whatever thread is currently running. + +@item +@b{What happens to the priority of the donating thread? Do the priorities +get swapped?} + +No. Priority donation only changes the priority of the low-priority +thread. The donating thread's priority stays unchanged. Also note +that priorities aren't additive: if thread A (with priority 5) donates +to thread B (with priority 3), then B's new priority is 5, not 8. + +@item +@b{Can a thread's priority be changed while it is sitting on the ready +queue?} + +Yes. Consider this case: low-priority thread L currently has a lock +that high-priority thread H wants. H donates its priority to L (the +lock holder). L finishes with the lock, and then loses the CPU and is +moved to the ready queue. Now L's old priority is restored while it +is in the ready queue. + +@item +@b{Can a thread's priority change while it is sitting on the queue of a +semaphore?} + +Yes. Same scenario as above except L gets blocked waiting on a new +lock when H restores its priority. + +@item +@b{Why is pubtest3's FIFO test skipping some threads! I know my scheduler +is round-robin'ing them like it's supposed to! Our output is like this:} + +@example +Thread 0 goes. +Thread 2 goes. +Thread 3 goes. +Thread 4 goes. +Thread 0 goes. +Thread 1 goes. +Thread 2 goes. +Thread 3 goes. +Thread 4 goes. +@end example + +@noindent @b{which repeats 5 times and then} + +@example +Thread 1 goes. +Thread 1 goes. +Thread 1 goes. +Thread 1 goes. +Thread 1 goes. +@end example + +This happens because context switches are being invoked by the test +when it explicitly calls @code{thread_yield()}. However, the time +slice timer is still alive and so, every tick (by default), thread 1 +gets switched out (caused by @code{timer_interrupt()} calling +@code{intr_yield_on_return()}) before it gets a chance to run its +mainline. It is by coincidence that Thread 1 is the one that gets +skipped in our example. If we use a different jitter value, the same +behavior is seen where a thread gets started and switched out +completely. + +Solution: Increase the value of @code{TIME_SLICE} in +@file{devices/timer.c} to a very high value, such as 10000, to see +that the threads will round-robin if they aren't interrupted. + +@item +@b{What happens when a thread is added to the ready list which has +higher priority than the currently running thread?} + +The correct behavior is to immediately yield the processor. Your +solution must act this way. + +@item +@b{What range of priorities should be supported and what should the +default priority of a thread be?} + +Your implementation should support priorities from 0 through 59 and +the default priority of a thread should be 29. +@end enumerate + +@item Advanced Scheduler FAQs + +@enumerate 1 +@item +@b{What is the interval between timer interrupts?} + +Timer interrupts occur @code{TIMER_FREQ} times per second. You can +adjust this value by editing @file{devices/timer.h}. The default is +100 Hz. + +@item +@b{Do I have to modify the dispatch table?} + +No, although you are allowed to. It is possible to complete +this problem (i.e.@: demonstrate response time improvement) +without doing so. + +@item +@b{When the scheduler changes the priority of a thread, how does this +affect priority donation?} + +Short (official) answer: Don't worry about it. Your priority donation +code may assume static priority assignment. + +Longer (unofficial) opinion: If you wish to take this into account, +however, your design may end up being ``cleaner.'' You have +considerable freedom in what actually takes place. I believe what +makes the most sense is for scheduler changes to affect the +``original'' (non-donated) priority. This change may actually be +masked by the donated priority. Priority changes should only +propagate with donations, not ``backwards'' from donees to donors. + +@item +@b{What is meant by ``static priority''?} + +Once thread A has donated its priority to thread B, if thread A's +priority changes (due to the scheduler) while the donation still +exists, you do not have to change thread B's donated priority. +However, you are free to do so. + +@item +@b{Do I have to make my dispatch table user-configurable?} + +No. Hard-coding the dispatch table values is fine. +@end enumerate +@end enumerate diff --git a/doc/userprog.texi b/doc/userprog.texi new file mode 100644 index 0000000..2b7edb4 --- /dev/null +++ b/doc/userprog.texi @@ -0,0 +1,689 @@ +@node Project 2--User Programs, Project 3--Virtual Memory, Project 1--Threads, Top +@chapter Project 2: User Programs + +Now that you've worked with Pintos and are familiar with its +infrastructure and thread package, it's time to start working on the +parts of the system that will allow users to run programs on top of +your operating system. The base code already supports loading and +running a single user program at a time with little interactivity +possible. You will allow multiple programs to be loaded in at once, +and to interact with the OS via system calls. + +You will be working out of the @file{userprog} directory for this +assignment. However, you will also be interacting with almost every +other part of the code for this assignment. We will describe the +relevant parts below. If you are confident in your HW1 code, you can +build on top of it. However, if you wish you can start with a fresh +copy of the code and re-implement @code{thread_join()}, which is the +only part of project #1 required for this assignment. + +Up to now, all of the code you have written for Pintos has been part +of the operating system kernel. This means, for example, that all the +test code from the last assignment ran as part of the kernel, with +full access to privileged parts of the system. Once we start running +user programs on top of the operating system, this is no longer true. +This project deals with consequences of the change. + +We allow more than one user program to run at a time. Because user +programs are written and compiled to work under the illusion that they +have the entire machine, when you load into memory and run more than +one process at a time, you must manage things correctly to maintain +this illusion. + +Before we delve into the details of the new code that you'll be +working with, you should probably undo the test cases from project 1. +All you need to do is make sure the original +@file{threads/pintostest.c} is in place. This will stop the tests +from being run. + +@menu +* Project 2 Code to Hack:: +* How User Programs Work:: +* Global Requirements:: +* Problem 2-1 Argument Passing:: +* Problem 2-2 System Calls:: +* User Programs FAQ:: +* 80x86 Calling Convention:: +* System Calls:: +@end menu + +@node Project 2 Code to Hack, How User Programs Work, Project 2--User Programs, Project 2--User Programs +@section Code to Hack + +The easiest way to get an overview of the programming you will be +doing is to simply go over each part you'll be working with. In +@file{userprog}, you'll find a small number of files, but here is +where the bulk of your work will be: + +@table @file +@item addrspace.c +@itemx addrspace.h +An address space keeps track of all the data necessary to execute a +user program. Address space data is stored in @code{struct thread}, +but manipulated only by @file{addrspace.c}. Address spaces need to +keep track of things like paging information for the process (so that +it knows which memory the process is using). Address spaces also +handle loading the program into memory and starting up the process's +execution. + +@item syscall.c +@itemx syscall.h +Whenever a user process wants to access some kernel functionality, it +needs to do so via a system call. This is a skeleton system call +handler. Currently, it just prints a message and terminates the user +process. In part 2 of this project you will add code to do everything +else needed by system calls. + +@item exception.c +@itemx exception.h +When a user process performs a privileged or prohibited operation, it +traps into the kernel as an ``exception'' or ``fault.''@footnote{We +will treat these terms as synonymous. There is no standard +distinction between them, although the Intel processor manuals define +them slightly differently on 80@var{x}86.} These files handle +exceptions. Currently all exceptions simply print a message and +terminate the process. @strong{You should not need to modify this +file for project 2.} + +@item gdt.c +@itemx gdt.c +The 80@var{x}86 is a segmented architecture. The Global Descriptor +Table (GDT) is a table that describes the segments in use. These +files set up the GDT. @strong{You should not need to modify these +files for any of the projects.} However, you can read the code if +you're interested in how the GDT works. + +@item tss.c +@itemx tss.c +The Task-State Segment (TSS) is used for 80@var{x}86 architectural +task switching. Pintos uses the TSS only for switching stacks when a +user process enters an interrupt handler, as does Linux. @strong{You +should not need to modify these files for any of the projects.} +However, you can read the code if you're interested in how the GDT +works. +@end table + +Elsewhere in the kernel, you will need to use some file system code. +You will not actually write a file system until the end of the +quarter, but since user programs need files to do anything +interesting, we have provided a simple file system in the +@file{filesys} directory. You will want to look over the +@file{filesys.h} and @file{file.h} interfaces to understand how to use +the file system. However, @strong{you should not modify the file +system code for this project}. Proper use of the file system routines +now will make life much easier for project 4, when you improve the +file system implementation. + +Finally, in @file{lib/kernel}, you might want to use +@file{bitmap.[ch]}. A bitmap is basically an array of bits, each of +which can be true or false. Bitmaps are typically used to keep track +of the usage of a large array of (identical) resources: if resource +@var{n} is in use, then bit @var{n} of the bitmap is true. You might +find it useful for tracking memory pages, for example. + +@node How User Programs Work, Global Requirements, Project 2 Code to Hack, Project 2--User Programs +@section How User Programs Work + +Pintos can run normal C programs. In fact, it can run any program you +want, provided it's compiled into the proper file format, and uses +only the system calls you implement. (For example, @code{malloc()} +makes use of functionality that isn't provided by any of the syscalls +we require you to support.) The only other limitation is that Pintos +can't run programs using floating point operations, since it doesn't +include the necessary kernel functionality to save and restore the +processor's floating-point unit when switching threads. You can look +in @file{test} directory for some examples. + +Pintos loads ELF executables, where ELF is an executable format used +by Linux, Solaris, and many other Unix and Unix-like systems. +Therefore, you can use any compiler and linker that produce +80@var{x}86 ELF executables to produce programs for Pintos. We +recommend using the tools we provide in the @file{test} directory. By +default, the @file{Makefile} in this directory will compile the test +programs we provide. You can edit the @file{Makefile} to compile your +own test programs as well. + +@node Global Requirements, Problem 2-1 Argument Passing, How User Programs Work, Project 2--User Programs +@section Global Requirements + +For testing and grading purposes, we have some simple requirements for +your output. The kernel should print out the program's name and exit +status whenever a process exits. Aside from this, it should print out +no other messages. You may understand all those debug messages, but +we won't, and it just clutters our ability to see the stuff we care +about. Additionally, while it may be useful to hard-code which +process will run at startup while debugging, before you submit your +code you must make sure that it takes the start-up process name and +arguments from the @samp{-ex} argument. The infrastructure for this +is already there---you just need to make sure you enable it! For +example, running @code{pintos -ex "testprogram 1 2 3 4"} will spawn +@samp{testprogram 1 2 3 4} as the first process. + +@node Problem 2-1 Argument Passing, Problem 2-2 System Calls, Global Requirements, Project 2--User Programs +@section Problem 2-1: Argument Passing + +Currently, @code{thread_execute()} does not support passing arguments +to new processes. UNIX and other operating systems do allow passing +command line arguments to a program, which accesses them via the argc, +argv arguments to main. You must implement this functionality by +extending @code{thread_execute()} so that instead of simply taking a +program file name, it can take a program name with arguments as a +single string. That is, @code{thread_execute("grep foo *.c")} should +be a legal call. @xref{80x86 Calling Convention}, for information on +exactly how this works. + +@strong{This functionality is extremely important.} Almost all our +test cases rely on being able to pass arguments, so if you don't get +this right, a lot of things will not appear to work correctly with our +tests. If the tests fail, so do you. Fortunately, this part +shouldn't be too hard. + +@node Problem 2-2 System Calls, User Programs FAQ, Problem 2-1 Argument Passing, Project 2--User Programs +@section Problem 2-2: System Calls + +Implement the system call handler in @file{userprog/syscall.c} to +properly deal with all the system calls described below. Currently, +it ``handles'' system calls by terminating the process. You will need +to decipher system call arguments and take the appropriate action for +each. + +In addition, implement system calls and system call handling. You are +required to support the following system calls, whose syscall numbers +are defined in @file{lib/syscall-nr.h} and whose C functions called by +user programs are prototyped in @file{lib/user/syscall.h}: + +@table @code +@item SYS_halt +@itemx void halt (void) +Stops Pintos and prints out performance statistics. Note that this +should be seldom used, since then you lose some information about +possible deadlock situations, etc. + +@item SYS_exit +@itemx void exit (int @var{status}) +Terminates the current user program, returning @var{status} to the +kernel. A @var{status} of 0 indicates a successful exit. Other +values may be used to indicate user-defined error conditions. + +@item SYS_exec +@itemx pid_t exec (const char *@var{file}) +Run the executable in @var{file} and return the new process's program +id (pid). If there is an error loading this program, returns pid -1, +which otherwise should not be a valid id number. + +@item SYS_join +@itemx int join (pid_t @var{pid}) +Joins the process @var{pid}, using the join rules from the last +assignment, and returns the process's exit status. If the process was +terminated by the kernel (i.e.@: killed due to an exception), the exit +status should be -1. If the process was not a child process, the +return value is undefined (but kernel operation must not be +disrupted). + +@item SYS_create +@itemx bool create (const char *@var{file}) +Create a new file called @var{file}. Returns -1 if failed, 0 if OK. + +@item SYS_remove +@itemx bool remove (const char *@var{file}) +Delete the file called @var{file}. Returns -1 if failed, 0 if OK. + +@item SYS_open +@itemx int open (const char *@var{file}) +Open the file called @var{file}. Returns a nonnegative integer handle +called a ``file descriptor'' (fd), or -1 if the file could not be +opened. File descriptors numbered 0 and 1 are reserved for the +console. All open files associated with a process should be closed +when the process exits or is terminated. + +@item SYS_filesize +@itemx int filesize (int @var{fd}) +Returns the size, in bytes, of the file open as @var{fd}, or -1 if the +file is invalid. + +@item SYS_read +@itemx int read (int @var{fd}, void *@var{buffer}, unsigned @var{size}) +Read @var{size} bytes from the file open as @var{fd} into +@var{buffer}. Returns the number of bytes actually read, or -1 if the +file could not be read. + +@item SYS_write +@itemx int write (int @var{fd}, const void *@var{buffer}, unsigned @var{size}) +Write @var{size} bytes from @var{buffer} to the open file @var{fd}. +Returns the number of bytes actually written, or -1 if the file could +not be written. + +@item SYS_close +@itemx void close (int @var{fd}) +Close file descriptor @var{fd}. +@end table + +The file defines other syscalls. Ignore them for now. You will +implement some of them in project 3 and the rest in project 4, so be +sure to design your system with extensibility in mind. + +To implement syscalls, you will need to provide a way of copying data +from the user's virtual address space into the kernel and vice versa. +This can be a bit tricky: what if the user provides an invalid +pointer, a pointer into kernel memory, or points to a block that is +partially in one of those regions? You should handle these cases by +terminating the user process. You will need this code before you can +even obtain the system call number, because the system call number is +on the user's stack in the user's virtual address space. We recommend +writing and testing this code before implementing any other system +call functionality. + +You must make sure that system calls are properly synchronized so that +any number of user processes can make them at once. In particular, it +is not safe to call into the filesystem code provided in the +@file{filesys} directory from multiple threads at once. For now, we +recommend adding a single lock that controls access to the filesystem +code. You should acquire this lock before calling any functions in +the @file{filesys} directory, and release it afterward. Because it +calls into @file{filesys} functions, you will have to modify +@file{addrspace_load()} in the same way. @strong{For now, we +recommend against modifying code in the @file{filesys} directory.} + +We have provided you a function for each system call in +@file{lib/user/syscall.c}. These provide a way for user processes to +invoke each system call from a C program. Each of them calls an +assembly language routine in @file{lib/user/syscall-stub.S}, which in +turn invokes the system call interrupt and returns. + +When you're done with this part, and forevermore, Pintos should be +bulletproof. Nothing that a user program can do should ever cause the +OS to crash, halt, assert fail, or otherwise stop running. The sole +exception is a call to the @code{halt} system call. + +@xref{System Calls}, for more information on how syscalls work. + +@node User Programs FAQ, 80x86 Calling Convention, Problem 2-2 System Calls, Project 2--User Programs +@section FAQ + +@enumerate 1 +@item General FAQs + +@enumerate 1 +@item +@b{Do we need a working project 1 to implement project 2?} + +You may find the code for @code{thread_join()} to be useful in +implementing the join syscall, but besides that, you can use +the original code provided for project 1. + +@item +@b{Is there a way I can disassemble user programs?} + +@c FIXME +The @command{objdump} utility can disassemble entire user programs or +object files. Invoke it as @code{objdump -d @var{file}}. You can +also use @code{gdb}'s @command{disassemble} command to disassemble +individual functions in object files compiled with debug information. + +@item +@b{Why can't I use many C include files in my Pintos programs?} + +The C library we provide is very limited. It does not include many of +the features that are expected of a real operating system's C library. +The C library must be built specifically for the operating system (and +architecture), since it must make system calls for I/O and memory +allocation. (Not all functions do, of course, but usually the library +is compiled as a unit.) If you wish to port libraries to Pintos, feel +free. + +@item +@b{How do I compile new user programs? How do I make 'echo' compile?} + +You need to modify @file{tests/Makefile}. + +@item +@b{Help, Solaris only allows 128 open files at once!} + +Solaris limits the number of file descriptors a process may keep open +at any given time. The default limit is 128 open file descriptors. + +To see the current limit for all new processes type @samp{limit} at +the shell prompt and look at the line titled ``descriptors''. To +increase this limit to the maximum allowed type @code{ulimit +descriptors} in a @command{csh} derived shell or @code{unlimit +descriptors} in a @command{sh} derived shell. This will increase the +number of open file descriptors your Pintos process can use, but it +will still be limited. + +Refer to the @command{limit(1)} man page for more information. + +@item +@b{I can't seem to figure out how to read from and write to +memory. What should I do?} + +Here are some pointers: + +FIXME + +@item +@b{I'm also confused about reading from and writing to the stack. Can +you help?} + +FIXME: relevant? + +@itemize @bullet +@item +Only non-@samp{char} values will have issues when writing them to +memory. If a digit is in a string, it is considered a character. +However, the value of @code{argc} would be a non-char. + +@item +You will need to write characters and non-characters into main memory. + +@item +When you add items to the stack, you will be decrementing the stack +pointer. You'll need to decrement the stack pointer before writing to +the location. + +@item +Each character is 1 byte. +@end itemize +@end enumerate + +@item Argument Passing FAQs + +@enumerate 1 +@item +@b{What will be the format of command line arguments?} + +You should assume that command line arguments are delimited by white +space. + +@item +@b{How do I parse all these argument strings?} + +We recommend you look at @code{strtok_r()}, prototyped in +@file{lib/string.h} and implemented with thorough comments in +@file{lib/string.c}. You can find more about it by looking at the man +page (run @code{man strtok_r} at the prompt). + +@item +@b{Why is the top of the stack at @t{0xc0000000}? Isn't that off the +top of user virtual memory? Shouldn't it be @t{0xbfffffff}?} + +When the processor pushes data on the stack, it decrements the stack +pointer first. Thus, the first (4-byte) value pushed on the stack +will be at address @t{0xbffffffc}. + +Also, the stack should always be aligned to a 4-byte boundary, but +@t{0xbfffffff} isn't. +@end enumerate + +@item System Calls FAQs + +@enumerate 1 +@item +@b{What should I do with the parameter passed to @code{exit()}?} + +This value, the exit status of the process, must be returned to the +thread's parent when @code{join()} is called. + +@item +@b{Can I just cast a pointer to a @code{struct file} object to get a +unique file descriptor? Can I just cast a @code{struct thread *} to a +@code{pid_t}? It's so much simpler that way!} + +This is a design decision you will have to make for yourself. +However, note that most operating systems do distinguish between file +descriptors (or pids) and the addresses of their kernel data +structures. You might want to give some thought as to why they do so +before committing yourself. + +@item +@b{Can I set a maximum number of open files per process?} + +From a design standpoint, it would be better not to set an arbitrary +maximum. That said, if your design calls for it, you may impose a +limit of 128 open files per process (as the Solaris machines here do). + +@item +@b{What happens when two (or more) processes have a file open and one of +them removes it?} + +FIXME FIXME FIXME + +You should copy the standard Unix semantics for files. That is, when +a file is removed an process which has a file descriptor for that file +may continue to do operations on that descriptor. This means that +they can read and write from the file. The file will not have a name, +and no other processes will be able to open it, but it will continue +to exist until all file descriptors referring to the file are closed +or the machine shuts down. + +@item +@b{What happens if a system call is passed an invalid argument, such +as Open being called with an invalid filename?} + +Pintos should not crash. You should have your system calls check for +invalid arguments and return error codes. + +@item +@b{I've discovered that some of my user programs need more than one 4 +kB page of stack space. What should I do?} + +You may modify the stack setup code to allocate more than one page of +stack space for each process. + +@item +@b{What do I need to print on thread completion?} + +You should print the complete thread name (as specified in the +@code{SYS_exec} call) followed by the exit status code, +e.g.@: @samp{example 1 2 3 4: 0}. +@end enumerate +@end enumerate + +@node 80x86 Calling Convention, System Calls, User Programs FAQ, Project 2--User Programs +@appendixsec 80@var{x}86 Calling Convention + +What follows is a quick and dirty discussion of the 80@var{x}86 +calling convention. Some of the basics should be familiar from CS +107, and if you've already taken CS 143 or EE 182, then you should +have seen even more of it. I've omitted some of the complexity, since +this isn't a class in how function calls work, so don't expect this to +be exactly correct in full, gory detail. If you do want all the +details, you can refer to @cite{[SysV-i386]}. + +Whenever a function call happens, you need to put the arguments on the +call stack for that function, before the code for that function +executes, so that the callee has access to those values. The caller +has to be responsible for this (be sure you understand why). +Therefore, when you compile a program, the assembly code emitted will +have in it, before every function call, a bunch of instructions that +prepares for the call in whatever manner is conventional for the +machine you're working on. This includes saving registers as needed, +putting stuff on the stack, saving the location to return to somewhere +(so that when the callee finishes, it knows where the caller code is), +and some other bookkeeping stuff. Then you do the jump to the +callee's code, and it goes along, assuming that the stack and +registers are prepared in the appropriate manner. When the callee is +done, it looks at the return location as saved earlier, and jumps back +to that location. The caller may then have to do some cleanup: +clearing arguments and the return value off the stack, restoring +registers that were saved before the call, and so on. + +If you think about it, some of these things should remind you of +context switching. + +As an aside, in general, function calls are not cheap. You have to do +a bunch of memory writes to prepare the stack, you need to save and +restore registers before and after a function call, you need to write +the stack pointer, you have a couple of jumps which probably wrecks +some of your caches. This is why inlining code can be much faster. + +@node Argument Passing to main +@subsection Argument Passing to @code{main()} + +In @code{main()}'s case, there is no caller to prepare the stack +before it runs. Therefore, the kernel needs to do it. Fortunately, +since there's no caller, there are no registers to save, no return +address to deal with, etc. The only difficult detail to take care of, +after loading the code, is putting the arguments to @code{main()} on +the stack. + +(The above is a small lie: most compilers will emit code where main +isn't strictly speaking the first function. This isn't an important +detail. If you want to look into it more, try disassembling a program +and looking around a bit. However, you can just act as if +@code{main()} is the very first function called.) + +Pintos is written for the 80@var{x}86 architecture. Therefore, we +need to adhere to the 80@var{x}86 calling convention, which is +detailed in the FAQ. Basically, you put all the arguments on the +stack and move the stack pointer appropriately. The program will +assume that this has been done when it begins running. + +So, what are the arguments to @code{main()}? Just two: an @samp{int} +(@code{argc}) and a @samp{char **} (@code{argv}). @code{argv} is an +array of strings, and @code{argc} is the number of strings in that +array. However, the hard part isn't these two things. The hard part is +getting all the individual strings in the right place. As we go +through the procedure, let us consider the following example command: +@samp{/bin/ls -l *.h *.c}. + +The first thing to do is to break the command line into individual +strings: @samp{/bin/ls}, @samp{-l}, @samp{*.h}, and @samp{*.c}. These +constitute the arguments of the command, including the program name +itself (which belongs in @code{argv[0]}). + +These individual, null-terminated strings should be placed on the user +stack. They may be placed in any order, as you'll see shortly, +without affecting how main works, but for simplicity let's assume they +are in reverse order (keeping in mind that the stack grows downward on +an 80@var{x}86 machine). As we copy the strings onto the stack, we +record their (virtual) stack addresses. These addresses will become +important when we write the argument vector (two paragraphs down). + +After we push all of the strings onto the stack, we adjust the stack +pointer so that it is word-aligned: that is, we move it down to the +next 4-byte boundary. This is required because we will next be +placing several words of data on the stack, and they must be aligned +in order to be read correctly. In our example, as you'll see below, +the strings start at address @t{0xffed}. One word below that would be +at @t{0xffe9}, so we could in theory put the next word on the stack +there. However, since the stack pointer should always be +word-aligned, we instead leave the stack pointer at @t{0xffe8}. + +Once we align the stack pointer, we then push the elements of the +argument vector (that is, the addresses of the strings @samp{/bin/ls}, +@samp{-l}, @samp{*.h}, and @samp{*.c}) onto the stack. This must be +done in reverse order, such that @code{argv[0]} is at the lowest +virtual address (again, because the stack is growing downward). This +is because we are now writing the actual array of strings; if we write +them in the wrong order, then the strings will be in the wrong order +in the array. This is also why, strictly speaking, it doesn't matter +what order the strings themselves are placed on the stack: as long as +the pointers are in the right order, the strings themselves can really +be anywhere. After we finish, we note the stack address of the first +element of the argument vector, which is @code{argv} itself. + +Finally, we push @code{argv} (that is, the address of the first +element of the @code{argv} array) onto the stack, along with the +length of the argument vector (@code{argc}, 4 in this example). This +must also be done in this order, since @code{argc} is the first +argument to main and therefore is on first (smaller address) on the +stack. We leave the stack pointer to point to the location where +@code{argc} is, because it is at the top of the stack, the location +directly below @code{argc}. + +All of which may sound very confusing, so here's a picture which will +hopefully clarify what's going on. This represents the state of the +stack and the relevant registers right before the beginning of the +user program (assuming for this example a 16-bit virtual address space +with addresses from @t{0x0000} to @t{0xffff}): + +@html +
+@end html +@multitable {@t{0xffff}} {word-align} {@t{/bin/ls\0}} +@item Address @tab Name @tab Data +@item @t{0xfffc} @tab @code{*argv[3]} @tab @samp{*.c\0} +@item @t{0xfff8} @tab @code{*argv[2]} @tab @samp{*.h\0} +@item @t{0xfff5} @tab @code{*argv[1]} @tab @samp{-l\0} +@item @t{0xffed} @tab @code{*argv[0]} @tab @samp{/bin/ls\0} +@item @t{0xffec} @tab word-align @tab @samp{\0} +@item @t{0xffe8} @tab @code{argv[3]} @tab @t{0xfffc} +@item @t{0xffe4} @tab @code{argv[2]} @tab @t{0xfff8} +@item @t{0xffe0} @tab @code{argv[1]} @tab @t{0xfff5} +@item @t{0xffdc} @tab @code{argv[0]} @tab @t{0xffed} +@item @t{0xffd8} @tab @code{argv} @tab @t{0xffdc} +@item @t{0xffd4} @tab @code{argc} @tab 4 +@end multitable +@html +
+@end html + +In this example, the stack pointer would be initialized to @t{0xffd4}. + +Your code should start the stack at the very top of the user virtual +address space, in the page just below virtual address @code{PHYS_BASE} +(defined in @file{threads/mmu.h}). + +@node System Calls, , 80x86 Calling Convention, Project 2--User Programs +@appendixsec System Calls + +We have already been dealing with one way that the operating system +can regain control from a user program: interrupts from timers and I/O +devices. These are ``external'' interrupts, because they are caused +by entities outside the CPU. + +The operating system is also called to deal with software exceptions, +which are events generated in response to the code. These can be +errors such as a page fault or division by zero. However, exceptions +are also the means by which a user program can request services +(``system calls'') from the operating system. + +Some exceptions are ``restartable'': the condition that caused the +exception can be fixed and the instruction retried. For example, page +faults call the operating system, but the user code should re-start on +the load or store that caused the exception (not the next one) so that +the memory access actually occurs. On the 80@var{x}86, restartable +exceptions are called ``faults,'' whereas most non-restartable +exceptions are classed as ``traps.'' Other architectures may define +these terms differently. + +In the 80@var{x}86 architecture, the @samp{int} instruction is the +most commonly used means for invoking system calls. This instruction +is handled in the same way that other software exceptions. In Pintos, +user program invoke @samp{int $0x30} to make a system call. The +system call number and any additional arguments are expected to be +pushed on the stack in the normal fashion before invoking the +interrupt. + +The normal calling convention pushes function arguments on the stack +from right to left and the stack grows downward. Thus, when the +system call handler @code{syscall_handler()} gets control, the system +call number is in the 32-bit word at the caller's stack pointer, the +first argument is in the 32-bit word at the next higher address, and +so on. The caller's stack pointer is accessible to +@code{syscall_handler()} as the @samp{esp} member of the @code{struct +intr_frame} passed to it. + +Here's an example stack frame for calling a system call numbered 10 +with three arguments passed as 1, 2, and 3. The stack addresses are +arbitrary: + +@html +
+@end html +@multitable {Address} {Value} +@item Address @tab Value +@item @t{0xfe7c} @tab 3 +@item @t{0xfe78} @tab 2 +@item @t{0xfe74} @tab 1 +@item @t{0xfe70} @tab 10 +@end multitable +@html +
+@end html + +In this example, the caller's stack pointer would be at @t{0xfe70}. + +The 80@var{x}86 convention for function return values is to place them +in the @samp{EAX} register. System calls that return a value can do +so by modifying the @samp{eax} member of @code{struct intr_frame}. diff --git a/doc/vm.texi b/doc/vm.texi new file mode 100644 index 0000000..812915e --- /dev/null +++ b/doc/vm.texi @@ -0,0 +1,657 @@ +@node Project 3--Virtual Memory, Project 4--File Systems, Project 2--User Programs, Top +@chapter Project 3: Virtual Memory + +By now you should be familiar with the inner workings of Pintos. +You've already come a long way: your OS can properly handle multiple +threads of execution with proper synchronization, and can load +multiple user programs at once. However, when loading user programs, +your OS is limited by how much main memory the simulated machine has. +In this assignment, you will remove that limitation. + +You will be using the @file{vm} directory for this project. There is +no new code to get acquainted with for this assignment. The @file{vm} +directory contains only the @file{Makefile}s. The only change from +@file{userprog} is that this new @file{Makefile} turns on the setting +@option{-DVM}, which you will need for this assignment. All code you +write will either be newly generated files (e.g.@: if you choose to +implement your paging code in their own source files), or will be +modifications to pre-existing code (e.g.@: you will change the +behavior of @file{addrspace.c} significantly). + +You will be building this assignment on the last one. It will benefit +you to get your project 2 in good working order before this assignment +so those bugs don't keep haunting you. + +@menu +* VM Design:: +* Page Faults:: +* Disk as Backing Store:: +* Memory Mapped Files:: +* Stack:: +* Problem 3-1 Page Table Management:: +* Problem 3-2 Paging To and From Disk:: +* Problem 3-3 Memory Mapped Files:: +* Virtual Memory FAQ:: +@end menu + +@node VM Design, Page Faults, Project 3--Virtual Memory, Project 3--Virtual Memory +@section A Word about Design + +It is important for you to note that in addition to getting virtual +memory working, this assignment is also meant to be an open-ended +design problem. We will expect you to come up with a design that +makes sense. You will have the freedom to choose how to do software +translation on TLB misses, how to represent the swap partition, how to +implement paging, etc. In each case, we will expect you to provide a +defensible justification in your design documentation as to why your +choices are reasonable. You should evaluate your design on all the +available criteria: speed of handling a page fault, space overhead in +memory, minimizing the number of page faults, simplicity, etc. + +In keeping with this, you will find that we are going to say as little +as possible about how to do things. Instead we will focus on what end +functionality we require your OS to support. + +@node Page Faults, Disk as Backing Store, VM Design, Project 3--Virtual Memory +@section Page Faults + +For the last assignment, whenever a context switch occurred, the new +process would install its own page table into the machine. The page +table contained all the virtual-to-physical translations for the +process. Whenever the processor needed to look up a translation, it +consulted the page table. As long as the process only accessed +memory that it didn't own, all was well. If the process accessed +memory it didn't own, it ``page faulted'' and @code{page_fault()} +terminated the process. + +When we implement virtual memory, the rules have to change. A page +fault is no longer necessarily an error, since it might only indicate +that the page must be brought in from a disk file or from swap. You +will have to implement a more sophisticated page fault handler to +handle these cases. + +On the 80@var{x}86, the page table format is fixed by hardware. The +top-level data structure is a 4 kB page called the ``page directory'' +(PD) arranged as an array of 1,024 32-bit page directory entries +(PDEs), each of which represents 4 MB of virtual memory. Each PDE may +point to the physical address of another 4 kB page called a ``page +table'' (PT) arranged in the same fashion as an array of 1,024 32-bit +page table entries (PTEs), each of which translates a single 4 kB +virtual page into physical memory. + +Thus, translation of a virtual address into a physical address follows +the three-step process illustrated in the diagram +below:@footnote{Actually, virtual to physical translation on the +80@var{x}86 architecture happens via an intermediate ``linear +address,'' but Pintos (and most other 80@var{x}86 OSes) set up the CPU +so that linear and virtual addresses are one and the same, so that you +can effectively ignore this CPU feature.} + +@enumerate 1 +@item +The top 10 bits of the virtual address (bits 22:31) are used to index +into the page directory. If the PDE is marked ``present,'' the +physical address of a page table is read from the PDE thus obtained. +If the PDE is marked ``not present'' then a page fault occurs. + +@item +The next 10 bits of the virtual address (bits 12:21) are used to index +into the page table. If the PTE is marked ``present,'' the physical +address of a data page is read from the PTE thus obtained. If the PTE +is marked ``not present'' then a page fault occurs. + + +@item +The bottom 12 bits of the virtual address (bits 0:11) are added to the +data page's physical base address, producing the final physical +address. +@end enumerate + +@example +32 22 12 0 ++--------------------------------------------------------------------+ +| Page Directory Index | Page Table Index | Page Offset | ++--------------------------------------------------------------------+ + | | | + _______/ _______/ _____/ + / / / + / Page Directory / Page Table / Data Page + / .____________. / .____________. / .____________. + |1,023|____________| |1,023|____________| | |____________| + |1,022|____________| |1,022|____________| | |____________| + |1,021|____________| |1,021|____________| \__\|____________| + |1,020|____________| |1,020|____________| /|____________| + | | | | | | | | + | | | \____\| |_ | | + | | . | /| . | \ | . | + \____\| . |_ | . | | | . | + /| . | \ | . | | | . | + | . | | | . | | | . | + | | | | | | | | + |____________| | |____________| | |____________| + 4|____________| | 4|____________| | |____________| + 3|____________| | 3|____________| | |____________| + 2|____________| | 2|____________| | |____________| + 1|____________| | 1|____________| | |____________| + 0|____________| \__\0|____________| \____\|____________| + / / +@end example + + +FIXME need to explain virtual and physical memory layout - probably +back in userprog project + +FIXME need to mention that there are many possible implementations and +that the above is just an outline + +@node Disk as Backing Store, Memory Mapped Files, Page Faults, Project 3--Virtual Memory +@section Disk as Backing Store + +In VM systems, since memory is less plentiful than disk, you will +effectively use memory as a cache for disk. Looking at it from +another angle, you will use disk as a backing store for memory. This +provides the abstraction of an (almost) unlimited virtual memory size. +Part of your task in this project is to do this, with the additional +constraint that your performance should be close to that provided by +physical memory. You will use the page tables' ``dirty'' bits to +denote whether pages need to be written back to disk when they're +evicted from main memory and the ``accessed'' bit for page replacement +algorithms. Whenever the hardware writes memory, it sets the dirty +bit, and if it reads or writes to the page, it sets the accessed bit. + +As with any caching system, performance depends on the policy used to +decide which things are kept in memory and which are only stored on +disk. On a page fault, the kernel must decide which page to replace. +Ideally, it will throw out a page that will not be referenced for a +long time, keeping in memory those pages that are soon to be +referenced. Another consideration is that if the replaced page has +been modified, the page must be first saved to disk before the needed +page can be brought in. Many virtual memory systems avoid this extra +overhead by writing modified pages to disk in advance, so that later +page faults can be completed more quickly. + +@node Memory Mapped Files, Stack, Disk as Backing Store, Project 3--Virtual Memory +@section Memory Mapped Files + +The traditional way to access the file system is via @code{read} and +@code{write} system calls, but that requires an extra level of copying +between the kernel and the user level. A secondary interface is +simply to ``map'' the file into the virtual address space. The +program can then use load and store instructions directly on the file +data. (An alternative way of viewing the file system is as ``durable +memory.'' Files just store data structures. If you access data +structures in memory using load and store instructions, why not access +data structures in files the same way?) + +Memory mapped files are typically implemented using system calls. One +system call maps the file to a particular part of the address space. +For example, one might map the file @file{foo}, which is 1000 bytes +long, starting at address 5000. Assuming that nothing else is already +at virtual addresses 5000@dots{}6000, any memory accesses to these +locations will access the corresponding bytes of @file{foo}. + +A consequence of memory mapped files is that address spaces are +sparsely populated with lots of segments, one for each memory mapped +file (plus one each for code, data, and stack). You will implement +memory mapped files for problem 3 of this assignment, but you should +design your solutions to problems 1 and 2 to account for this. + +@node Stack, Problem 3-1 Page Table Management, Memory Mapped Files, Project 3--Virtual Memory +@section Stack + +In project 2, the stack was a single page at the top of the user +virtual address space. The stack's location does not change in this +project, but your kernel should allocate additional pages to the stack +on demand. That is, if the stack grows past its current bottom, the +system should allocate additional pages for the stack as necessary, +unless those pages are unavailable because they are in use by another +segment, in which case some sort of fault should occur. + +@node Problem 3-1 Page Table Management, Problem 3-2 Paging To and From Disk, Stack, Project 3--Virtual Memory +@section Problem 3-1: Page Table Management + +Implement page directory and page table management to support virtual +memory. You will need data structures to accomplish the following +tasks: + +@itemize @bullet +@item +Some way of translating in software from virtual page frames to +physical page frames (consider using a hash table---note +that we provide one in @file{lib/kernel}). + +@item +Some way of translating from physical page frames back to virtual +page frames, so that when you replace a page, you can invalidate +its translation(s). + +@item +Some way of finding a page on disk if it is not in memory. You won't +need this data structure until part 2, but planning ahead is a good +idea. +@end itemize + +You need to do the roughly the following to handle a page fault: + +@enumerate 1 +@item +Determine the location of the physical page backing the virtual +address that faulted. It might be in the file system, in swap, +already be in physical memory and just not set up in the page table, +or it might be an invalid virtual address. + +If the virtual address is invalid, that is, if there's no physical +page backing it, or if the virtual address is above @code{PHYS_BASE}, +meaning that it belongs to the kernel instead of the user, then the +process's memory access must be disallowed. You should terminate the +process at this point, being sure to free all of its resources. + +@item +If the physical page is not in physical memory, bring it into memory. +If necessary to make room, first evict some other page from memory. +(When you do that you need to first remove references to the page from +any page table that refers to it.) + +@item +Each user process's @code{struct thread} has a @samp{pagedir} member +that points to its own per-process page directory. Read the PDE for +the faulting virtual address. + +@item +If the PDE is marked ``not present'' then allocate a new page table +page and initialize the PDE to point to the new page table. As when +you allocated a data page, you might have to first evict some other +page from memory. + +@item +Follow the PDE to the page table. Point the PTE for the faulting +virtual address to the physical page found in step 2. +@end enumerate + +You'll need to modify the ELF loader in @file{userprog/addrspace.c} to +do page table management according to your new design. As supplied, +it reads all the process's pages from disk and initializes the page +tables for them at the same time. For testing purposes, you'll +probably want to leave the code that reads the pages from disk, but +use your new page table management code to construct the page tables +only as page faults occur for them. + +@node Problem 3-2 Paging To and From Disk, Problem 3-3 Memory Mapped Files, Problem 3-1 Page Table Management, Project 3--Virtual Memory +@section Problem 3-2: Paging To and From Disk + +Implement paging to and from disk. + +You will need routines to move a page from memory to disk and from +disk to memory. You may use the Pintos file system for swap space, or +you may use the disk on interface @code{hd1:1}, which is otherwise +unused. A swap disk can theoretically be faster than using the file +system, because it avoid file system overhead and because the swap +disk and file system disk will be on separate hard disk controllers. +You will definitely need to be able to retrieve pages from files in +any case, so to avoid special cases it may be easier to use a file for +swap. You will still be using the basic file system provided with +Pintos. If you do everything correctly, your VM should still work +when you implement your own file system for the next assignment. + +You will need a way to track pages which are used by a process but +which are not in physical memory, to fully handle page faults. Pages +that you store on disk should not be constrained to be in sequential +order, and consequently your swap file (or swap disk) should not +require unused empty space. You will also need a way to track all of +the physical memory pages, in order to find an unused one when needed, +or to evict a page when memory is needed but no empty pages are +available. The data structures that you designed in part 1 should do +most of the work for you. + +You will need a page replacement algorithm. The hardware sets the +accessed and dirty bits when it accesses memory. Therefore, you +should be able to take advantage of this information to implement some +algorithm which attempts to achieve LRU-type behavior. We expect that +your algorithm perform at least as well as a reasonable implementation +of the second-chance (clock) algorithm. You will need to show in your +test cases the value of your page replacement algorithm by +demonstrating for some workload that it pages less frequently using +your algorithm than using some inferior page replacement policy. The +canonical example of a poor page replacement policy is random +replacement. + +Since you will already be paging from disk, you should implement a +``lazy'' loading scheme for new processes. When a process is created, +it will not run immediately. Therefore, it doesn't make sense to load +all its code, data, and stack into memory when the process is created, +since it might incur additional disk accesses to do so (if it gets +paged out before it runs). When loading a new process, you should +leave most pages on disk, and bring them in as demanded when the +program begins running. Your VM system should also use the executable +file itself as backing store for read-only segments, since these +segments won't change. + +There are a few special cases. Look at the loop in +@code{load_segment()} in @file{userprog/addrspace.c}. Each time +around the loop, @code{read_bytes} represents the number of bytes to +read from the executable file and @code{zero_bytes} represents the number +of bytes to initialize to zero following the bytes read. The two +always sum to @code{PGSIZE}. The page handling depends on these +variables' values: + +@itemize @bullet +@item +If @code{read_bytes} equals @code{PGSIZE}, the page should be demand +paged from disk on its first access. + +@item +If @code{zero_bytes} equals @code{PGSIZE}, the page does not need to +be read from disk at all because it is all zeroes. You should handle +such pages by creating a new page consisting of all zeroes at the +first page fault. + +@item +If neither @code{read_bytes} nor @code{zero_bytes} equals +@code{PGSIZE}, then part of the page is to be read from disk and the +remainder zeroed. This is a special case, which you should handle by +reading the partial page from disk at executable load time and zeroing +the rest of the page. It is the only case in which loading should not +be ``lazy''; even real OSes such as Linux do not load partial pages +lazily. +@end itemize + +FIXME mention that you can test with these special cases eliminated + +You may optionally implement sharing: when multiple processes are +created that use the same executable file, share read-only pages among +those processes instead of creating separate copies of read-only +segments for each process. If you carefully designed your data +structures in part 1, sharing of read-only pages should not make this +part significantly harder. + +@node Problem 3-3 Memory Mapped Files, Virtual Memory FAQ, Problem 3-2 Paging To and From Disk, Project 3--Virtual Memory +@section Problem 3-3: Memory Mapped Files + +Implement memory mapped files. + +You will need to implement the following system calls: + +@table @asis +@item SYS_mmap +@itemx bool mmap (int @var{fd}, void *@var{addr}, unsigned @var{length}) + +Maps the file open as @var{fd} into the process's address space +starting at @var{addr} for @var{length} bytes. Returns true if +successful, false on failure. + +@item SYS_munmap +@itemx bool munmap (void *addr, unsigned length) + +Unmaps the segment specified by id. This cannot be used to unmap +segments mapped by the executable loader. Returns 0 on success, -1 on +failure. When a file is unmapped, all outstanding changes are written +to the file, and the segment's pages are removed from the process's +list of used virtual pages. +@end table + +Calls to @code{mmap} must fail if the address is not page-aligned, if +the length is not positive and a multiple of @var{PGSIZE}. You also +must error check to make sure that the new segment does not overlap +already existing segments, and fail if it isn't. If the length passed +to @code{mmap} is less than the file's length, you should only map the +first part of the file. If the length passed to @code{mmap} is longer +than the file, the file should grow to the requested length. Similar +to the code segment, your VM system should be able to use the +@code{mmap}'d file itself as backing store for the mmap segment, since +the changes to the @code{mmap} segment will eventually be written to +the file. (In fact, you may choose to implement executable mappings +as a special case of file mappings.) + +@node Virtual Memory FAQ, , Problem 3-3 Memory Mapped Files, Project 3--Virtual Memory +@section FAQ + +@enumerate 1 +@item +@b{Do we need a working HW 2 to implement HW 3?} + +Yes. + +@item +@b{How do I use the hash table provided in @file{lib/hash.c}?} + +FIXME + +There are two things you need to use this hashtable: + +1. You need to decide on a key type. The key should be something +that is unique for each object as inserting two objects with +the same key will cause the second to overwrite the first. +(The keys are compared with ==, so you should stick to +integers and pointers unless you know how to do operator +overloading.) You also need to write a hash function that +converts key values to integers, which you will pass into the +hash table constructor. + +2. Your key needs to be a field of your object type, and you +will need to supply a 'get' function that given an object +returns the key. + +Here's a quick example of how to construct a hash table. In +this table the keys are Thread pointers and the objects are +integers (you will be using different key/value pairs I'm +sure). In addition, this hash function is pretty puny. You +should probably use a better one. + +@example +FIXME +@end example + +and to construct the hash table: + +HashTable *htable; + +htable = new HashTable(ExtractKeyFromHashObject, + MyKeyToHashValue); + +If you have any other questions about hash tables, the CS109 +and CS161 textbooks have good chapters on them, or you can come +to any of the TA's office hours for further clarification. + +@item +@b{The current implementation of the hash table does not do something +that we need it to do. What gives?} + +You are welcome to modify it. It is not used by any of the code we +provided, so modifying it won't affect any code but yours. Do +whatever it takes to make it work like you want it to. + +@item +@b{Is the data segment page-aligned?} + +No. + +@item +@b{What controls the layout of user programs?} + +The linker is responsible for the layout of a user program in +memory. The linker is directed by a ``linker script'' which tells it +the names and locations of the various program segments. The +test/script and testvm/script files are the linker scripts for the +multiprogramming and virtual memory assignments respectively. You can +learn more about linker scripts by reading the ``Scripts'' chapter in +the linker manual, accessible via @samp{info ld}. + +@item Page Table Management FAQs +@enumerate 1 +@item +@b{How do we manage allocation of pages used for page tables?} + +You can use any reasonable algorithm to do so. However, you should +make sure that memory used for page tables doesn't grow so much that +it encroaches deeply on the memory used for data pages. + +Here is one reasonable algorithm. At OS boot time, reserve some fixed +number of pages for page tables. Then, each time a new page table +page is needed, select one of these pages in ``round robin'' fashion. +If the page in use, clean up any pointers to it. Then use it for the +new page table page. + +@item +@b{Our code handles the PageFault exceptions. However, the number of +page faults handled does not show up in the final stats output. Is +there a counter that we must increment to correct this problem?} + +FIXME + +Yes, you'll need to update kernel->stats->numPageFaults when +you handle a page fault in your code. +@end enumerate + +@item Paging FAQs + +@enumerate 1 +@item +@b{Can we assume (and enforce) that the user's stack will +never increase beyond one page?} + +No. This value was useful for project 2, but for this assignment, you +need to implement an extensible stack segment. + +@item +@b{Does the virtual memory system need to support growth of the data +segment?} + +No. The size of the data segment is determined by the linker. We +still have no dynamic allocation in Pintos (although it is possible to +``fake'' it at the user level by using memory-mapped files). +Implementing @code{sbrk()} has been an extra-credit assignment in +previous years, but adds little additional complexity to a +well-designed system. + +@item +@b{Does the virtual memory system need to support growth of the stack +segment?} + +Yes. If a page fault appears just below the last stack segment page, +you must add a new page to the bottom of the stack. It is impossible +to predict how large the stack will grow at compile time, so we must +allocate pages as necessary. You should only allocate additional pages +if they ``appear'' to be stack accesses. + +@item +@b{But what do you mean by ``appear'' to be stack accesses? How big can a +stack growth be? Under what circumstances do we grow the stack?} + +If it looks like a stack request, then you grow the stack. Yes, that's +ambiguous. You need to make a reasonable decision about what looks +like a stack request. For example, you could decide a page, or two +pages, or ten pages, or more@enddots{} Or, you could use some other +heuristic to figure this out. + +Make a reasonable decision and document it in your code and in +your design document. Please make sure to justify your decision. + +@item +@b{How big should the file(s) we're using as a backing store for memory +be?} + +These files will need to be able to grow based on the number of pages +you're committed to storing on behalf of the processes currently in +memory. They should be able to grow to the full size of the disk. +@end enumerate + +@item Memory Mapped File FAQs + +@enumerate 1 +@item +@b{How do we interact with memory-mapped files?} + +Let's say you want to map a file called @file{foo} into your address +space at address @t{0x10000000}. You open the file, determine its +length, and then use Mmap: + +@example +#include +#include + +int main (void) +{ + void *addr = (void *) 0x10000000; + int fd = open ("foo"); + int length = filesize (fd); + if (mmap (fd, addr, length)) + printf ("success!\n"); +} +@end example + +Suppose @file{foo} is a text file and you want to print the first 64 +bytes on the screen (assuming, of course, that the length of the file +is at least 64). Without @code{mmap}, you'd need to allocate a +buffer, use @code{read} to get the data from the file into the buffer, +and finally use @code{write} to put the buffer out to the display. But +with the file mapped into your address space, you can directly address +it like so: + +@example +write (addr, 64, STDOUT_FILENO); +@end example + +Similarly, if you wanted to replace the first byte of the file, +all you need to do is: + +@example +addr[0] = 'b'; +@end example + +When you're done using the memory-mapped file, you simply unmap +it: + +@example +munmap (addr); +@end example + +@item +@b{What if two processes memory-map the same file?} + +There is no requirement in Pintos that the two processes see +consistent data. Unix handles this by making the processes share the +same physical page, but the @code{mmap} system call also has an +argument allowing the client to specify whether the page is shared or +private (i.e.@: copy-on-write). + +@item +@b{What happens if a user removes a @code{mmap}'d file?} + +@item +You should follow the Unix convention and the mapping should still be +valid. This is similar to the question in the User Programs FAQ about +a process with a file descriptor to a file that has been removed. + +@item +@b{What if a process writes to a page that is memory-mapped, but the +location written to in the memory-mapped page is past the end +of the memory-mapped file?} + +Can't happen. @code{mmap} extends the file to the requested length, +and Pintos provides no way to shorten a file. You can remove a file, +but the mapping remains valid (see the previous question). + +@item +@b{Do we have to handle memory mapping @code{stdin} or @code{stdout}?} + +No. Memory mapping implies that a file has a length and that a user +can seek to any location in the file. Since the console device has +neither of these properties, @code{mmap} should return false when the +user attempts to memory map a file descriptor for the console device. + +@item +@b{What happens when a process exits with mmap'd files?} + +When a process finishes each of its @code{mmap}'d files is implicitly +unmapped. When a process @code{mmap}s a file and then writes into the +area for the file it is making the assumption the changes will be +written to the file. + +@item +@b{If a user closes a mmaped file, should be automatically unmap it +for them?} + +No, once created the mapping is valid until @code{munmap} is called +or the process exits. +@end enumerate +@end enumerate