@node Project 3--Virtual Memory, Project 4--File Systems, Project 2--User Programs, Top
@chapter Project 3: Virtual Memory

By now you should be familiar with the inner workings of Pintos.
You've already come a long way: your OS can properly handle multiple
threads of execution with proper synchronization, and can load
multiple user programs at once. However, when loading user programs,
your OS is limited by how much main memory the simulated machine has.
In this assignment, you will remove that limitation.

You will be using the @file{vm} directory for this project. There is
no new code to get acquainted with for this assignment. The @file{vm}
directory contains only the @file{Makefile}s. The only change from
@file{userprog} is that this new @file{Makefile} turns on the setting
@option{-DVM}, which you will need for this assignment. All code you
write will either be in newly created files (e.g.@: if you choose to
implement your paging code in its own source files), or will be
modifications to pre-existing code (e.g.@: you will change the
behavior of @file{addrspace.c} significantly).

You will be building this assignment on the last one. It will benefit
you to get your project 2 in good working order before this assignment
so those bugs don't keep haunting you.

@menu
* Disk as Backing Store::
* Memory Mapped Files::
* Problem 3-1 Page Table Management::
* Problem 3-2 Paging To and From Disk::
* Problem 3-3 Memory Mapped Files::
* Virtual Memory FAQ::
@end menu

@section A Word about Design

It is important for you to note that in addition to getting virtual
memory working, this assignment is also meant to be an open-ended
design problem. We will expect you to come up with a design that
makes sense. You will have the freedom to choose how to do software
translation on TLB misses, how to represent the swap partition, how to
implement paging, etc. In each case, we will expect you to provide a
defensible justification in your design documentation as to why your
choices are reasonable. You should evaluate your design on all the
available criteria: speed of handling a page fault, space overhead in
memory, minimizing the number of page faults, simplicity, etc.

In keeping with this, you will find that we are going to say as little
as possible about how to do things. Instead we will focus on what end
functionality we require your OS to support.

For the last assignment, whenever a context switch occurred, the new
process would install its own page table into the machine. The page
table contained all the virtual-to-physical translations for the
process. Whenever the processor needed to look up a translation, it
consulted the page table. As long as the process only accessed
memory that it owned, all was well. If the process accessed
memory it didn't own, it ``page faulted'' and @code{page_fault()}
terminated the process.

When we implement virtual memory, the rules have to change. A page
fault is no longer necessarily an error, since it might only indicate
that the page must be brought in from a disk file or from swap. You
will have to implement a more sophisticated page fault handler to
handle these cases.

On the 80@var{x}86, the page table format is fixed by hardware. The
top-level data structure is a 4 kB page called the ``page directory''
(PD) arranged as an array of 1,024 32-bit page directory entries
(PDEs), each of which represents 4 MB of virtual memory. Each PDE may
point to the physical address of another 4 kB page called a ``page
table'' (PT) arranged in the same fashion as an array of 1,024 32-bit
page table entries (PTEs), each of which translates a single 4 kB
virtual page into physical memory.

Thus, translation of a virtual address into a physical address follows
the three-step process illustrated in the diagram
below:@footnote{Actually, virtual to physical translation on the
80@var{x}86 architecture happens via an intermediate ``linear
address,'' but Pintos (and most other 80@var{x}86 OSes) set up the CPU
so that linear and virtual addresses are one and the same, so that you
can effectively ignore this CPU feature.}

@enumerate 1
@item
The top 10 bits of the virtual address (bits 22:31) are used to index
into the page directory. If the PDE is marked ``present,'' the
physical address of a page table is read from the PDE thus obtained.
If the PDE is marked ``not present'' then a page fault occurs.

@item
The next 10 bits of the virtual address (bits 12:21) are used to index
into the page table. If the PTE is marked ``present,'' the physical
address of a data page is read from the PTE thus obtained. If the PTE
is marked ``not present'' then a page fault occurs.

@item
The bottom 12 bits of the virtual address (bits 0:11) are added to the
data page's physical base address, producing the final physical
address.
@end enumerate

@example
+--------------------------------------------------------------------+
| Page Directory Index |   Page Table Index   |    Page Offset       |
+--------------------------------------------------------------------+
             |                    |                     |
     _______/             _______/                _____/
    /                    /                       /
   /    Page Directory  /      Page Table       /    Data Page
  /     .____________. /     .____________.    /   .____________.
  |1,023|____________| |1,023|____________|    |   |____________|
  |1,022|____________| |1,022|____________|    |   |____________|
  |1,021|____________| |1,021|____________|    \__\|____________|
  |1,020|____________| |1,020|____________|       /|____________|
  |     |            | |     |            |        |            |
  |     |            | \____\|            |_       |            |
  |     |      .     |      /|      .     | \      |      .     |
  \____\|      .     |_      |      .     |  |     |      .     |
       /|      .     | \     |      .     |  |     |      .     |
        |      .     |  |    |      .     |  |     |      .     |
        |            |  |    |            |  |     |            |
        |____________|  |    |____________|  |     |____________|
       4|____________|  |   4|____________|  |     |____________|
       3|____________|  |   3|____________|  |     |____________|
       2|____________|  |   2|____________|  |     |____________|
       1|____________|  |   1|____________|  |     |____________|
       0|____________|  \__\0|____________|  \____\|____________|
                           /                      /
@end example

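The three-way split of a virtual address described above can be sketched in a few lines of C. This is only an illustration: the macro and function names (@code{PGBITS}, @code{PTBITS}, @code{pd_index}, @code{pt_index}, @code{pg_offset}) are assumptions made for this example, not names the project requires.

```c
#include <stdint.h>

/* Hypothetical names chosen for this sketch. */
#define PGBITS 12u   /* Width of the page offset (bits 0:11). */
#define PTBITS 10u   /* Width of the page table index (bits 12:21). */

/* Page directory index: the top 10 bits (bits 22:31). */
uint32_t
pd_index (uint32_t vaddr)
{
  return vaddr >> (PGBITS + PTBITS);
}

/* Page table index: the next 10 bits (bits 12:21). */
uint32_t
pt_index (uint32_t vaddr)
{
  return (vaddr >> PGBITS) & ((1u << PTBITS) - 1);
}

/* Offset within the data page: the bottom 12 bits (bits 0:11). */
uint32_t
pg_offset (uint32_t vaddr)
{
  return vaddr & ((1u << PGBITS) - 1);
}
```

For example, the address @t{0x08048123} splits into page directory index 32, page table index 72, and page offset @t{0x123}.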
FIXME need to explain virtual and physical memory layout - probably
back in userprog project

FIXME need to mention that there are many possible implementations and
that the above is just an outline

@node Disk as Backing Store
@section Disk as Backing Store

In VM systems, since memory is less plentiful than disk, you will
effectively use memory as a cache for disk. Looking at it from
another angle, you will use disk as a backing store for memory. This
provides the abstraction of an (almost) unlimited virtual memory size.
Part of your task in this project is to do this, with the additional
constraint that your performance should be close to that provided by
physical memory. You will use the page tables' ``dirty'' bits to
denote whether pages need to be written back to disk when they're
evicted from main memory and the ``accessed'' bit for page replacement
algorithms. Whenever the hardware writes memory, it sets the dirty
bit, and if it reads or writes to the page, it sets the accessed bit.

As with any caching system, performance depends on the policy used to
decide which things are kept in memory and which are only stored on
disk. On a page fault, the kernel must decide which page to replace.
Ideally, it will throw out a page that will not be referenced for a
long time, keeping in memory those pages that are soon to be
referenced. Another consideration is that if the replaced page has
been modified, the page must be first saved to disk before the needed
page can be brought in. Many virtual memory systems avoid this extra
overhead by writing modified pages to disk in advance, so that later
page faults can be completed more quickly.

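As a small illustration of how the ``dirty'' and ``accessed'' bits might be consulted, the sketch below tests them in a raw 32-bit PTE. The bit positions are those of the 80@var{x}86 page table entry format; the macro and function names are assumptions made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag positions in an 80x86 PTE; the names are hypothetical. */
#define PTE_A 0x20u   /* "Accessed": set by hardware on any read or write. */
#define PTE_D 0x40u   /* "Dirty": set by hardware on any write. */

/* A page must be saved before eviction only if it was modified. */
bool
page_needs_writeback (uint32_t pte)
{
  return (pte & PTE_D) != 0;
}

/* A replacement policy can prefer to keep recently used pages. */
bool
page_recently_used (uint32_t pte)
{
  return (pte & PTE_A) != 0;
}
```

A page whose dirty bit is clear can simply be dropped on eviction, provided an up-to-date copy already exists on disk.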
@node Memory Mapped Files
@section Memory Mapped Files

The traditional way to access the file system is via @code{read} and
@code{write} system calls, but that requires an extra level of copying
between the kernel and the user level. A secondary interface is
simply to ``map'' the file into the virtual address space. The
program can then use load and store instructions directly on the file
data. (An alternative way of viewing the file system is as ``durable
memory.'' Files just store data structures. If you access data
structures in memory using load and store instructions, why not access
data structures in files the same way?)

Memory mapped files are typically implemented using system calls. One
system call maps the file to a particular part of the address space.
For example, one might map the file @file{foo}, which is 1000 bytes
long, starting at address 5000. Assuming that nothing else is already
at virtual addresses 5000@dots{}6000, any memory accesses to these
locations will access the corresponding bytes of @file{foo}.

A consequence of memory mapped files is that address spaces are
sparsely populated with lots of segments, one for each memory mapped
file (plus one each for code, data, and stack). You will implement
memory mapped files for problem 3 of this assignment, but you should
design your solutions to problems 1 and 2 to account for this.

In project 2, the stack was a single page at the top of the user
virtual address space. The stack's location does not change in this
project, but your kernel should allocate additional pages to the stack
on demand. That is, if the stack grows past its current bottom, the
system should allocate additional pages for the stack as necessary,
unless those pages are unavailable because they are in use by another
segment, in which case some sort of fault should occur.

@node Problem 3-1 Page Table Management
@section Problem 3-1: Page Table Management

Implement page directory and page table management to support virtual
memory. You will need data structures to accomplish the following
tasks:

@itemize @bullet
@item
Some way of translating in software from virtual page frames to
physical page frames (consider using a hash table---note
that we provide one in @file{lib/kernel}).

@item
Some way of translating from physical page frames back to virtual
page frames, so that when you replace a page, you can invalidate
its translation(s).

@item
Some way of finding a page on disk if it is not in memory. You won't
need this data structure until part 2, but planning ahead is a good
idea.
@end itemize

You need to do roughly the following to handle a page fault:

@enumerate 1
@item
Determine the location of the physical page backing the virtual
address that faulted. It might be in the file system, in swap, or
already in physical memory and just not set up in the page table,
or it might be an invalid virtual address.

If the virtual address is invalid, that is, if there's no physical
page backing it, or if the virtual address is above @code{PHYS_BASE},
meaning that it belongs to the kernel instead of the user, then the
process's memory access must be disallowed. You should terminate the
process at this point, being sure to free all of its resources.

@item
If the physical page is not in physical memory, bring it into memory.
If necessary to make room, first evict some other page from memory.
(When you do that you need to first remove references to the page from
any page table that refers to it.)

@item
Each user process's @code{struct thread} has a @samp{pagedir} member
that points to its own per-process page directory. Read the PDE for
the faulting virtual address.

@item
If the PDE is marked ``not present'' then allocate a new page table
page and initialize the PDE to point to the new page table. As when
you allocated a data page, you might have to first evict some other
page from memory.

@item
Follow the PDE to the page table. Point the PTE for the faulting
virtual address to the physical page found in step 2.
@end enumerate

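The first step above, deciding what a fault means, can be sketched as a pure function over a per-process table of page records. Everything here is an assumption made for illustration: the record layout, the enumerator names, the linear search (a hash table would be the more likely choice), and the value given for @code{PHYS_BASE}, which should come from the kernel's own headers.

```c
#include <stddef.h>
#include <stdint.h>

#define PGSIZE    4096u
#define PHYS_BASE 0xc0000000u   /* Assumed value; use the kernel's definition. */

/* Where a page's contents currently live (hypothetical names). */
enum page_location { PAGE_IN_FILE, PAGE_IN_SWAP, PAGE_IN_MEMORY };

struct page_info
  {
    uint32_t upage;             /* Page-aligned user virtual address. */
    enum page_location where;   /* Current location of the contents. */
  };

/* What the page fault handler should do next. */
enum fault_action
  {
    FAULT_KILL,        /* Invalid access: terminate the process. */
    FAULT_READ_FILE,   /* Bring the page in from the file system. */
    FAULT_READ_SWAP,   /* Bring the page in from swap. */
    FAULT_MAP_ONLY     /* Already in memory: just install the PTE. */
  };

enum fault_action
classify_fault (const struct page_info *pages, size_t n, uint32_t fault_addr)
{
  uint32_t upage = fault_addr & ~(PGSIZE - 1);
  size_t i;

  /* Kernel addresses are never valid for a user access. */
  if (fault_addr >= PHYS_BASE)
    return FAULT_KILL;

  for (i = 0; i < n; i++)
    if (pages[i].upage == upage)
      switch (pages[i].where)
        {
        case PAGE_IN_FILE:   return FAULT_READ_FILE;
        case PAGE_IN_SWAP:   return FAULT_READ_SWAP;
        case PAGE_IN_MEMORY: return FAULT_MAP_ONLY;
        }

  /* No record backs this address. */
  return FAULT_KILL;
}
```

Keeping the decision separate from the I/O makes the policy easy to test without a running kernel; the remaining steps (eviction, PDE and PTE updates) would act on the returned value.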
You'll need to modify the ELF loader in @file{userprog/addrspace.c} to
do page table management according to your new design. As supplied,
it reads all the process's pages from disk and initializes the page
tables for them at the same time. For testing purposes, you'll
probably want to leave the code that reads the pages from disk, but
use your new page table management code to construct the page tables
only as page faults occur for them.

@node Problem 3-2 Paging To and From Disk
@section Problem 3-2: Paging To and From Disk

Implement paging to and from disk.

You will need routines to move a page from memory to disk and from
disk to memory. You may use the Pintos file system for swap space, or
you may use the disk on interface @code{hd1:1}, which is otherwise
unused. A swap disk can theoretically be faster than using the file
system, because it avoids file system overhead and because the swap
disk and file system disk will be on separate hard disk controllers.
You will definitely need to be able to retrieve pages from files in
any case, so to avoid special cases it may be easier to use a file for
swap. You will still be using the basic file system provided with
Pintos. If you do everything correctly, your VM should still work
when you implement your own file system for the next assignment.

You will need a way to track pages which are used by a process but
which are not in physical memory, to fully handle page faults. Pages
that you store on disk should not be constrained to be in sequential
order, and consequently your swap file (or swap disk) should not
require unused empty space. You will also need a way to track all of
the physical memory pages, in order to find an unused one when needed,
or to evict a page when memory is needed but no empty pages are
available. The data structures that you designed in part 1 should do
most of the work for you.

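One simple way to keep swap free of sequential-order constraints is to treat it as an array of page-sized slots and hand out whichever slot is free. The sketch below shows only the bookkeeping, not the disk I/O; the names and the tiny capacity are assumptions for illustration, and a real kernel would also need synchronization around the table.

```c
#include <stdbool.h>
#include <stddef.h>

#define SWAP_SLOTS 8                /* Hypothetical capacity, in pages. */
#define SWAP_NONE  ((size_t) -1)    /* Returned when swap is full. */

static bool slot_in_use[SWAP_SLOTS];

/* Claims the lowest-numbered free slot, or returns SWAP_NONE. */
size_t
swap_alloc (void)
{
  size_t i;

  for (i = 0; i < SWAP_SLOTS; i++)
    if (!slot_in_use[i])
      {
        slot_in_use[i] = true;
        return i;
      }
  return SWAP_NONE;
}

/* Releases SLOT so that a later eviction can reuse it. */
void
swap_free (size_t slot)
{
  if (slot < SWAP_SLOTS)
    slot_in_use[slot] = false;
}
```

Because freed slots are reused immediately, pages from different processes end up interleaved on disk and no region has to be reserved in advance.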
You will need a page replacement algorithm. The hardware sets the
accessed and dirty bits when it accesses memory. Therefore, you
should be able to take advantage of this information to implement some
algorithm which attempts to achieve LRU-type behavior. We expect
your algorithm to perform at least as well as a reasonable implementation
of the second-chance (clock) algorithm. You will need to show in your
test cases the value of your page replacement algorithm by
demonstrating for some workload that it pages less frequently using
your algorithm than using some inferior page replacement policy. The
canonical example of a poor page replacement policy is random
replacement.

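For reference, the core of the second-chance (clock) algorithm mentioned above can be sketched as follows. The frame table here is a plain array with a software copy of the accessed bit; in a kernel the bit would be read from and cleared in the page table entries. All names are assumptions made for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-frame record. */
struct frame
  {
    bool accessed;   /* Copy of the hardware "accessed" bit. */
  };

/* Sweeps the clock hand over FRAMES[0..N), with N > 0, and returns
   the index of the frame to evict.  A frame whose accessed bit is set
   gets a second chance: the bit is cleared and the hand moves on. */
size_t
clock_pick_victim (struct frame *frames, size_t n, size_t *hand)
{
  for (;;)
    {
      size_t i = *hand;

      *hand = (*hand + 1) % n;
      if (frames[i].accessed)
        frames[i].accessed = false;
      else
        return i;
    }
}
```

The loop always terminates within two sweeps, because the first sweep clears every accessed bit it passes.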
Since you will already be paging from disk, you should implement a
``lazy'' loading scheme for new processes. When a process is created,
it will not run immediately. Therefore, it doesn't make sense to load
all its code, data, and stack into memory when the process is created,
since it might incur additional disk accesses to do so (if it gets
paged out before it runs). When loading a new process, you should
leave most pages on disk, and bring them in as demanded when the
program begins running. Your VM system should also use the executable
file itself as backing store for read-only segments, since these
segments won't change.

There are a few special cases. Look at the loop in
@code{load_segment()} in @file{userprog/addrspace.c}. Each time
around the loop, @code{read_bytes} represents the number of bytes to
read from the executable file and @code{zero_bytes} represents the number
of bytes to initialize to zero following the bytes read. The two
always sum to @code{PGSIZE}. The page handling depends on these
variables' values:

@itemize @bullet
@item
If @code{read_bytes} equals @code{PGSIZE}, the page should be demand
paged from disk on its first access.

@item
If @code{zero_bytes} equals @code{PGSIZE}, the page does not need to
be read from disk at all because it is all zeroes. You should handle
such pages by creating a new page consisting of all zeroes at the
first page fault.

@item
If neither @code{read_bytes} nor @code{zero_bytes} equals
@code{PGSIZE}, then part of the page is to be read from disk and the
remainder zeroed. This is a special case, which you should handle by
reading the partial page from disk at executable load time and zeroing
the rest of the page. It is the only case in which loading should not
be ``lazy''; even real OSes such as Linux do not load partial pages
lazily.
@end itemize

FIXME mention that you can test with these special cases eliminated

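The three cases for @code{read_bytes} and @code{zero_bytes} can be expressed as a small classifier, which may help keep the loader loop readable. The enumerator and function names are assumptions made for this sketch, as is the value of @code{PGSIZE}.

```c
#include <stddef.h>

#define PGSIZE 4096u   /* Assumed page size; use the kernel's definition. */

/* How one page of a segment should be populated (hypothetical names). */
enum page_load
  {
    LOAD_DEMAND_FROM_FILE,   /* Whole page comes from the executable. */
    LOAD_ZERO_ON_FAULT,      /* All zeroes: create at the first fault. */
    LOAD_EAGER_PARTIAL       /* Mixed: read and zero at load time. */
  };

/* Precondition: read_bytes + zero_bytes == PGSIZE. */
enum page_load
classify_page (size_t read_bytes, size_t zero_bytes)
{
  if (read_bytes == PGSIZE)
    return LOAD_DEMAND_FROM_FILE;
  if (zero_bytes == PGSIZE)
    return LOAD_ZERO_ON_FAULT;
  return LOAD_EAGER_PARTIAL;
}
```

Only the last case touches the disk while the process is being loaded; the other two defer all work to the page fault handler.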
You may optionally implement sharing: when multiple processes are
created that use the same executable file, share read-only pages among
those processes instead of creating separate copies of read-only
segments for each process. If you carefully designed your data
structures in part 1, sharing of read-only pages should not make this
part significantly harder.

@node Problem 3-3 Memory Mapped Files
@section Problem 3-3: Memory Mapped Files

Implement memory mapped files.

You will need to implement the following system calls:

@itemx bool mmap (int @var{fd}, void *@var{addr}, unsigned @var{length})

Maps the file open as @var{fd} into the process's address space
starting at @var{addr} for @var{length} bytes. Returns true if
successful, false on failure.

@itemx bool munmap (void *@var{addr}, unsigned @var{length})

Unmaps the segment specified by @var{addr} and @var{length}. This
cannot be used to unmap segments mapped by the executable loader.
Returns true on success, false on failure. When a file is unmapped,
all outstanding changes are written to the file, and the segment's
pages are removed from the process's list of used virtual pages.

Calls to @code{mmap} must fail if the address is not page-aligned, or
if the length is not positive or not a multiple of @code{PGSIZE}. You
must also check that the new segment does not overlap any
already existing segment, and fail if it does. If the length passed
to @code{mmap} is less than the file's length, you should only map the
first part of the file. If the length passed to @code{mmap} is longer
than the file, the file should grow to the requested length. Similar
to the code segment, your VM system should be able to use the
@code{mmap}'d file itself as backing store for the mmap segment, since
the changes to the @code{mmap} segment will eventually be written to
the file. (In fact, you may choose to implement executable mappings
as a special case of file mappings.)

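The argument checks that @code{mmap} must perform can be sketched as a single predicate. The segment record and the function name are assumptions made for this example; a real implementation would consult whatever per-process data structure tracks mapped segments, and would also have to validate the file descriptor.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PGSIZE 4096u   /* Assumed page size; use the kernel's definition. */

/* Hypothetical record of one segment already in the address space. */
struct segment
  {
    uint32_t start;    /* Page-aligned starting address. */
    uint32_t length;   /* Length in bytes. */
  };

/* Returns true if [addr, addr + length) is an acceptable new mapping
   given the N existing segments in SEGS. */
bool
mmap_args_ok (uint32_t addr, uint32_t length,
              const struct segment *segs, size_t n)
{
  uint64_t end = (uint64_t) addr + length;
  size_t i;

  if (addr % PGSIZE != 0)
    return false;                 /* Not page-aligned. */
  if (length == 0 || length % PGSIZE != 0)
    return false;                 /* Not a positive multiple of PGSIZE. */

  for (i = 0; i < n; i++)
    {
      uint64_t seg_end = (uint64_t) segs[i].start + segs[i].length;

      if (addr < seg_end && segs[i].start < end)
        return false;             /* Overlaps an existing segment. */
    }
  return true;
}
```

Two segments that merely touch, with one ending exactly where the next begins, do not count as overlapping under this test.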
@node Virtual Memory FAQ

@b{Do we need a working HW 2 to implement HW 3?}

@b{How do I use the hash table provided in @file{lib/hash.c}?}

There are two things you need to use this hash table:

1. You need to decide on a key type. The key should be something
that is unique for each object, as inserting two objects with
the same key will cause the second to overwrite the first.
(The keys are compared with ==, so you should stick to
integers and pointers unless you know how to do operator
overloading.) You also need to write a hash function that
converts key values to integers, which you will pass into the
hash table constructor.

2. Your key needs to be a field of your object type, and you
will need to supply a `get' function that, given an object,
returns its key.

Here's a quick example of how to construct a hash table. In
this table the keys are Thread pointers and the objects are
integers (you will be using different key/value pairs, I'm
sure). In addition, this hash function is pretty puny. You
should probably use a better one.

and to construct the hash table:

HashTable<Thread *, HashObject *> *htable;

htable = new HashTable<Thread *, HashObject *>(ExtractKeyFromHashObject,

If you have any other questions about hash tables, the CS109
and CS161 textbooks have good chapters on them, or you can come
to any of the TAs' office hours for further clarification.

@b{The current implementation of the hash table does not do something
that we need it to do. What gives?}

You are welcome to modify it. It is not used by any of the code we
provided, so modifying it won't affect any code but yours. Do
whatever it takes to make it work like you want it to.

@b{Is the data segment page-aligned?}

@b{What controls the layout of user programs?}

The linker is responsible for the layout of a user program in
memory. The linker is directed by a ``linker script'' which tells it
the names and locations of the various program segments. The
@file{test/script} and @file{testvm/script} files are the linker scripts for the
multiprogramming and virtual memory assignments respectively. You can
learn more about linker scripts by reading the ``Scripts'' chapter in
the linker manual, accessible via @samp{info ld}.

@item Page Table Management FAQs

@b{How do we manage allocation of pages used for page tables?}

You can use any reasonable algorithm to do so. However, you should
make sure that memory used for page tables doesn't grow so much that
it encroaches deeply on the memory used for data pages.

Here is one reasonable algorithm. At OS boot time, reserve some fixed
number of pages for page tables. Then, each time a new page table
page is needed, select one of these pages in ``round robin'' fashion.
If the page is in use, clean up any pointers to it. Then use it for the
new page table page.

@b{Our code handles the PageFault exceptions. However, the number of
page faults handled does not show up in the final stats output. Is
there a counter that we must increment to correct this problem?}

Yes, you'll need to update kernel->stats->numPageFaults when
you handle a page fault in your code.

@b{Can we assume (and enforce) that the user's stack will
never increase beyond one page?}

No. This value was useful for project 2, but for this assignment, you
need to implement an extensible stack segment.

@b{Does the virtual memory system need to support growth of the data
segment?}

No. The size of the data segment is determined by the linker. We
still have no dynamic allocation in Pintos (although it is possible to
``fake'' it at the user level by using memory-mapped files).
Implementing @code{sbrk()} has been an extra-credit assignment in
previous years, but adds little additional complexity to a
well-designed system.

@b{Does the virtual memory system need to support growth of the stack
segment?}

Yes. If a page fault appears just below the last stack segment page,
you must add a new page to the bottom of the stack. It is impossible
to predict how large the stack will grow at compile time, so we must
allocate pages as necessary. You should only allocate additional pages
if they ``appear'' to be stack accesses.

@b{But what do you mean by ``appear'' to be stack accesses? How big can a
stack growth be? Under what circumstances do we grow the stack?}

If it looks like a stack request, then you grow the stack. Yes, that's
ambiguous. You need to make a reasonable decision about what looks
like a stack request. For example, you could treat a fault as a stack
access if it falls within one page, or two pages, or ten pages, or
more, of the current bottom of the stack. Or, you could use some other
heuristic to figure this out.

Make a reasonable decision and document it in your code and in
your design document. Please make sure to justify your decision.

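As one concrete instance of such a heuristic, the sketch below accepts a fault as stack growth only if it lies within a fixed number of pages below the current stack bottom. The two-page limit and all names are arbitrary assumptions chosen for illustration; they are not a recommendation, and whatever policy you pick still has to be justified in your design document.

```c
#include <stdbool.h>
#include <stdint.h>

#define PGSIZE                 4096u
#define STACK_GROW_SLACK_PAGES 2u     /* Hypothetical policy limit. */

/* STACK_BOTTOM is the lowest address currently mapped for the stack.
   Returns true if a fault at FAULT_ADDR should grow the stack. */
bool
looks_like_stack_access (uint32_t fault_addr, uint32_t stack_bottom)
{
  if (fault_addr >= stack_bottom)
    return false;   /* Not below the stack, so not a growth request. */
  return stack_bottom - fault_addr <= STACK_GROW_SLACK_PAGES * PGSIZE;
}
```

A fault four bytes below the stack bottom is accepted under this policy, while one that lands three pages below is rejected and would be treated as an invalid access.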
@b{How big should the file(s) we're using as a backing store for memory
be?}

These files will need to be able to grow based on the number of pages
you're committed to storing on behalf of the processes currently in
memory. They should be able to grow to the full size of the disk.

@item Memory Mapped File FAQs

@b{How do we interact with memory-mapped files?}

Let's say you want to map a file called @file{foo} into your address
space at address @t{0x10000000}. You open the file, determine its
length, and then use @code{mmap}:

@example
void *addr = (void *) 0x10000000;
int fd = open ("foo");
int length = filesize (fd);
if (mmap (fd, addr, length))
  printf ("success!\n");
@end example

Suppose @file{foo} is a text file and you want to print the first 64
bytes on the screen (assuming, of course, that the length of the file
is at least 64). Without @code{mmap}, you'd need to allocate a
buffer, use @code{read} to get the data from the file into the buffer,
and finally use @code{write} to put the buffer out to the display. But
with the file mapped into your address space, you can directly address
it like so:

@example
write (STDOUT_FILENO, addr, 64);
@end example

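Similarly, if you wanted to replace the first byte of the file,
all you need to do is store to the first byte of the mapping, e.g.@:
by assigning to @code{((char *) addr)[0]}.

When you're done using the memory-mapped file, you simply unmap
it by calling @code{munmap}.
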
@b{What if two processes memory-map the same file?}

There is no requirement in Pintos that the two processes see
consistent data. Unix handles this by making the processes share the
same physical page, but the @code{mmap} system call also has an
argument allowing the client to specify whether the page is shared or
private (i.e.@: copy-on-write).

@b{What happens if a user removes a @code{mmap}'d file?}

You should follow the Unix convention and the mapping should still be
valid. This is similar to the question in the User Programs FAQ about
a process with a file descriptor to a file that has been removed.

@b{What if a process writes to a page that is memory-mapped, but the
location written to in the memory-mapped page is past the end
of the memory-mapped file?}

Can't happen. @code{mmap} extends the file to the requested length,
and Pintos provides no way to shorten a file. You can remove a file,
but the mapping remains valid (see the previous question).

@b{Do we have to handle memory mapping @code{stdin} or @code{stdout}?}

No. Memory mapping implies that a file has a length and that a user
can seek to any location in the file. Since the console device has
neither of these properties, @code{mmap} should return false when the
user attempts to memory map a file descriptor for the console device.

@b{What happens when a process exits with @code{mmap}'d files?}

When a process finishes, each of its @code{mmap}'d files is implicitly
unmapped. When a process @code{mmap}s a file and then writes into the
area for the file, it is making the assumption that the changes will be
written to the file.

@b{If a user closes a @code{mmap}'d file, should we automatically unmap it
as well?}

No, once created the mapping is valid until @code{munmap} is called
or the process exits.