-@node Project 3--Virtual Memory, Project 4--File Systems, Project 2--User Programs, Top
+@node Project 3--Virtual Memory
@chapter Project 3: Virtual Memory
-By now you should be familiar with the inner workings of Pintos.
-You've already come a long way: your OS can properly handle multiple
-threads of execution with proper synchronization, and can load
-multiple user programs at once. However, when loading user programs,
-your OS is limited by how much main memory the simulated machine has.
-In this assignment, you will remove that limitation.
+By now you should have some familiarity with the inner workings of
+Pintos. Your
+OS can now handle multiple threads of execution with proper
+synchronization, and can load multiple user programs at once. However,
+the number and size of programs that can run is limited by the machine's
+main memory size. In this assignment, you will remove that limitation.
-You will be using the @file{vm} directory for this project. The
-@file{vm} directory contains only the @file{Makefile}s. The only
+You will build this assignment on top of the last one. Test programs
+from project 2 should also work with project 3. You should take care to
+fix any bugs in your project 2 submission before you start work on
+project 3, because those bugs will most likely cause the same problems
+in project 3.
+
+You will continue to handle Pintos disks and file systems the same way
+you did in the previous assignment (@pxref{Using the File System}).
+
+@menu
+* Project 3 Background::
+* Project 3 Suggested Order of Implementation::
+* Project 3 Requirements::
+* Project 3 FAQ::
+@end menu
+
+@node Project 3 Background
+@section Background
+
+@menu
+* Project 3 Source Files::
+* Memory Terminology::
+* Resource Management Overview::
+* Managing the Supplemental Page Table::
+* Managing the Frame Table::
+* Managing the Swap Table::
+* Managing Memory Mapped Files::
+@end menu
+
+@node Project 3 Source Files
+@subsection Source Files
+
+You will work in the @file{vm} directory for this project. The
+@file{vm} directory contains only @file{Makefile}s. The only
change from @file{userprog} is that this new @file{Makefile} turns on
-the setting @option{-DVM}. All code you write will either be newly
-generated files (e.g.@: if you choose to implement your paging code in
-their own source files), or will be modifications to pre-existing code
-(e.g.@: you will change the behavior of @file{process.c}
-significantly).
+the setting @option{-DVM}. All code you write will be in new
+files or in files introduced in earlier projects.
-There are only a couple of source files you will probably be
-encountering for the first time:
+You will probably encounter just a few files for the first time:
@table @file
@item devices/disk.h
@itemx devices/disk.c
-Provides access to the physical disk, abstracting away the rather
-awful IDE interface.
+Provides access to the physical disk, abstracting away the rather awful
+IDE interface. You will use this interface to access the swap disk.
@end table
-You will be building this assignment on the last one. It will benefit
-you to get your project 2 in good working order before this assignment
-so those bugs don't keep haunting you.
-
-All the test programs from the previous project should also work with
-this project. You should also write programs to test the new features
-introduced in this project.
+@node Memory Terminology
+@subsection Memory Terminology
-You will continue to handle Pintos disks and file systems the same way
-you did in the previous assignment (@pxref{Using the File System}).
+Careful definitions are needed to keep discussion of virtual memory from
+being confusing. Thus, we begin by presenting some terminology for
+memory and storage. Some of these terms should be familiar from project
+2 (@pxref{Virtual Memory Layout}), but many are new.
@menu
-* VM Design::
-* Page Faults::
-* Disk as Backing Store::
-* Memory Mapped Files::
-* Stack::
-* Problem 3-1 Page Table Management::
-* Problem 3-2 Paging To and From Disk::
-* Problem 3-3 Memory Mapped Files::
-* Virtual Memory FAQ::
+* Pages::
+* Frames::
+* Page Tables::
+* Swap Slots::
@end menu
-@node VM Design
-@section A Word about Design
-
-It is important for you to note that in addition to getting virtual
-memory working, this assignment is also meant to be an open-ended
-design problem. We will expect you to come up with a design that
-makes sense. You will have the freedom to choose how to handle page
-faults, how to organize the swap disk, how to implement paging, etc.
-In each case, we will expect you to provide a defensible justification
-in your design documentation as to why your choices are reasonable.
-You should evaluate your design on all the available criteria: speed
-of handling a page fault, space overhead in memory, minimizing the
-number of page faults, simplicity, etc.
-
-In keeping with this, you will find that we are going to say as little
-as possible about how to do things. Instead we will focus on what end
-functionality we require your OS to support.
-
-@node Page Faults
-@section Page Faults
-
-For the last assignment, whenever a context switch occurred, the new
-process would install its own page table into the machine. The page
-table contained all the virtual-to-physical translations for the
-process. Whenever the processor needed to look up a translation, it
-consulted the page table. As long as the process only accessed
-memory that it owned, all was well. If the process accessed
-memory it didn't own, it ``page faulted'' and @func{page_fault}
-terminated the process.
-
-When we implement virtual memory, the rules have to change. A page
-fault is no longer necessarily an error, since it might only indicate
-that the page must be brought in from a disk file or from swap. You
-will have to implement a more sophisticated page fault handler to
-handle these cases.
-
-On the 80@var{x}86, the page table format is fixed by hardware. We
-have provided code for managing page tables for you to use in
-@file{userprog/pagedir.c}. The functions in there should provide an
-abstract interface to all the page table functionality that you need
-to complete the project. However, you may still find it worthwhile to
-understand a little about the hardware page table format, so we'll go
-into a little of detail about that in this section.
-
-The top-level paging data structure is a 4 kB page called the ``page
-directory'' (PD) arranged as an array of 1,024 32-bit page directory
-entries (PDEs), each of which represents 4 MB of virtual memory. Each
-PDE may point to the physical address of another 4 kB page called a
-``page table'' (PT) arranged in the same fashion as an array of 1,024
-32-bit page table entries (PTEs), each of which translates a single 4
-kB virtual page into physical memory.
-
-Thus, translation of a virtual address into a physical address follows
-the three-step process illustrated in the diagram
-below:@footnote{Actually, virtual to physical translation on the
-80@var{x}86 architecture happens via an intermediate ``linear
-address,'' but Pintos (and most other 80@var{x}86 OSes) set up the CPU
-so that linear and virtual addresses are one and the same, so that you
-can effectively ignore this CPU feature.}
+@node Pages
+@subsubsection Pages
-@enumerate 1
-@item
-The top 10 bits of the virtual address (bits 22:32) are used to index
-into the page directory. If the PDE is marked ``present,'' the
-physical address of a page table is read from the PDE thus obtained.
-If the PDE is marked ``not present'' then a page fault occurs.
+A @dfn{page}, sometimes called a @dfn{virtual page}, is a contiguous
+region of virtual memory 4,096 bytes (the @dfn{page size}) in length. A
+page must be @dfn{page-aligned}, that is, start on a virtual address
+evenly divisible by the page size. Thus, a 32-bit virtual address can
+be divided into a 20-bit @dfn{page number} and a 12-bit @dfn{page
+offset} (or just @dfn{offset}), like this:
-@item
-The next 10 bits of the virtual address (bits 12:22) are used to index
-into the page table. If the PTE is marked ``present,'' the physical
-address of a data page is read from the PTE thus obtained. If the PTE
-is marked ``not present'' then a page fault occurs.
+@example
+@group
+ 31 12 11 0
+ +-------------------+-----------+
+ | Page Number | Offset |
+ +-------------------+-----------+
+ Virtual Address
+@end group
+@end example
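
The split above can be sketched in C with a shift and a mask.  The
macro and function names below are illustrative only; Pintos supplies
its own equivalents (such as @func{pg_ofs}) in its headers.

```c
#include <stdint.h>

/* Illustrative sketch of the 20-bit page number / 12-bit offset
   split shown above.  These names are hypothetical; Pintos provides
   its own helpers for this. */
#define PGBITS 12                     /* Number of offset bits. */
#define PGSIZE (1u << PGBITS)         /* Page size: 4,096 bytes. */
#define PGMASK (PGSIZE - 1)           /* Offset mask: 0xfff. */

/* Extracts the page number from virtual address VA. */
uint32_t
page_number (uint32_t va)
{
  return va >> PGBITS;
}

/* Extracts the page offset from virtual address VA. */
uint32_t
page_offset (uint32_t va)
{
  return va & PGMASK;
}
```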
+Each process has an independent set of @dfn{user (virtual) pages}, which
+are those pages below virtual address @code{PHYS_BASE}, typically
+@t{0xc0000000} (3 GB). The set of @dfn{kernel (virtual) pages}, on the
+other hand, is global, remaining the same regardless of what thread or
+process is active. The kernel may access both user and kernel pages,
+but a user process may access only its own user pages. @xref{Virtual
+Memory Layout}, for more information.
-@item
-The bottom 12 bits of the virtual address (bits 0:12) are added to the
-data page's physical base address, producing the final physical
-address.
-@end enumerate
+Pintos provides several useful functions for working with virtual
+addresses. @xref{Virtual Addresses}, for details.
+
+@node Frames
+@subsubsection Frames
+
+A @dfn{frame}, sometimes called a @dfn{physical frame} or a @dfn{page
+frame}, is a contiguous region of physical memory. Like pages, frames
+must be page-sized and page-aligned. Thus, a 32-bit physical address can
+be divided into a 20-bit @dfn{frame number} and a 12-bit @dfn{frame
+offset} (or just @dfn{offset}), like this:
@example
@group
-32 22 12 0
-+--------------------------------------------------------------------+
-| Page Directory Index | Page Table Index | Page Offset |
-+--------------------------------------------------------------------+
- | | |
- _______/ _______/ _____/
- / / /
- / Page Directory / Page Table / Data Page
- / .____________. / .____________. / .____________.
- |1,023|____________| |1,023|____________| | |____________|
- |1,022|____________| |1,022|____________| | |____________|
- |1,021|____________| |1,021|____________| \__\|____________|
- |1,020|____________| |1,020|____________| /|____________|
- | | | | | | | |
- | | | \____\| |_ | |
- | | . | /| . | \ | . |
- \____\| . |_ | . | | | . |
- /| . | \ | . | | | . |
- | . | | | . | | | . |
- | | | | | | | |
- |____________| | |____________| | |____________|
- 4|____________| | 4|____________| | |____________|
- 3|____________| | 3|____________| | |____________|
- 2|____________| | 2|____________| | |____________|
- 1|____________| | 1|____________| | |____________|
- 0|____________| \__\0|____________| \____\|____________|
- / /
+ 31 12 11 0
+ +-------------------+-----------+
+ | Frame Number | Offset |
+ +-------------------+-----------+
+ Physical Address
@end group
@end example
-Header @file{threads/mmu.h} has useful functions for various
-operations on virtual addresses. You should look over the header
-yourself, but its most important functions include these:
+The 80@var{x}86 doesn't provide any way to directly access memory at a
+physical address. Pintos works around this by mapping kernel virtual
+memory directly to physical memory: the first page of kernel virtual
+memory is mapped to the first frame of physical memory, the second page
+to the second frame, and so on. Thus, frames can be accessed through
+kernel virtual memory.
-@table @code
-@item pd_no(@var{va})
-Returns the page directory index in virtual address @var{va}.
+Pintos provides functions for translating between physical addresses and
+kernel virtual addresses. @xref{Virtual Addresses}, for details.
-@item pt_no(@var{va})
-Returns the page table index in virtual address @var{va}.
+@node Page Tables
+@subsubsection Page Tables
-@item pg_ofs(@var{va})
-Returns the page offset in virtual address @var{va}.
+In Pintos, a @dfn{page table} is a data structure that the CPU uses to
+translate a virtual address to a physical address, that is, from a page
+to a frame. The page table format is dictated by the 80@var{x}86
+architecture. Pintos provides page table management code in
+@file{pagedir.c} (@pxref{Page Table}).
-@item pg_round_down(@var{va})
-Returns @var{va} rounded down to the nearest page boundary, that is,
-@var{va} but with its page offset set to 0.
+The diagram below illustrates the relationship between pages and frames.
+The virtual address, on the left, consists of a page number and an
+offset. The page table translates the page number into a frame number,
+which is combined with the unmodified offset to obtain the physical
+address, on the right.
-@item pg_round_up(@var{va})
-Returns @var{va} rounded up to the nearest page boundary.
-@end table
+@example
+@group
+ +----------+
+ .--------------->|Page Table|-----------.
+ / +----------+ |
+ 0 | 12 11 0 0 V 12 11 0
+ +---------+----+ +---------+----+
+ |Page Nr | Ofs| |Frame Nr | Ofs|
+ +---------+----+ +---------+----+
+ Virt Addr | Phys Addr ^
+ \_______________________________________/
+@end group
+@end example
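
The translation in the diagram can be mimicked in software with a toy
lookup table.  This sketch is purely illustrative: the real 80@var{x}86
page table is a two-level structure walked by hardware
(@pxref{Page Table}), and the names below are not Pintos APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the diagram above: a flat table maps page numbers to
   frame numbers, and the 12-bit offset passes through unchanged. */
#define PGBITS 12
#define PGMASK ((1u << PGBITS) - 1)
#define N_PAGES 16                    /* Tiny toy address space. */

uint32_t frame_of[N_PAGES];           /* Page number -> frame number. */
bool present[N_PAGES];                /* Is the mapping valid? */

/* Translates virtual address VA, storing the physical address in *PA
   and returning true on success, or returning false on a (simulated)
   page fault. */
bool
translate (uint32_t va, uint32_t *pa)
{
  uint32_t pn = va >> PGBITS;
  if (pn >= N_PAGES || !present[pn])
    return false;                     /* Page fault. */
  *pa = (frame_of[pn] << PGBITS) | (va & PGMASK);
  return true;
}
```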
-@node Disk as Backing Store
-@section Disk as Backing Store
-
-In VM systems, since memory is less plentiful than disk, you will
-effectively use memory as a cache for disk. Looking at it from
-another angle, you will use disk as a backing store for memory. This
-provides the abstraction of an (almost) unlimited virtual memory size.
-Part of your task in this project is to do this, with the additional
-constraint that your performance should be close to that provided by
-physical memory. You will use the page tables' ``dirty'' bits to
-denote whether pages need to be written back to disk when they're
-evicted from main memory and the ``accessed'' bit for page replacement
-algorithms. Whenever the hardware writes memory, it sets the dirty
-bit, and if it reads or writes to the page, it sets the accessed bit.
-
-As with any caching system, performance depends on the policy used to
-decide which things are kept in memory and which are only stored on
-disk. On a page fault, the kernel must decide which page to replace.
-Ideally, it will throw out a page that will not be referenced for a
-long time, keeping in memory those pages that are soon to be
-referenced. Another consideration is that if the replaced page has
-been modified, the page must be first saved to disk before the needed
-page can be brought in. Many virtual memory systems avoid this extra
-overhead by writing modified pages to disk in advance, so that later
-page faults can be completed more quickly (but you do not have to
-implement this optimization).
+@node Swap Slots
+@subsubsection Swap Slots
-@node Memory Mapped Files
-@section Memory Mapped Files
-
-The traditional way to access the file system is via @code{read} and
-@code{write} system calls, but that requires an extra level of copying
-between the kernel and the user level. A secondary interface is
-simply to ``map'' the file into the virtual address space. The
-program can then use load and store instructions directly on the file
-data. (An alternative way of viewing the file system is as ``durable
-memory.'' Files just store data structures. If you access data
-structures in memory using load and store instructions, why not access
-data structures in files the same way?)
-
-Memory mapped files are typically implemented using system calls. One
-system call maps the file to a particular part of the address space.
-For example, one might conceptually map the file @file{foo}, which is
-1000 bytes
-long, starting at address 5000. Assuming that nothing else is already
-at virtual addresses 5000@dots{}6000, any memory accesses to these
-locations will access the corresponding bytes of @file{foo}.
-
-A consequence of memory mapped files is that address spaces are
-sparsely populated with lots of segments, one for each memory mapped
-file (plus one each for code, data, and stack). You will implement
-memory mapped files in problem 3-3. You should
-design your solutions to problems 3-1 and 3-2 to anticipate this.
-
-@node Stack
-@section Stack
-
-In project 2, the stack was a single page at the top of the user
-virtual address space. The stack's location does not change in this
-project, but your kernel should allocate additional pages to the stack
-on demand. That is, if the stack grows past its current bottom, the
-system should allocate additional pages for the stack as necessary
-(unless those pages are unavailable because they are in use by another
-segment).
-
-It is impossible to predict how large the stack will grow at compile
-time, so we must allocate pages as necessary. You should only allocate
-additional pages if they ``appear'' to be stack accesses. You must
-devise a heuristic that attempts to distinguish stack accesses from
-other accesses. Document and explain the heuristic in your
-design documentation.
-
-The first stack page need not be loaded lazily. You can initialize it
-with the command line at load time, with no need to wait for it to be
-faulted in. Even if you did wait, the very first instruction in the
-user program is likely to be one that faults in the page.
-
-Stack facts:
-
-@itemize
-@item
-The user program's current stack pointer is in the @struct{intr_frame}'s
-@code{esp} member.
+A @dfn{swap slot} is a contiguous, page-sized region of disk space on
+the swap disk. Although the hardware limitations dictating slot
+placement are looser than those for pages and frames, swap slots should
+be page-aligned because there is no downside in doing so.
-@item
-Only buggy 80@var{x}86 user programs write to memory within the
-stack but below the stack pointer. This is because more advanced OSes
-may interrupt a process at any time to deliver a ``signal'' and this
-uses the stack.@footnote{This rule is common but not universal. One
-modern exception is the
-@uref{http://www.x86-64.org/documentation/abi.pdf, @var{x}86-64 System
-V ABI}, which designates 128 bytes below the stack pointer as a ``red
-zone'' that may not be modified by signal or interrupt handlers.}
+@node Resource Management Overview
+@subsection Resource Management Overview
-@item
-The 80@var{x}86 @code{push} instruction may cause a page fault 4 bytes
-below the stack pointer, because it checks access permissions before it
-adjusts the stack pointer. (Otherwise, the instruction would not be
-restartable in a straightforward fashion.)
+You will need to design the following data structures:
-@item
-Similarly, the 80@var{x}86 @code{pusha} instruction, which pushes all 32
-bytes of the 8 general-purpose registers at once, may cause a page fault
-32 bytes below the stack pointer.
+@table @asis
+@item Supplemental page table
-@item
-Most OSes impose some sort of limit on the stack size. Sometimes it is
-user-adjustable.
-@end itemize
+Enables page fault handling by supplementing the page table.
+@xref{Managing the Supplemental Page Table}.
-@node Problem 3-1 Page Table Management
-@section Problem 3-1: Page Table Management
+@item Frame table
-Implement page directory and page table management to support virtual
-memory. You will need data structures to accomplish the following
-tasks:
+Allows efficient implementation of an eviction policy.
+@xref{Managing the Frame Table}.
-@itemize @bullet
-@item
-Some way of translating in software from virtual page frames to
-physical page frames. Consider using a hash table (@pxref{Hash
-Table}).
+@item Swap table
-It is possible to do this translation without adding a new data
-structure, by modifying the code in @file{userprog/pagedir.c}. However,
-if you do that you'll need to carefully study and understand section 3.7
-in @bibref{IA32-v3}, and in practice it is probably easier to add a new
-data structure.
+Tracks usage of swap slots.
+@xref{Managing the Swap Table}.
-@item
-Some way of finding a page on disk if it is not in memory. You won't
-need this data structure until problem 3-2, but planning ahead is a
-good idea.
+@item Table of file mappings
-You can generalize the virtual-to-physical page table, so that it allows
-you to locate a page wherever it is in physical memory or on disk, or
-you can make this a separate table.
-
-@item
-Some way of translating from physical page frames back to virtual page
-frames, so that when you evict a physical page from its frame, you can
-invalidate its translation(s).
-@end itemize
+Processes may map files into their virtual memory space. You need a
+table to track which files are mapped into which pages.
+@end table
-The page fault handler, @func{page_fault} in
+You do not necessarily need to implement four completely distinct data
+structures: it may be convenient to wholly or partially merge related
+resources into a unified data structure.
+
+For each data structure, you need to determine what information each
+element should contain. You also need to decide on the data structure's
+scope, either local (per-process) or global (applying to the whole
+system), and how many instances are required within its scope.
+
+To simplify your design, you may store these data structures in
+non-pageable memory. That means you can be sure that pointers among
+them will remain valid.
+
+Possible choices of data structures include arrays, lists, bitmaps, and
+hash tables. An array is often the simplest approach, but a sparsely
+populated array wastes memory. Lists are also simple, but traversing a
+long list to find a particular position wastes time. Both arrays and
+lists can be resized, but lists more efficiently support insertion and
+deletion in the middle.
+
+Pintos includes a bitmap data structure in @file{lib/kernel/bitmap.c}
+and @file{lib/kernel/bitmap.h}. A bitmap is an array of bits, each of
+which can be true or false. Bitmaps are typically used to track usage
+in a set of (identical) resources: if resource @var{n} is in use, then
+bit @var{n} of the bitmap is true. Pintos bitmaps are fixed in size,
+although you could extend their implementation to support resizing.
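
The resource-tracking pattern described above can be sketched with a
plain array of flags.  This is a minimal stand-in for the Pintos bitmap
interface, shown here tracking swap slots; all names are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal sketch of bitmap-style swap-slot tracking, in the spirit of
   lib/kernel/bitmap.c: flag N true means slot N is in use.  The names
   here are hypothetical, not the Pintos bitmap API. */
#define N_SLOTS 64
bool slot_used[N_SLOTS];              /* One in-use flag per slot. */

/* Finds a free slot, marks it in use, and returns its index, or
   (size_t) -1 if every slot is taken. */
size_t
slot_alloc (void)
{
  for (size_t i = 0; i < N_SLOTS; i++)
    if (!slot_used[i])
      {
        slot_used[i] = true;
        return i;
      }
  return (size_t) -1;
}

/* Returns slot I to the free pool. */
void
slot_free (size_t i)
{
  slot_used[i] = false;
}
```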
+
+Pintos also includes a hash table data structure (@pxref{Hash Table}).
+Pintos hash tables efficiently support insertions and deletions over a
+wide range of table sizes.
+
+Although more complex data structures may yield performance or other
+benefits, they may also needlessly complicate your implementation.
+Thus, we do not recommend implementing any advanced data structure
+(e.g.@: a balanced binary tree) as part of your design.
+
+@node Managing the Supplemental Page Table
+@subsection Managing the Supplemental Page Table
+
+The @dfn{supplemental page table} supplements the page table with
+additional data about each page. It is needed because of the
+limitations imposed by the page table's format. Such a data structure
+is often called a ``page table'' also; we add the word ``supplemental''
+to reduce confusion.
+
+The supplemental page table is used for at least two purposes. Most
+importantly, on a page fault, the kernel looks up the virtual page that
+faulted in the supplemental page table to find out what data should be
+there. Second, the kernel consults the supplemental page table when a
+process terminates, to decide what resources to free.
+
+You may organize the supplemental page table as you wish. There are at
+least two basic approaches to its organization: in terms of segments or
+in terms of pages. Optionally, you may use the page table itself as an
+index to track the members of the supplemental page table. You will
+have to modify the Pintos page table implementation in @file{pagedir.c}
+to do so. We recommend this approach for advanced students only.
+@xref{Page Table Entry Format}, for more information.
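
One possible per-page organization is sketched below.  Every field and
name here is hypothetical, one design among many; your supplemental
page table entries may look quite different.

```c
#include <stdbool.h>
#include <stddef.h>

struct file;                  /* Opaque; stands in for Pintos's file. */

/* One possible shape for a supplemental page table entry.  All names
   are illustrative, not part of Pintos. */
enum page_location
  {
    PAGE_ZERO,                /* All-zero page, created on fault. */
    PAGE_FILE,                /* Data lives in a file. */
    PAGE_SWAP,                /* Data lives in a swap slot. */
    PAGE_FRAME                /* Data is resident in a frame. */
  };

struct spt_entry
  {
    void *upage;              /* User virtual address (page-aligned). */
    enum page_location loc;   /* Where the page's data lives now. */
    bool writable;            /* Read/write or read-only? */
    struct file *file;        /* Backing file (when loc == PAGE_FILE)... */
    size_t file_ofs;          /* ...offset within it... */
    size_t read_bytes;        /* ...and bytes to read (rest zeroed). */
    size_t swap_slot;         /* Swap slot (when loc == PAGE_SWAP). */
  };
```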
+
+The most important user of the supplemental page table is the page fault
+handler. In project 2, a page fault always indicated a bug in the
+kernel or a user program. In project 3, this is no longer true. Now, a
+page fault might only indicate that the page must be brought in from a
+file or swap. You will have to implement a more sophisticated page
+fault handler to handle these cases. Your page fault handler, which you
+should implement by modifying @func{page_fault} in
@file{threads/exception.c}, needs to do roughly the following:
@enumerate 1
@item
-Locate the page backing the virtual
-address that faulted. It might be in the file system, in swap,
-or it might be an invalid virtual address.
-If you implement sharing, it might even
-already be in physical memory and just not set up in the page table,
-
-If the virtual address is invalid, that is, if there's nothing
-assigned to go there, or if the virtual address is above
-@code{PHYS_BASE}, meaning that it belongs to the kernel instead of the
-user, then the process's memory access must be disallowed. You should
-terminate the process at this point, being sure to free all of its
-resources.
+Locate the page that faulted in the supplemental page table. If the
+memory reference is valid, use the supplemental page table entry to
+locate the data that goes in the page, which might be in the file
+system, or in a swap slot, or it might simply be an all-zero page. If
+you implement sharing, the page's data might even already be in a page
+frame, but not in the page table.
+
+If the page is unmapped, that is, if there's no data there, or if the
+page lies within kernel virtual memory, or if the access is an attempt
+to write to a read-only page, then the access is invalid. Any invalid
+access terminates the process and thereby frees all of its resources.
@item
-If the page is not in physical memory, fetch it by appropriate means.
-If necessary to make room, first evict some other page from memory.
-(When you do that you need to first remove references to the page from
-any page table that refers to it.)
+Obtain a frame to store the page. @xref{Managing the Frame Table}, for
+details.
-@item
-Point the page table entry for the faulting virtual address to the
-physical page. You can use the functions in @file{userprog/pagedir.c}.
-@end enumerate
+If you implement sharing, the data you need may already be in a frame,
+in which case you must be able to locate that frame.
-You'll need to modify the ELF loader in @file{userprog/process.c} to
-do page table management according to your new design. As supplied,
-it reads all the process's pages from disk and initializes the page
-tables for them at the same time. For testing purposes, you'll
-probably want to leave the code that reads the pages from disk, but
-use your new page table management code to construct the page tables
-only as page faults occur for them.
-
-You should use the @func{palloc_get_page} function to get the page
-frames that you use for storing user virtual pages. Be sure to pass
-the @code{PAL_USER} flag to this function when you do so, because that
-allocates pages from a ``user pool'' separate from the ``kernel pool''
-that other calls to @func{palloc_get_page} make.
-
-There are many possible ways to implement virtual memory. The above
-is simply an outline of our suggested implementation.
-
-@node Problem 3-2 Paging To and From Disk
-@section Problem 3-2: Paging To and From Disk
-
-Implement paging to and from files and the swap disk. You may use the
-disk on interface @code{hd1:1} as the swap disk, using the disk
-interface prototyped in @code{devices/disk.h}. From the @file{vm/build}
-directory, use the command @code{pintos make-disk swap.dsk @var{n}} to
-create an @var{n} MB swap disk named @file{swap.dsk}. Afterward,
-@file{swap.dsk} will automatically be attached when you run
-@command{pintos}.
-
-You will need routines to move a page from memory to disk and from
-disk to memory, where ``disk'' is either a file or the swap disk. If
-you do everything correctly, your VM should still work when you
-implement your own file system for the next assignment.
-
-You will need a way to track pages which are used by a process but
-which are not in physical memory, to fully handle page faults. Pages
-that you write to swap should not be constrained to be in sequential
-order. You will also need a way to track all of the physical memory
-pages, to find an unused one when needed, or to evict a page
-when memory is needed but no empty pages are available. The data
-structures that you designed for problem 3-1 should do most of the work for
-you.
-
-You will need a page replacement algorithm. The hardware sets the
-accessed and dirty bits when it accesses memory. You can gain access
-to this information using the functions prototyped in
-@file{userprog/pagedir.h}. You should be able to take advantage of
-this information to implement some algorithm which attempts to achieve
-LRU-type behavior. We expect that your algorithm perform at least as
-well as a reasonable implementation of the second-chance (clock)
-algorithm. You will need to show in your test cases the value of your
-page replacement algorithm by demonstrating for some workload that it
-pages less frequently using your algorithm than using some inferior
-page replacement policy. The canonical example of a poor page
-replacement policy is random replacement.
-
-You must write your code so that we can choose a page replacement
-policy at Pintos startup time. By default, the LRU-like algorithm
-must be in effect, but we must be able to choose random replacement by
-invoking @command{pintos} with the @option{-o random-paging} option.
-Passing this option sets @code{enable_random_paging}, declared in
-@file{threads/init.h}, to true.
-
-Since you will already be paging from disk, you should implement a
-``lazy'' loading scheme for new processes. When a process is created,
-it will not run immediately. Therefore, it doesn't make sense to load
-all its code, data, and stack into memory when the process is created,
-since it might incur additional disk accesses to do so (if it gets
-paged out before it runs). When loading a new process, you should
-leave most pages on disk, and bring them in as demanded when the
-program begins running. Your VM system should also use the executable
-file itself as backing store for read-only segments, since these
-segments won't change.
-
-There are a few special cases. Look at the loop in
-@func{load_segment} in @file{userprog/process.c}. Each time
-around the loop, @code{read_bytes} represents the number of bytes to
-read from the executable file and @code{zero_bytes} represents the number
-of bytes to initialize to zero following the bytes read. The two
-always sum to @code{PGSIZE}. The page handling depends on these
-variables' values:
-
-@itemize @bullet
@item
-If @code{read_bytes} equals @code{PGSIZE}, the page should be demand
-paged from disk on its first access.
+Fetch the data into the frame, by reading it from the file system or
+swap, zeroing it, etc.
-@item
-If @code{zero_bytes} equals @code{PGSIZE}, the page does not need to
-be read from disk at all because it is all zeroes. You should handle
-such pages by creating a new page consisting of all zeroes at the
-first page fault.
+If you implement sharing, the page you need may already be in a frame,
+in which case no action is necessary in this step.
@item
-If neither @code{read_bytes} nor @code{zero_bytes} equals
-@code{PGSIZE}, then part of the page is to be read from disk and the
-remainder zeroed. This is a special case. You are allowed to handle
-it by reading the partial page from disk at executable load time and
-zeroing the rest of the page. This is the only case in which we will
-allow you to load a page in a non-``lazy'' fashion. Many real OSes
-such as Linux do not load partial pages lazily.
-@end itemize
-
-Incidentally, if you have trouble handling the third case above, you
-can eliminate it temporarily by linking the test programs with a
-special ``linker script.'' Read @file{Makefile.userprog} for
-details. We will not test your submission with this special linker
-script, so the code you turn in must properly handle all cases.
-
-For extra credit, you may implement sharing: when multiple processes
-are created that use the same executable file, share read-only pages
-among those processes instead of creating separate copies of read-only
-segments for each process. If you carefully designed your data
-structures in problem 3-1, sharing of read-only pages should not make this
-part significantly harder.
-
-@node Problem 3-3 Memory Mapped Files
-@section Problem 3-3: Memory Mapped Files
-
-Implement memory mapped files.
-
-You will need to implement the following system calls:
-
-@table @code
-@item SYS_mmap
-@itemx mapid_t mmap (int @var{fd}, void *@var{addr})
-
-Maps the file open as @var{fd} into the process's virtual address
-space. The entire file is mapped into consecutive virtual pages
-starting at @var{addr}.
-
-If the file's length is not a multiple of @code{PGSIZE}, then some
-bytes in the final mapped page ``stick out'' beyond the end of the
-file. Set these bytes to zero when the page is faulted in from disk,
-and discard them when the page is written back to disk.
-
-If successful, this function returns a ``mapping ID'' that must
-uniquely identify the mapping within the given process. On failure,
-it must return -1, which otherwise should not be a valid mapping id.
-
-A call to @code{mmap} may fail if the file open as @var{fd} has a
-length of zero bytes. It must fail if @var{addr} is not page-aligned
-or if the range of pages mapped overlaps any existing set of mapped
-pages, including the stack or pages mapped at executable load time.
+Point the page table entry for the faulting virtual address to the
+physical page. You can use the functions in @file{userprog/pagedir.c}.
+@end enumerate
-Your VM system should use the @code{mmap}'d file itself as backing
-store for the mapping. That is, to evict a page mapped by
-@code{mmap}, write it to the file it was mapped from. (In fact, you
-may choose to implement executable mappings as a special case of file
-mappings.)
+@node Managing the Frame Table
+@subsection Managing the Frame Table
-@item SYS_munmap
-@itemx bool munmap (mapid_t @var{mapping})
+The @dfn{frame table} contains one entry for each frame that contains a
+user page. Each entry in the frame table contains a pointer to the
+page, if any, that currently occupies it, and other data of your choice.
+The frame table allows Pintos to efficiently implement an eviction
+policy, by choosing a page to evict when no frames are free.
-Unmaps the mapping designated by @var{mapping}, which must be a
-mapping ID returned by a previous call to @code{mmap} by the same
-process that has not yet been unmapped.
-@end table
+The frames used for user pages should be obtained from the ``user
+pool,'' by calling @code{palloc_get_page(PAL_USER)}. You must use
+@code{PAL_USER} to avoid allocating from the ``kernel pool,'' which
+could cause some test cases to fail unexpectedly (@pxref{Why
+PAL_USER?}). If you modify @file{palloc.c} as part of your frame table
+implementation, be sure to retain the distinction between the two pools.
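+
+A frame table entry and a free-frame lookup might be sketched as below.
+The fixed-size array and all names here are illustrative assumptions
+made for a self-contained example; in Pintos the frames themselves come
+from @func{palloc_get_page}, and a real entry would also record the
+owning thread and other data of your choice:
+
+@example
+#include <stddef.h>
+
+#define FRAME_CNT 4        /* Illustrative; not a Pintos constant. */
+
+struct frame
+  @{
+    void *upage;           /* User page in this frame, or NULL if free. */
+  @};
+
+static struct frame frames[FRAME_CNT];
+
+/* Returns the index of a free frame, or -1 if none is free,
+   in which case the caller would invoke its eviction algorithm. */
+int
+frame_get_free (void)
+@{
+  for (int i = 0; i < FRAME_CNT; i++)
+    if (frames[i].upage == NULL)
+      return i;
+  return -1;
+@}
+@end example
+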
-All mappings are implicitly unmapped when a process exits, whether via
-@code{exit} or by any other means. When a mapping is unmapped, whether
-implicitly or explicitly, all outstanding changes are written to the
-file, and the pages are removed from the process's list of used
-virtual pages.
+The most important operation on the frame table is obtaining an unused
+frame. This is easy when a frame is free. When none is free, a frame
+must be made free by evicting some page from its frame. If no frame can
+be evicted without allocating a swap slot, but swap is full, some
+process must be killed to free memory (the choice of process to kill is
+up to you).
-@node Virtual Memory FAQ
-@section FAQ
+The process of eviction comprises roughly the following steps:
@enumerate 1
@item
-@b{Do we need a working HW 2 to implement HW 3?}
+Choose a frame to evict, using your page replacement algorithm. The
+``accessed'' and ``dirty'' bits in the page table, described below, will
+come in handy.
-Yes.
+@item
+Remove references to the frame from any page table that refers to it.
+
+Unless you have implemented sharing, only a single page should refer to
+a frame at any given time.
@item
-@anchor{Hash Table}
-@b{How do I use the hash table provided in @file{lib/kernel/hash.c}?}
-
-First, you need to embed a @struct{hash_elem} as a member of the
-object that the hash table will contain. Each @struct{hash_elem} allows
-the object to a member of at most one hash table at a given time. All
-the hash table functions that deal with hash table items actually use
-the address of a @struct{hash_elem}. You can convert a pointer to a
-@struct{hash_elem} member into a pointer to the structure in which
-member is embedded using the @code{hash_entry} macro.
-
-Second, you need to decide on a key type. The key should be something
-that is unique for each object, because a given hash table may not
-contain two objects with equal keys. Then you need to write two
-functions. The first is a @dfn{hash function} that converts a key
-into an integer. Some sample hash functions that you can use or just
-examine are given in @file{lib/kernel/hash.c}. The second function
-needed is a @dfn{comparison function} that compares a pair and returns
-true if the first is less than the second. These two functions have
-to be compatible with the prototypes for @code{hash_hash_func} and
-@code{hash_less_func} in @file{lib/kernel/hash.h}.
-
-Here's a quick example. Suppose you want to put @struct{thread}s
-in a hash table. First, add a @struct{hash_elem} to the thread
-structure by adding a line to its definition:
+If necessary, write the page to the file system or to swap.
+@end enumerate
-@example
-struct hash_elem h_elem; /* Hash table element. */
-@end example
+The evicted frame may then be used to store a different page.
-We'll choose the @code{tid} member in @struct{thread} as the key,
-and write a hash function and a comparison function:
+@menu
+* Accessed and Dirty Bits::
+@end menu
+
+@node Accessed and Dirty Bits
+@subsubsection Accessed and Dirty Bits
+
+80@var{x}86 hardware provides some assistance for implementing page
+replacement algorithms, through a pair of bits in the page table entry
+(PTE) for each page. On any read or write to a page, the CPU sets the
+@dfn{accessed bit} to 1 in the page's PTE, and on any write, the CPU
+sets the @dfn{dirty bit} to 1. The CPU never resets these bits to 0,
+but the OS may do so.
+
+You need to be aware of @dfn{aliases}, that is, two (or more) pages that
+refer to the same frame. When an aliased frame is accessed, the
+accessed and dirty bits are updated in only one page table entry (the
+one for the page used for access). The accessed and dirty bits for the
+other aliases are not updated.
+
+In Pintos, every user virtual page is aliased to its kernel virtual
+page. You must manage these aliases somehow. For example, your code
+could check and update the accessed and dirty bits for both addresses.
+Alternatively, the kernel could avoid the problem by only accessing user
+data through the user virtual address.
+
+Other aliases should only arise if you implement sharing for extra
+credit (@pxref{VM Extra Credit}), or if there is a bug in your code.
+
+@xref{Page Table Accessed and Dirty Bits}, for details of the functions
+to work with accessed and dirty bits.
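+
+As a concrete sketch, the ``clock'' (second-chance) scan can be written
+against a plain array of accessed bits.  The array stands in for the
+PTE bits, which in Pintos you would read and clear through the
+@file{userprog/pagedir.c} functions (checking both the user and kernel
+aliases); the names and frame count here are illustrative:
+
+@example
+#include <stdbool.h>
+
+#define N 4                /* Illustrative frame count. */
+
+static bool accessed[N];   /* Stand-in for per-frame accessed bits. */
+static int hand;           /* Current position of the clock hand. */
+
+/* Returns the index of the frame to evict: the first frame whose
+   accessed bit is 0, clearing bits as the hand passes over them. */
+int
+clock_pick_victim (void)
+@{
+  for (;;)
+    @{
+      if (!accessed[hand])
+        @{
+          int victim = hand;
+          hand = (hand + 1) % N;
+          return victim;
+        @}
+      accessed[hand] = false;   /* Give the page a second chance. */
+      hand = (hand + 1) % N;
+    @}
+@}
+@end example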
+
+@node Managing the Swap Table
+@subsection Managing the Swap Table
+
+The swap table tracks in-use and free swap slots. It should allow
+picking an unused swap slot for evicting a page from its frame to the
+swap disk. It should allow freeing a swap slot when its page is read
+back or the process whose page was swapped is terminated.
+
+You may use the disk on interface @code{hd1:1} as the swap disk, using
+the disk interface prototyped in @file{devices/disk.h}. From the
+@file{vm/build} directory, use the command @code{pintos-mkdisk swap.dsk
+@var{n}} to create an @var{n} MB swap disk named @file{swap.dsk}.
+Afterward, @file{swap.dsk} will automatically be attached as
+@code{hd1:1} when you run @command{pintos}. Alternatively, you can tell
+@command{pintos} to use a temporary @var{n}-MB swap disk for a single
+run with @option{--swap-disk=@var{n}}.
+
+Swap slots should be allocated lazily, that is, only when they are
+actually required by eviction. Reading data pages from the executable
+and writing them to swap immediately at process startup is not lazy.
+Swap slots should not be reserved to store particular pages.
+
+Free a swap slot when its contents are read back into a frame.
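+
+A minimal sketch of such a table, using a plain array of booleans in
+place of the bitmap that Pintos provides in @file{lib/kernel/bitmap.h}
+(the slot count and names are illustrative):
+
+@example
+#include <stdbool.h>
+
+#define SWAP_SLOTS 8       /* Illustrative capacity. */
+
+static bool swap_used[SWAP_SLOTS];
+
+/* Picks a free swap slot and marks it in use;
+   returns -1 if swap is full. */
+int
+swap_alloc (void)
+@{
+  for (int i = 0; i < SWAP_SLOTS; i++)
+    if (!swap_used[i])
+      @{
+        swap_used[i] = true;
+        return i;
+      @}
+  return -1;
+@}
+
+/* Frees SLOT, e.g. after its page is read back into a frame. */
+void
+swap_free (int slot)
+@{
+  swap_used[slot] = false;
+@}
+@end example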
+
+@node Managing Memory Mapped Files Back
+@subsection Managing Memory Mapped Files
+
+The file system is most commonly accessed with @code{read} and
+@code{write} system calls. A secondary interface is to ``map'' the file
+into virtual pages, using the @code{mmap} system call. The program can
+then use memory instructions directly on the file data.
+
+Suppose file @file{foo} is @t{0x1000} bytes (4 kB, or one page) long.
+If @file{foo} is mapped into memory starting at address @t{0x5000}, then
+any memory accesses to locations @t{0x5000}@dots{}@t{0x5fff} will access
+the corresponding bytes of @file{foo}.
+
+Here's a program that uses @code{mmap} to print a file to the console.
+It opens the file specified on the command line, maps it at virtual
+address @t{0x10000000}, writes the mapped data to the console (fd 1),
+and unmaps the file.
@example
-/* Returns a hash for E. */
-unsigned
-thread_hash (const struct hash_elem *e, void *aux UNUSED)
+#include <stdio.h>
+#include <syscall.h>
+int main (int argc UNUSED, char *argv[])
@{
- struct thread *t = hash_entry (e, struct thread, h_elem);
- return hash_int (t->tid);
-@}
+ void *data = (void *) 0x10000000; /* @r{Address at which to map.} */
-/* Returns true if A's tid is less than B's tid. */
-bool
-thread_less (const struct hash_elem *a_, const struct hash_elem *b_,
- void *aux UNUSED)
-@{
- struct thread *a = hash_entry (a_, struct thread, h_elem);
- struct thread *b = hash_entry (b_, struct thread, h_elem);
- return a->tid < b->tid;
+ int fd = open (argv[1]); /* @r{Open file.} */
+ mapid_t map = mmap (fd, data); /* @r{Map file.} */
+ write (1, data, filesize (fd)); /* @r{Write file to console.} */
+ munmap (map); /* @r{Unmap file (optional).} */
+ return 0;
@}
@end example
-Then we can create a hash table like this:
+A similar program with full error handling is included as @file{mcat.c}
+in the @file{examples} directory, which also contains @file{mcp.c} as a
+second example of @code{mmap}.
-@example
-struct hash threads;
+Your submission must be able to track what memory is used by memory
+mapped files. This is necessary to properly handle page faults in the
+mapped regions and to ensure that mapped files do not overlap any other
+segments within the process.
-hash_init (&threads, thread_hash, thread_less, NULL);
-@end example
+@node Project 3 Suggested Order of Implementation
+@section Suggested Order of Implementation
-Finally, if @code{@var{t}} is a pointer to a @struct{thread},
-then we can insert it into the hash table with:
+We suggest the following initial order of implementation:
-@example
-hash_insert (&threads, &@var{t}->h_elem);
-@end example
+@enumerate 1
+@item
+Frame table (@pxref{Managing the Frame Table}). Change @file{process.c}
+to use your frame table allocator.
-If you have any other questions about hash tables, the CS109
-and CS161 textbooks have good chapters on them, or you can come
-to any of the TA's office hours for further clarification.
+Do not implement swapping yet. If you run out of frames, fail the
+allocator or panic the kernel.
-@item
-@b{What are the @var{aux} parameters to the hash table functions good
-for?}
-
-In simple cases you won't have any need for the @var{aux} parameters.
-In these cases you can just pass a null pointer to @func{hash_init}
-for @var{aux} and ignore the values passed to the hash function and
-comparison functions. (You'll get a compiler warning if you don't use
-the @var{aux} parameter, but you can turn that off with the
-@code{UNUSED} macro, as shown above, or you can just ignore it.)
-
-@var{aux} is useful when you have some property of the data in the
-hash table that's both constant and needed for hashing or comparisons,
-but which is not stored in the data items themselves. For example, if
-the items in a hash table contain fixed-length strings, but the items
-themselves don't indicate what that fixed length is, you could pass
-the length as an @var{aux} parameter.
+After this step, your kernel should still pass all the project 2 test
+cases.
@item
-@b{The current implementation of the hash table does not do something
-that we need it to do. What gives?}
+Supplemental page table and page fault handler (@pxref{Managing the
+Supplemental Page Table}). Change @file{process.c} to record the
+necessary information in the supplemental page table when loading an
+executable and setting up its stack. Implement loading of code and data
+segments in the page fault handler. For now, consider only valid
+accesses.
+
+After this step, your kernel should pass all of the project 2
+functionality test cases, but only some of the robustness tests.
+@end enumerate
-You are welcome to modify it. It is not used by any of the code we
-provided, so modifying it won't affect any code but yours. Do
-whatever it takes to make it work the way you want.
+From here, you can implement stack growth, mapped files, and page
+reclamation on process exit in parallel.
-@item
-@b{What controls the layout of user programs?}
+The next step is to implement eviction (@pxref{Managing the Frame
+Table}). Initially you could choose the page to evict randomly. At
+this point, you need to consider how to manage accessed and dirty bits
+and aliasing of user and kernel pages. Synchronization is also a
+concern: how do you deal with it if process A faults on a page whose
+frame process B is in the process of evicting? Finally, implement an
+eviction strategy such as the clock algorithm.
-The linker is responsible for the layout of a user program in
-memory. The linker is directed by a ``linker script'' which tells it
-the names and locations of the various program segments. You can
-learn more about linker scripts by reading the ``Scripts'' chapter in
-the linker manual, accessible via @samp{info ld}.
-@end enumerate
+@node Project 3 Requirements
+@section Requirements
+
+This assignment is an open-ended design problem. We are going to say as
+little as possible about how to do things. Instead we will focus on
+what functionality we require your OS to support. We will expect
+you to come up with a design that makes sense. You will have the
+freedom to choose how to handle page faults, how to organize the swap
+disk, how to implement paging, etc.
@menu
-* Problem 3-1 and 3-2 FAQ::
-* Problem 3-3 Memory Mapped File FAQ::
+* Project 3 Design Document::
+* Paging::
+* Stack Growth::
+* Memory Mapped Files::
@end menu
-@node Problem 3-1 and 3-2 FAQ
-@subsection Problem 3-1 and 3-2 FAQ
+@node Project 3 Design Document
+@subsection Design Document
+
+Before you turn in your project, you must copy @uref{vm.tmpl, , the
+project 3 design document template} into your source tree under the name
+@file{pintos/src/vm/DESIGNDOC} and fill it in. We recommend that you
+read the design document template before you start working on the
+project. @xref{Project Documentation}, for a sample design document
+that goes along with a fictitious project.
+
+@node Paging
+@subsection Paging
+
+Implement paging for segments loaded from executables. All of these
+pages should be loaded lazily, that is, only as the kernel intercepts
+page faults for them. Upon eviction, pages modified since load (e.g.@:
+as indicated by the ``dirty bit'') should be written to swap.
+Unmodified pages, including read-only pages, should never be written to
+swap because they can always be read back from the executable.
+
+Implement a global page replacement algorithm that approximates LRU.
+Your algorithm should perform at least as well as the ``second chance''
+or ``clock'' algorithm.
+
+Your design should allow for parallelism. If one page fault requires
+I/O, in the meantime processes that do not fault should continue
+executing and other page faults that do not require I/O should be able
+to complete. This will require some synchronization effort.
+
+You'll need to modify the core of the program loader, which is the loop
+in @func{load_segment} in @file{userprog/process.c}. Each time around
+the loop, @code{page_read_bytes} receives the number of bytes to read
+from the executable file and @code{page_zero_bytes} receives the number
+of bytes to initialize to zero following the bytes read. The two always
+sum to @code{PGSIZE} (4,096). The handling of a page depends on these
+variables' values:
-@enumerate 1
+@itemize @bullet
@item
-@b{Does the virtual memory system need to support growth of the data
-segment?}
+If @code{page_read_bytes} equals @code{PGSIZE}, the page should be demand
+paged from disk on its first access.
-No. The size of the data segment is determined by the linker. We
-still have no dynamic allocation in Pintos (although it is possible to
-``fake'' it at the user level by using memory-mapped files). However,
-implementing it would add little additional complexity to a
-well-designed system.
+@item
+If @code{page_zero_bytes} equals @code{PGSIZE}, the page does not need to
+be read from disk at all because it is all zeroes. You should handle
+such pages by creating a new page consisting of all zeroes at the
+first page fault.
@item
-@b{Why do I need to pass @code{PAL_USER} to @func{palloc_get_page}
-when I allocate physical page frames?}@anchor{Why PAL_USER?}
-
-You can layer some other allocator on top of @func{palloc_get_page}
-if you like, but it should be the underlying mechanism, directly or
-indirectly, for two reasons. First, running out of pages in the user
-pool just causes user programs to page, but running out of pages in
-the kernel pool will cause all kinds of problems, because many kernel
-functions depend on being able to allocate memory. Second, you can
-use the @option{-ul} option to @command{pintos} to limit the size of
-the user pool, which makes it easy to test your VM implementation with
-various user memory sizes.
-@end enumerate
+Otherwise, neither @code{page_read_bytes} nor @code{page_zero_bytes}
+equals @code{PGSIZE}. In this case, an initial part of the page is to
+be read from disk and the remainder zeroed.
+@end itemize
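+
+The split itself is a two-line computation, mirroring the
+@func{load_segment} loop.  As a self-contained sketch (the function
+wrapper is ours, not Pintos code):
+
+@example
+#include <stddef.h>
+
+#define PGSIZE 4096
+
+/* Computes the read/zero split for the next page of a segment that
+   still has READ_BYTES bytes to be read from the file.  The two
+   results always sum to PGSIZE. */
+void
+page_split (size_t read_bytes, size_t *page_read_bytes,
+            size_t *page_zero_bytes)
+@{
+  *page_read_bytes = read_bytes < PGSIZE ? read_bytes : PGSIZE;
+  *page_zero_bytes = PGSIZE - *page_read_bytes;
+@}
+@end example
+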
-@node Problem 3-3 Memory Mapped File FAQ
-@subsection Problem 3-3: Memory Mapped File FAQ
+@node Stack Growth
+@subsection Stack Growth
-@enumerate 1
-@item
-@b{How do we interact with memory-mapped files?}
+Implement stack growth. In project 2, the stack was a single page at
+the top of the user virtual address space, and programs were limited to
+that much stack. Now, if the stack grows past its current size,
+allocate additional pages as necessary.
-Let's say you want to map a file called @file{foo} into your address
-space at address @t{0x10000000}. You open the file then use @code{mmap}:
+Allocate additional pages only if they ``appear'' to be stack accesses.
+Devise a heuristic that attempts to distinguish stack accesses from
+other accesses.
-@example
-#include <stdio.h>
-#include <syscall.h>
+User programs are buggy if they write to the stack below the stack
+pointer, because typical real OSes may interrupt a process at any time
+to deliver a ``signal,'' which pushes data on the stack.@footnote{This rule is
+common but not universal. One modern exception is the
+@uref{http://www.x86-64.org/documentation/abi.pdf, @var{x}86-64 System V
+ABI}, which designates 128 bytes below the stack pointer as a ``red
+zone'' that may not be modified by signal or interrupt handlers.}
+However, the 80@var{x}86 @code{PUSH} instruction checks access
+permissions before it adjusts the stack pointer, so it may cause a page
+fault 4 bytes below the stack pointer. (Otherwise, @code{PUSH} would
+not be restartable in a straightforward fashion.) Similarly, the
+@code{PUSHA} instruction pushes 32 bytes at once, so it can fault 32
+bytes below the stack pointer.
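+
+One plausible heuristic accepts a fault at or above the stack pointer,
+or at most 32 bytes below it (to allow for @code{PUSH} and
+@code{PUSHA}), provided the address lies within the stack limit below
+@code{PHYS_BASE}.  The sketch below assumes a fixed 8 MB limit; this is
+one reasonable choice, not the required one:
+
+@example
+#include <stdbool.h>
+#include <stdint.h>
+
+#define PHYS_BASE 0xc0000000u           /* Top of user address space. */
+#define STACK_LIMIT (8 * 1024 * 1024)   /* Illustrative 8 MB cap. */
+
+/* Returns true if FAULT_ADDR plausibly belongs to the stack,
+   given the user stack pointer ESP at the time of the fault. */
+bool
+looks_like_stack_access (uint32_t fault_addr, uint32_t esp)
+@{
+  if (fault_addr >= PHYS_BASE || fault_addr < PHYS_BASE - STACK_LIMIT)
+    return false;
+  return fault_addr >= esp || esp - fault_addr <= 32;
+@}
+@end example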
+
+You will need to be able to obtain the current value of the user
+program's stack pointer. Within a system call or a page fault generated
+by a user program, you can retrieve it from the @code{esp} member of the
+@struct{intr_frame} passed to @func{syscall_handler} or
+@func{page_fault}, respectively. If you verify user pointers before
+accessing them (@pxref{Accessing User Memory}), these are the only cases
+you need to handle. On the other hand, if you depend on page faults to
+detect invalid memory access, you will need to handle another case,
+where a page fault occurs in the kernel. Reading @code{esp} out of the
+@struct{intr_frame} passed to @func{page_fault} in that case will obtain
+the kernel stack pointer, not the user stack pointer. You will need to
+arrange another way, e.g.@: by saving @code{esp} into @struct{thread} on
+the initial transition from user to kernel mode.
+
+You may impose some absolute limit on stack size, as do most OSes.
+Some OSes make the limit user-adjustable, e.g.@: with the
+@command{ulimit} command on many Unix systems. On many GNU/Linux systems,
+the default limit is 8 MB.
+
+The first stack page need not be allocated lazily. You can initialize
+it with the command line arguments at load time, with no need to wait
+for it to be faulted in. (Even if you did wait, the very first
+instruction in the user program is likely to be one that faults in the
+page.)
+
+All stack pages should be candidates for eviction. An evicted stack
+page should be written to swap.
-int main (void)
-@{
- void *addr = (void *) 0x10000000;
- int fd = open ("foo");
- mapid_t map = mmap (fd, addr);
- if (map != -1)
- printf ("success!\n");
-@}
-@end example
+@node Memory Mapped Files
+@subsection Memory Mapped Files
-Suppose @file{foo} is a text file and you want to print the first 64
-bytes on the screen (assuming, of course, that the length of the file
-is at least 64). Without @code{mmap}, you'd need to allocate a
-buffer, use @code{read} to get the data from the file into the buffer,
-and finally use @code{write} to put the buffer out to the display. But
-with the file mapped into your address space, you can directly address
-it like so:
+Implement memory mapped files, including the following system calls.
-@example
-write (addr, 64, STDOUT_FILENO);
-@end example
+@deftypefn {System Call} mapid_t mmap (int @var{fd}, void *@var{addr})
+Maps the file open as @var{fd} into the process's virtual address
+space. The entire file is mapped into consecutive virtual pages
+starting at @var{addr}.
-Similarly, if you wanted to replace the first byte of the file,
-all you need to do is:
+Your VM system must lazily load pages in @code{mmap} regions and use the
+@code{mmap}'d file itself as backing store for the mapping. That is,
+evicting a page mapped by @code{mmap} writes it back to the file it was
+mapped from.
-@example
-addr[0] = 'b';
-@end example
+If the file's length is not a multiple of @code{PGSIZE}, then some
+bytes in the final mapped page ``stick out'' beyond the end of the
+file. Set these bytes to zero when the page is faulted in from disk,
+and discard them when the page is written back to disk.
-When you're done using the memory-mapped file, you simply unmap
-it:
+If successful, this function returns a ``mapping ID'' that
+uniquely identifies the mapping within the process. On failure,
+it must return -1, which otherwise should not be a valid mapping ID,
+and the process's mappings must be unchanged.
-@example
-munmap (map);
-@end example
+A call to @code{mmap} may fail if the file open as @var{fd} has a
+length of zero bytes. It must fail if @var{addr} is not page-aligned
+or if the range of pages mapped overlaps any existing set of mapped
+pages, including the stack or pages mapped at executable load time.
+It must also fail if @var{addr} is 0, because some Pintos code assumes
+virtual page 0 is not mapped. Finally, file descriptors 0 and 1,
+representing console input and output, are not mappable.
+@end deftypefn
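+
+The number of pages a mapping spans, and the count of ``stick out''
+bytes in its final page, follow directly from the file length.  A
+sketch (the function is illustrative, not part of Pintos):
+
+@example
+#include <stddef.h>
+
+#define PGSIZE 4096
+
+/* For a mapping of LENGTH bytes, computes how many pages it spans and
+   how many bytes stick out past the end of the file in the last page;
+   those bytes are zeroed on fault-in and discarded on write-back. */
+void
+mmap_extent (size_t length, size_t *pages, size_t *stick_out)
+@{
+  *pages = (length + PGSIZE - 1) / PGSIZE;
+  *stick_out = *pages * PGSIZE - length;
+@}
+@end example
+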
-@item
-@b{What if two processes map the same file into memory?}
+@deftypefn {System Call} void munmap (mapid_t @var{mapping})
+Unmaps the mapping designated by @var{mapping}, which must be a
+mapping ID returned by a previous call to @code{mmap} by the same
+process that has not yet been unmapped.
+@end deftypefn
-There is no requirement in Pintos that the two processes see
-consistent data. Unix handles this by making the two mappings share the
-same physical page, but the @code{mmap} system call also has an
-argument allowing the client to specify whether the page is shared or
+All mappings are implicitly unmapped when a process exits, whether via
+@code{exit} or by any other means. When a mapping is unmapped, whether
+implicitly or explicitly, all pages written to by the process are
+written back to the file, and pages not written must not be. The pages
+are then removed from the process's list of virtual pages.
+
+Closing or removing a file does not unmap any of its mappings. Once
+created, a mapping is valid until @code{munmap} is called or the process
+exits, following the Unix convention. @xref{Removing an Open File}, for
+more information.
+
+If two or more processes map the same file, there is no requirement that
+they see consistent data. Unix handles this by making the two mappings
+share the same physical page, but the @code{mmap} system call also has
+an argument allowing the client to specify whether the page is shared or
private (i.e.@: copy-on-write).
-@item
-@b{What happens if a user removes a @code{mmap}'d file?}
+@node Project 3 FAQ
+@section FAQ
-The mapping should remain valid, following the Unix convention.
-@xref{Removing an Open File}, for more information.
+@table @b
+@item How much code will I need to write?
+
+Here's a summary of our reference solution, produced by the
+@command{diffstat} program. The final row gives total lines inserted
+and deleted; a changed line counts as both an insertion and a deletion.
+
+This summary is relative to the Pintos base code, but the reference
+solution for project 3 starts from the reference solution to project 2.
+@xref{Project 2 FAQ}, for the summary of project 2.
+
+The reference solution represents just one possible solution. Many
+other solutions are also possible and many of those differ greatly from
+the reference solution. Some excellent solutions may not modify all the
+files modified by the reference solution, and some may modify files not
+modified by the reference solution.
+
+@verbatim
+ Makefile.build | 4
+ devices/timer.c | 42 ++
+ threads/init.c | 5
+ threads/interrupt.c | 2
+ threads/thread.c | 31 +
+ threads/thread.h | 37 +-
+ userprog/exception.c | 12
+ userprog/pagedir.c | 10
+ userprog/process.c | 319 +++++++++++++-----
+ userprog/syscall.c | 545 ++++++++++++++++++++++++++++++-
+ userprog/syscall.h | 1
+ vm/frame.c | 162 +++++++++
+ vm/frame.h | 23 +
+ vm/page.c | 297 ++++++++++++++++
+ vm/page.h | 50 ++
+ vm/swap.c | 85 ++++
+ vm/swap.h | 11
+ 17 files changed, 1532 insertions(+), 104 deletions(-)
+@end verbatim
+
+@item Do we need a working Project 2 to implement Project 3?
-@item
-@b{What if a process writes to a page that is memory-mapped, but the
-location written to in the memory-mapped page is past the end
-of the memory-mapped file?}
+Yes.
-Can't happen. @code{mmap} maps an entire file and Pintos provides no
-way to shorten a file. (Until project 4, there's no way to extend a
-file either.) You can remove a file, but the mapping remains valid
-(see the previous question).
+@item What extra credit is available?
+@anchor{VM Extra Credit}
-@item
-@b{Do we have to handle memory mapping @code{stdin} or @code{stdout}?}
+You may implement sharing: when multiple processes are created that use
+the same executable file, share read-only pages among those processes
+instead of creating separate copies of read-only segments for each
+process. If you carefully designed your data structures,
+sharing of read-only pages should not make this part significantly
+harder.
-No. Memory mapping implies that a file has a length and that a user
-can seek to any location in the file. Since the console device has
-neither of these properties, @code{mmap} should return false when the
-user attempts to memory map a file descriptor for the console device.
+@item Does the virtual memory system need to support data segment growth?
-@item
-@b{If a user closes a mapped file, should it be automatically
-unmapped?}
+No. The size of the data segment is determined by the linker. We still
+have no dynamic allocation in Pintos (although it is possible to
+``fake'' it at the user level by using memory-mapped files). Supporting
+data segment growth should add little additional complexity to a
+well-designed system.
-No. Once created the mapping is valid until @code{munmap} is called
-or the process exits.
-@end enumerate
+@item Why should I use @code{PAL_USER} for allocating page frames?
+@anchor{Why PAL_USER?}
+
+Passing @code{PAL_USER} to @func{palloc_get_page} causes it to allocate
+memory from the user pool, instead of the main kernel pool. Running out
+of pages in the user pool just causes user programs to page, but running
+out of pages in the kernel pool will cause many failures because so many
+kernel functions need to obtain memory.
+You can layer some other allocator on top of @func{palloc_get_page} if
+you like, but it should be the underlying mechanism.
+
+Also, you can use the @option{-ul} option to @command{pintos} to limit
+the size of the user pool, which makes it easy to test your VM
+implementation with various user memory sizes.
+@end table