   1 @node Project 3--Virtual Memory, Project 4--File Systems, Project 2--User Programs, Top
   2 @chapter Project 3: Virtual Memory
   3
   4 By now you should be familiar with the inner workings of Pintos.
   5 You've already come a long way: your OS can properly handle multiple
   6 threads of execution with proper synchronization, and can load
   7 multiple user programs at once.  However, when loading user programs,
   8 your OS is limited by how much main memory the simulated machine has.
   9 In this assignment, you will remove that limitation.
  10
  11 You will be using the @file{vm} directory for this project.  There is
  12 no new code to get acquainted with for this assignment.  The @file{vm}
  13 directory contains only the @file{Makefile}s.  The only change from
  14 @file{userprog} is that this new @file{Makefile} turns on the setting
@option{-DVM}.  All code you write will either be in new
files (e.g.@: if you choose to implement your paging code in its own
  17 source files), or will be modifications to pre-existing code (e.g.@:
  18 you will change the behavior of @file{process.c} significantly).
  19
  20 You will be building this assignment on the last one.  It will benefit
  21 you to get your project 2 in good working order before this assignment
  22 so those bugs don't keep haunting you.
  23
  24 All the test programs from the previous project should also work with
  25 this project.  You should also write programs to test the new features
  26 introduced in this project.
  27
  28 Your submission should define @code{THREAD_JOIN_IMPLEMENTED} in
  29 @file{constants.h} (@pxref{Conditional Compilation}).
  30
  31 @menu
  32 * VM Design::
  33 * Page Faults::
  34 * Disk as Backing Store::
  35 * Memory Mapped Files::
  36 * Stack::
  37 * Problem 3-1 Page Table Management::
  38 * Problem 3-2 Paging To and From Disk::
  39 * Problem 3-3 Memory Mapped Files::
  40 * Virtual Memory FAQ::
  41 @end menu
  42
  43 @node VM Design
  44 @section A Word about Design
  45
  46 It is important for you to note that in addition to getting virtual
  47 memory working, this assignment is also meant to be an open-ended
  48 design problem.  We will expect you to come up with a design that
  49 makes sense.  You will have the freedom to choose how to handle page
  50 faults, how to organize the swap disk, how to implement paging, etc.
  51 In each case, we will expect you to provide a defensible justification
  52 in your design documentation as to why your choices are reasonable.
  53 You should evaluate your design on all the available criteria: speed
  54 of handling a page fault, space overhead in memory, minimizing the
  55 number of page faults, simplicity, etc.
  56
  57 In keeping with this, you will find that we are going to say as little
  58 as possible about how to do things.  Instead we will focus on what end
  59 functionality we require your OS to support.
  60
  61 @node Page Faults
  62 @section Page Faults
  63
  64 For the last assignment, whenever a context switch occurred, the new
  65 process would install its own page table into the machine.  The page
  66 table contained all the virtual-to-physical translations for the
  67 process.  Whenever the processor needed to look up a translation, it
consulted the page table.  As long as the process only accessed
memory that it owned, all was well.  If the process accessed
  70 memory it didn't own, it ``page faulted'' and @code{page_fault()}
  71 terminated the process.
  72
  73 When we implement virtual memory, the rules have to change.  A page
  74 fault is no longer necessarily an error, since it might only indicate
  75 that the page must be brought in from a disk file or from swap.  You
  76 will have to implement a more sophisticated page fault handler to
  77 handle these cases.
  78
  79 On the 80@var{x}86, the page table format is fixed by hardware.  The
  80 top-level data structure is a 4 kB page called the ``page directory''
  81 (PD) arranged as an array of 1,024 32-bit page directory entries
  82 (PDEs), each of which represents 4 MB of virtual memory.  Each PDE may
  83 point to the physical address of another 4 kB page called a ``page
  84 table'' (PT) arranged in the same fashion as an array of 1,024 32-bit
  85 page table entries (PTEs), each of which translates a single 4 kB
  86 virtual page into physical memory.
  87
  88 Thus, translation of a virtual address into a physical address follows
  89 the three-step process illustrated in the diagram
  90 below:@footnote{Actually, virtual to physical translation on the
  91 80@var{x}86 architecture happens via an intermediate ``linear
  92 address,'' but Pintos (and most other 80@var{x}86 OSes) set up the CPU
  93 so that linear and virtual addresses are one and the same, so that you
  94 can effectively ignore this CPU feature.}
  95
  96 @enumerate 1
  97 @item
  98 The top 10 bits of the virtual address (bits 22:31) are used to index
  99 into the page directory.  If the PDE is marked ``present,'' the
 100 physical address of a page table is read from the PDE thus obtained.
 101 If the PDE is marked ``not present'' then a page fault occurs.
 102
 103 @item
 104 The next 10 bits of the virtual address (bits 12:21) are used to index
 105 into the page table.  If the PTE is marked ``present,'' the physical
 106 address of a data page is read from the PTE thus obtained.  If the PTE
 107 is marked ``not present'' then a page fault occurs.
 108
 109
 110 @item
 111 The bottom 12 bits of the virtual address (bits 0:11) are added to the
 112 data page's physical base address, producing the final physical
 113 address.
 114 @end enumerate
 115
 116 @example
 117 32                    22                     12                      0
 118 +--------------------------------------------------------------------+
 119 | Page Directory Index |   Page Table Index   |    Page Offset       |
 120 +--------------------------------------------------------------------+
 121              |                    |                     |
 122      _______/             _______/                _____/
 123     /                    /                       /
 124    /    Page Directory  /      Page Table       /    Data Page
 125   /     .____________. /     .____________.    /   .____________.
 126   |1,023|____________| |1,023|____________|    |   |____________|
 127   |1,022|____________| |1,022|____________|    |   |____________|
 128   |1,021|____________| |1,021|____________|    \__\|____________|
 129   |1,020|____________| |1,020|____________|       /|____________|
 130   |     |            | |     |            |        |            |
 131   |     |            | \____\|            |_       |            |
 132   |     |      .     |      /|      .     | \      |      .     |
 133   \____\|      .     |_      |      .     |  |     |      .     |
 134        /|      .     | \     |      .     |  |     |      .     |
 135         |      .     |  |    |      .     |  |     |      .     |
 136         |            |  |    |            |  |     |            |
 137         |____________|  |    |____________|  |     |____________|
 138        4|____________|  |   4|____________|  |     |____________|
 139        3|____________|  |   3|____________|  |     |____________|
 140        2|____________|  |   2|____________|  |     |____________|
 141        1|____________|  |   1|____________|  |     |____________|
 142        0|____________|  \__\0|____________|  \____\|____________|
 143                            /                      /
 144 @end example
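
The three steps above can be mimicked in ordinary C.  The following is
a user-space sketch, not Pintos code: the names @code{walk_translate},
@code{WALK_PRESENT}, and the two structures are hypothetical, and C
pointers stand in for the physical addresses that a real PDE holds.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WALK_ENTRIES 1024u   /* Entries per page directory or page table. */
#define WALK_PRESENT 0x1u    /* Bit 0 of a PTE: "present". */

/* A simulated page table: 1,024 32-bit PTEs. */
struct walk_page_table { uint32_t pte[WALK_ENTRIES]; };

/* A simulated page directory.  Real PDEs hold physical addresses;
   this sketch uses pointers, with NULL meaning "not present". */
struct walk_page_dir { struct walk_page_table *pde[WALK_ENTRIES]; };

/* Translates VA through PD.  On success stores the physical address
   in *PA and returns true.  Returns false where the hardware would
   raise a page fault. */
bool
walk_translate (const struct walk_page_dir *pd, uint32_t va, uint32_t *pa)
{
  uint32_t pd_idx = va >> 22;             /* Step 1: bits 22:31. */
  uint32_t pt_idx = (va >> 12) & 0x3ffu;  /* Step 2: bits 12:21. */
  uint32_t offset = va & 0xfffu;          /* Step 3: bits 0:11. */

  const struct walk_page_table *pt = pd->pde[pd_idx];
  if (pt == NULL)
    return false;                         /* PDE not present. */

  uint32_t pte = pt->pte[pt_idx];
  if (!(pte & WALK_PRESENT))
    return false;                         /* PTE not present. */

  *pa = (pte & ~0xfffu) + offset;         /* Frame base plus offset. */
  return true;
}
```

A failed lookup at either level corresponds to the two ``not present''
cases in the list above.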
 145
 146 Header @file{threads/mmu.h} has useful functions for various
 147 operations on virtual addresses.  You should look over the header
 148 yourself, but its most important functions include these:
 149
 150 @table @code
 151 @item pd_no(@var{va})
 152 Returns the page directory index in virtual address @var{va}.
 153
 154 @item pt_no(@var{va})
 155 Returns the page table index in virtual address @var{va}.
 156
 157 @item pg_ofs(@var{va})
 158 Returns the page offset in virtual address @var{va}.
 159
 160 @item pg_round_down(@var{va})
 161 Returns @var{va} rounded down to the nearest page boundary, that is,
 162 @var{va} but with its page offset set to 0.
 163
 164 @item pg_round_up(@var{va})
 165 Returns @var{va} rounded up to the nearest page boundary.
 166 @end table
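
To make the arithmetic behind these functions concrete, here is one
way they could be written.  This is only a sketch: the @code{va_}
names are hypothetical stand-ins chosen so as not to clash with the
real definitions in @file{threads/mmu.h}, which may differ in types
and details, and addresses are represented as 32-bit integers.

```c
#include <stdint.h>

#define VA_PGBITS 12u                  /* Number of page offset bits. */
#define VA_PGSIZE (1u << VA_PGBITS)    /* Bytes in a page: 4 kB. */
#define VA_PGMASK (VA_PGSIZE - 1u)     /* Mask selecting the offset bits. */

/* Page directory index: the top 10 bits. */
static inline uint32_t
va_pd_no (uint32_t va)
{
  return va >> 22;
}

/* Page table index: the middle 10 bits. */
static inline uint32_t
va_pt_no (uint32_t va)
{
  return (va >> VA_PGBITS) & 0x3ffu;
}

/* Page offset: the bottom 12 bits. */
static inline uint32_t
va_pg_ofs (uint32_t va)
{
  return va & VA_PGMASK;
}

/* VA with its page offset cleared. */
static inline uint32_t
va_pg_round_down (uint32_t va)
{
  return va & ~VA_PGMASK;
}

/* Start of the next page, unless VA is already page-aligned.
   Wraps around to 0 for addresses inside the very last page. */
static inline uint32_t
va_pg_round_up (uint32_t va)
{
  return (va + VA_PGMASK) & ~VA_PGMASK;
}
```

For example, address @t{0x08048123} has page directory index 32, page
table index 72, and page offset @t{0x123}.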
 167
 168 @node Disk as Backing Store
 169 @section Disk as Backing Store
 170
 171 In VM systems, since memory is less plentiful than disk, you will
 172 effectively use memory as a cache for disk.  Looking at it from
 173 another angle, you will use disk as a backing store for memory.  This
 174 provides the abstraction of an (almost) unlimited virtual memory size.
 175 Part of your task in this project is to do this, with the additional
 176 constraint that your performance should be close to that provided by
 177 physical memory.  You will use the page tables' ``dirty'' bits to
 178 denote whether pages need to be written back to disk when they're
 179 evicted from main memory and the ``accessed'' bit for page replacement
 180 algorithms.  Whenever the hardware writes memory, it sets the dirty
 181 bit, and if it reads or writes to the page, it sets the accessed bit.
 182
 183 As with any caching system, performance depends on the policy used to
 184 decide which things are kept in memory and which are only stored on
 185 disk.  On a page fault, the kernel must decide which page to replace.
 186 Ideally, it will throw out a page that will not be referenced for a
 187 long time, keeping in memory those pages that are soon to be
 188 referenced.  Another consideration is that if the replaced page has
 189 been modified, the page must be first saved to disk before the needed
 190 page can be brought in.  Many virtual memory systems avoid this extra
 191 overhead by writing modified pages to disk in advance, so that later
 192 page faults can be completed more quickly.
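
The role of the two bits can be illustrated with a small simulation.
The bit positions below are those of 80@var{x}86 page table entries,
but everything else is an assumption for illustration: the @code{wb_}
names are hypothetical, and the ``write'' to disk is only counted, not
performed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WB_PTE_P 0x01u   /* Present. */
#define WB_PTE_A 0x20u   /* Accessed: set by hardware on a read or write. */
#define WB_PTE_D 0x40u   /* Dirty: set by hardware on a write. */

/* Returns true if the page mapped by PTE has been modified, so that
   evicting it requires writing it to disk first. */
bool
wb_needs_writeback (uint32_t pte)
{
  return (pte & WB_PTE_P) && (pte & WB_PTE_D);
}

/* Returns true if the page has been touched since the accessed bit
   was last cleared, which a replacement policy can use as a hint. */
bool
wb_recently_used (uint32_t pte)
{
  return (pte & WB_PTE_A) != 0;
}

/* Simulates a background pass that writes modified pages out ahead
   of time.  Clears the dirty bit of every present, dirty PTE and
   returns how many pages would have been written. */
size_t
wb_preclean (uint32_t *ptes, size_t n)
{
  size_t written = 0;
  for (size_t i = 0; i < n; i++)
    if (wb_needs_writeback (ptes[i]))
      {
        /* A real kernel would write the page to its backing store
           here and make sure the CPU does not keep using a stale
           cached copy of the entry. */
        ptes[i] &= ~WB_PTE_D;
        written++;
      }
  return written;
}
```

After such a pass, a later eviction of one of those pages finds it
clean and can reuse the frame without waiting for a disk write.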
 193
 194 @node Memory Mapped Files
 195 @section Memory Mapped Files
 196
 197 The traditional way to access the file system is via @code{read} and
 198 @code{write} system calls, but that requires an extra level of copying
 199 between the kernel and the user level.  A secondary interface is
 200 simply to ``map'' the file into the virtual address space.  The
 201 program can then use load and store instructions directly on the file
 202 data.  (An alternative way of viewing the file system is as ``durable
 203 memory.''  Files just store data structures.  If you access data
 204 structures in memory using load and store instructions, why not access
 205 data structures in files the same way?)
 206
 207 Memory mapped files are typically implemented using system calls.  One
 208 system call maps the file to a particular part of the address space.
 209 For example, one might map the file @file{foo}, which is 1000 bytes
 210 long, starting at address 5000.  Assuming that nothing else is already
 211 at virtual addresses 5000@dots{}6000, any memory accesses to these
 212 locations will access the corresponding bytes of @file{foo}.
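
The correspondence between addresses and file bytes is simple
arithmetic, as the following sketch shows.  The structure and function
names are hypothetical, and the example ignores paging entirely; it
only computes which byte of the file a given address refers to.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One mapping: LENGTH bytes of a file placed at address START. */
struct map_region
  {
    uintptr_t start;
    size_t length;
  };

/* If ADDR lies inside region M, stores the offset of the
   corresponding file byte in *OFS and returns true.
   Otherwise returns false and leaves *OFS untouched. */
bool
map_file_offset (const struct map_region *m, uintptr_t addr, size_t *ofs)
{
  if (addr < m->start || addr - m->start >= m->length)
    return false;
  *ofs = addr - m->start;
  return true;
}
```

With @file{foo} mapped at address 5000 for 1000 bytes, address 5000
corresponds to file offset 0 and address 5999 to offset 999, while
address 6000 is outside the mapping.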
 213
 214 A consequence of memory mapped files is that address spaces are
 215 sparsely populated with lots of segments, one for each memory mapped
 216 file (plus one each for code, data, and stack).  You will implement
 217 memory mapped files for problem 3 of this assignment, but you should
 218 design your solutions to problems 1 and 2 to account for this.
 219
 220 @node Stack
 221 @section Stack
 222
 223 In project 2, the stack was a single page at the top of the user
 224 virtual address space.  The stack's location does not change in this
 225 project, but your kernel should allocate additional pages to the stack
 226 on demand.  That is, if the stack grows past its current bottom, the
 227 system should allocate additional pages for the stack as necessary,
 228 unless those pages are unavailable because they are in use by another
 229 segment, in which case some sort of fault should occur.
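
How to recognize a stack access is a design decision left to you.  As
one possible starting point, the sketch below accepts a faulting
address if it is no more than 32 bytes below the user stack pointer
and within a fixed maximum stack size.  All of the constants and names
are assumptions for illustration: the 32-byte slack is motivated by
the 80@var{x}86 @code{PUSHA} instruction, which pushes 32 bytes at
once, and the 8 MB limit is arbitrary.  The sketch does not check for
collisions with other segments.

```c
#include <stdbool.h>
#include <stdint.h>

#define STK_PHYS_BASE 0xC0000000u              /* Assumed top of user memory. */
#define STK_LIMIT     (8u * 1024u * 1024u)     /* Assumed maximum stack size. */
#define STK_SLACK     32u                      /* Assumed slack below esp. */

/* Returns true if a fault at FAULT_ADDR, taken while the user stack
   pointer was ESP, looks like legitimate stack growth. */
bool
stk_is_stack_access (uint32_t fault_addr, uint32_t esp)
{
  if (fault_addr >= STK_PHYS_BASE)
    return false;                       /* Kernel address. */
  if (fault_addr < STK_PHYS_BASE - STK_LIMIT)
    return false;                       /* Below the assumed limit. */

  /* Accept addresses at or above esp - STK_SLACK.  The addition
     cannot overflow because FAULT_ADDR is below STK_PHYS_BASE. */
  return fault_addr + STK_SLACK >= esp;
}
```

Whatever rule you choose, document it and justify it in your design
document.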
 230
 231 @node Problem 3-1 Page Table Management
 232 @section Problem 3-1: Page Table Management
 233
 234 Implement page directory and page table management to support virtual
 235 memory.  You will need data structures to accomplish the following
 236 tasks:
 237
 238 @itemize @bullet
 239 @item
 240 Some way of translating in software from virtual page frames to
 241 physical page frames (consider using a hash table---note
 242 that we provide one in @file{lib/kernel}).
 243
 244 @item
 245 Some way of translating from physical page frames back to virtual
 246 page frames, so that when you replace a page, you can invalidate
 247 its translation(s).
 248
 249 @item
 250 Some way of finding a page on disk if it is not in memory.  You won't
 251 need this data structure until part 2, but planning ahead is a good
 252 idea.
 253 @end itemize
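
As a rough illustration of what such data structures might record,
consider the sketch below.  Every name in it is hypothetical, the
field choices are only one of many reasonable designs, and a linear
search stands in for the hash table lookup suggested above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Where the contents of a virtual page currently live. */
enum spt_location { SPT_IN_FRAME, SPT_IN_FILE, SPT_IN_SWAP, SPT_ALL_ZERO };

/* Per-process record for one virtual page.  Covers the first and
   third tasks: virtual-to-physical lookup and finding pages on disk. */
struct spt_page
  {
    uint32_t upage;            /* Page-aligned virtual address (the key). */
    enum spt_location where;   /* Current location of the contents. */
    size_t frame_no;           /* Frame number, if where == SPT_IN_FRAME. */
    size_t disk_pos;           /* File offset or swap slot, if on disk. */
    bool writable;             /* May the process write to this page? */
  };

/* Global record for one physical frame.  Covers the second task:
   finding the virtual page to invalidate when the frame is reused. */
struct spt_frame
  {
    struct spt_page *owner;    /* Page held in this frame, or NULL if free. */
  };

/* Finds the record for the page containing VA among the N entries
   of PAGES.  Returns NULL if the address has no record. */
struct spt_page *
spt_lookup (struct spt_page *pages, size_t n, uint32_t va)
{
  uint32_t upage = va & ~0xfffu;
  for (size_t i = 0; i < n; i++)
    if (pages[i].upage == upage)
      return &pages[i];
  return NULL;
}
```

If you later implement sharing, a frame may need to refer to more than
one virtual page, so a single @code{owner} pointer would not be
enough.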
 254
You need to do roughly the following to handle a page fault:
 256
 257 @enumerate 1
 258 @item
Determine the location of the physical page backing the virtual
address that faulted.  It might be in the file system, in swap, or
already in physical memory and just not set up in the page table, or
the virtual address might be invalid.
 263
 264 If the virtual address is invalid, that is, if there's no physical
 265 page backing it, or if the virtual address is above @code{PHYS_BASE},
 266 meaning that it belongs to the kernel instead of the user, then the
 267 process's memory access must be disallowed.  You should terminate the
 268 process at this point, being sure to free all of its resources.
 269
 270 @item
 271 If the physical page is not in physical memory, bring it into memory.
 272 If necessary to make room, first evict some other page from memory.
 273 (When you do that you need to first remove references to the page from
 274 any page table that refers to it.)
 275
 276 @item
 277 Each user process's @code{struct thread} has a @samp{pagedir} member
 278 that points to its own per-process page directory.  Read the PDE for
 279 the faulting virtual address.
 280
 281 @item
 282 If the PDE is marked ``not present'' then allocate a new page table
 283 page and initialize the PDE to point to the new page table.  As when
 284 you allocated a data page, you might have to first evict some other
 285 page from memory.
 286
 287 @item
 288 Follow the PDE to the page table.  Point the PTE for the faulting
 289 virtual address to the physical page found in step 2.
 290 @end enumerate
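
The decision made in steps 1 and 2 can be summarized as a small
classification function.  This sketch is not the Pintos
@code{page_fault()} handler: the @code{pf_} names are hypothetical,
the value of @code{PF_PHYS_BASE} is an assumption, and the caller is
assumed to have already looked up where the page's contents live.

```c
#include <stdint.h>

#define PF_PHYS_BASE 0xC0000000u   /* Assumed start of kernel memory. */

/* Result of looking up the faulting page in your own data structures. */
enum pf_where { PF_NOWHERE, PF_RESIDENT, PF_IN_FILE, PF_IN_SWAP };

/* What the fault handler should do next. */
enum pf_action { PF_KILL, PF_MAP_ONLY, PF_READ_FILE, PF_READ_SWAP };

/* Chooses an action for a fault at FAULT_ADDR whose backing page
   was found at WHERE. */
enum pf_action
pf_classify (uint32_t fault_addr, enum pf_where where)
{
  /* Step 1: kernel addresses and unbacked pages are invalid. */
  if (fault_addr >= PF_PHYS_BASE || where == PF_NOWHERE)
    return PF_KILL;

  /* Step 2: pages not in memory must be brought in first,
     possibly after evicting some other page. */
  if (where == PF_IN_FILE)
    return PF_READ_FILE;
  if (where == PF_IN_SWAP)
    return PF_READ_SWAP;

  /* Steps 3 to 5: the page is already resident, so only the PDE
     and PTE need to be filled in. */
  return PF_MAP_ONLY;
}
```

In the two ``read'' cases, steps 3 to 5 still follow once the page has
been loaded into a frame.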
 291
 292 You'll need to modify the ELF loader in @file{userprog/process.c} to
 293 do page table management according to your new design.  As supplied,
 294 it reads all the process's pages from disk and initializes the page
 295 tables for them at the same time.  For testing purposes, you'll
 296 probably want to leave the code that reads the pages from disk, but
 297 use your new page table management code to construct the page tables
 298 only as page faults occur for them.
 299
 300 There are many possible ways to implement virtual memory.  The above
 301 is simply an outline of our suggested implementation.  You may choose
 302 any implementation you like, as long as it accomplishes the goal.
 303
 304 @node Problem 3-2 Paging To and From Disk
 305 @section Problem 3-2: Paging To and From Disk
 306
 307 Implement paging to and from files and the swap disk.  You may use the
 308 disk on interface @code{hd1:1} as the swap disk.
 309
 310 You will need routines to move a page from memory to disk and from
 311 disk to memory, where ``disk'' is either a file or the swap disk.  If
 312 you do everything correctly, your VM should still work when you
 313 implement your own file system for the next assignment.
 314
 315 You will need a way to track pages which are used by a process but
 316 which are not in physical memory, to fully handle page faults.  Pages
 317 that you write to swap should not be constrained to be in sequential
 318 order.  You will also need a way to track all of the physical memory
 319 pages, in order to find an unused one when needed, or to evict a page
 320 when memory is needed but no empty pages are available.  The data
 321 structures that you designed in part 1 should do most of the work for
 322 you.
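
One simple way to let swapped pages land anywhere on the swap disk is
to divide it into page-sized slots and record which slots are in use.
The sketch below assumes 512-byte disk sectors, so that a 4 kB page
occupies 8 sectors, and an arbitrary capacity of 64 slots.  All names
are hypothetical and no disk I/O or locking is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SWAP_SLOTS            64             /* Assumed swap capacity in pages. */
#define SWAP_SECTORS_PER_SLOT 8              /* 4096-byte page / 512-byte sector. */
#define SWAP_NONE             ((size_t) -1)  /* Returned when swap is full. */

static bool swap_used[SWAP_SLOTS];           /* true if the slot holds a page. */

/* Claims any free slot and returns its index,
   or SWAP_NONE if every slot is in use. */
size_t
swap_alloc (void)
{
  for (size_t i = 0; i < SWAP_SLOTS; i++)
    if (!swap_used[i])
      {
        swap_used[i] = true;
        return i;
      }
  return SWAP_NONE;
}

/* Releases SLOT so that a later eviction can reuse it. */
void
swap_free (size_t slot)
{
  assert (slot < SWAP_SLOTS && swap_used[slot]);
  swap_used[slot] = false;
}

/* Returns the first disk sector belonging to SLOT. */
size_t
swap_first_sector (size_t slot)
{
  assert (slot < SWAP_SLOTS);
  return slot * SWAP_SECTORS_PER_SLOT;
}
```

Because a freed slot is reused by the next allocation, pages end up in
whatever order evictions happen to occur, which is why each page's
record must remember its own slot number.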
 323
 324 You will need a page replacement algorithm.  The hardware sets the
 325 accessed and dirty bits when it accesses memory.  Therefore, you
 326 should be able to take advantage of this information to implement some
algorithm which attempts to achieve LRU-type behavior.  We expect
your algorithm to perform at least as well as a reasonable implementation
of the second-chance (clock) algorithm.  You will need to show in your
test cases the value of your page replacement algorithm by
demonstrating that, for some workload, the system pages less frequently
using your algorithm than using some inferior page replacement policy.  The
 333 canonical example of a poor page replacement policy is random
 334 replacement.
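
For reference, the core of the second-chance (clock) algorithm fits in
a few lines.  The sketch below is a simulation under simplifying
assumptions: a plain @code{bool} stands in for the hardware accessed
bit, the @code{clk_} names are hypothetical, and pinning, dirty pages,
and synchronization are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One physical frame, reduced to the single bit the policy needs. */
struct clk_frame
  {
    bool accessed;   /* Stand-in for the PTE's accessed bit. */
  };

/* Picks a frame to evict from the N entries of FRAMES.  *HAND is the
   clock hand: the index at which the search starts.  It is advanced
   past every frame that is examined. */
size_t
clk_pick_victim (struct clk_frame *frames, size_t n, size_t *hand)
{
  assert (n > 0 && *hand < n);
  for (;;)
    {
      size_t i = *hand;
      *hand = (*hand + 1) % n;

      if (!frames[i].accessed)
        return i;                   /* Not used recently: evict it. */

      frames[i].accessed = false;   /* Used recently: second chance. */
    }
}
```

The loop always terminates, because a frame whose bit was just cleared
is chosen if the hand comes around to it again before it is
re-accessed.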
 335
 336 Since you will already be paging from disk, you should implement a
 337 ``lazy'' loading scheme for new processes.  When a process is created,
 338 it will not run immediately.  Therefore, it doesn't make sense to load
 339 all its code, data, and stack into memory when the process is created,
 340 since it might incur additional disk accesses to do so (if it gets
 341 paged out before it runs).  When loading a new process, you should
 342 leave most pages on disk, and bring them in as demanded when the
 343 program begins running.  Your VM system should also use the executable
 344 file itself as backing store for read-only segments, since these
 345 segments won't change.
 346
 347 There are a few special cases.  Look at the loop in
 348 @code{load_segment()} in @file{userprog/process.c}.  Each time
 349 around the loop, @code{read_bytes} represents the number of bytes to
 350 read from the executable file and @code{zero_bytes} represents the number
 351 of bytes to initialize to zero following the bytes read.  The two
 352 always sum to @code{PGSIZE}.  The page handling depends on these
 353 variables' values:
 354
 355 @itemize @bullet
 356 @item
 357 If @code{read_bytes} equals @code{PGSIZE}, the page should be demand
 358 paged from disk on its first access.
 359
 360 @item
 361 If @code{zero_bytes} equals @code{PGSIZE}, the page does not need to
 362 be read from disk at all because it is all zeroes.  You should handle
 363 such pages by creating a new page consisting of all zeroes at the
 364 first page fault.
 365
 366 @item
 367 If neither @code{read_bytes} nor @code{zero_bytes} equals
 368 @code{PGSIZE}, then part of the page is to be read from disk and the
 369 remainder zeroed.  This is a special case.  You may handle it by
 370 reading the partial page from disk at executable load time and zeroing
 371 the rest of the page.  This is the only case in which we will allow
 372 you to load a page in a non-``lazy'' fashion.  Many real OSes such as
 373 Linux do not load partial pages lazily.
 374 @end itemize
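
The three cases above amount to a simple classification, sketched
here.  The @code{seg_} names and the enumeration are hypothetical, and
a page size of 4,096 bytes is assumed.

```c
#include <assert.h>
#include <stddef.h>

#define SEG_PGSIZE 4096u   /* Assumed page size in bytes. */

/* How a page of a segment should be brought into memory. */
enum seg_kind
  {
    SEG_DEMAND_FILE,   /* Read the whole page from the file on first access. */
    SEG_DEMAND_ZERO,   /* Create an all-zero page on first access. */
    SEG_PARTIAL        /* Part file data, part zeroes: the special case. */
  };

/* Classifies a page given the READ_BYTES and ZERO_BYTES values for
   one iteration of the loading loop. */
enum seg_kind
seg_classify (size_t read_bytes, size_t zero_bytes)
{
  assert (read_bytes + zero_bytes == SEG_PGSIZE);

  if (read_bytes == SEG_PGSIZE)
    return SEG_DEMAND_FILE;
  if (zero_bytes == SEG_PGSIZE)
    return SEG_DEMAND_ZERO;
  return SEG_PARTIAL;
}
```

Only pages classified as @code{SEG_PARTIAL} may be loaded eagerly.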
 375
 376 Incidentally, if you have trouble handling the third case above, you
 377 can eliminate it temporarily by linking the test programs with a
 378 special ``linker script.''  Read @file{tests/userprog/Makefile} for
 379 details.  We will not test your submission with this special linker
 380 script, so the code you turn in must properly handle all cases.
 381
 382 You may optionally implement sharing: when multiple processes are
 383 created that use the same executable file, share read-only pages among
 384 those processes instead of creating separate copies of read-only
 385 segments for each process.  If you carefully designed your data
 386 structures in part 1, sharing of read-only pages should not make this
 387 part significantly harder.
 388
 389 @node Problem 3-3 Memory Mapped Files
 390 @section Problem 3-3: Memory Mapped Files
 391
 392 Implement memory mapped files.
 393
 394 You will need to implement the following system calls:
 395
 396 @table @asis
 397 @item SYS_mmap
 398 @itemx bool mmap (int @var{fd}, void *@var{addr}, unsigned @var{length})
 399
 400 Maps the file open as @var{fd} into the process's address space
 401 starting at @var{addr} for @var{length} bytes.  Returns true if
 402 successful, false on failure.
 403
 404 @item SYS_munmap
@itemx bool munmap (void *@var{addr}, unsigned @var{length})

Unmaps the segment of @var{length} bytes starting at @var{addr}.  This
cannot be used to unmap segments mapped by the executable loader.
Returns true if successful, false on failure.  When a file is unmapped,
all outstanding changes are written
 410 to the file, and the segment's pages are removed from the process's
 411 list of used virtual pages.
 412 @end table
 413
Calls to @code{mmap} must fail if the address is not page-aligned, or if
the length is not positive or is not a multiple of @code{PGSIZE}.  You also
must error check to make sure that the new segment does not overlap
already existing segments, and fail if it does.  If the length passed
 418 to @code{mmap} is less than the file's length, you should only map the
 419 first part of the file.  If the length passed to @code{mmap} is longer
 420 than the file, the file should grow to the requested length.  Similar
 421 to the code segment, your VM system should be able to use the
 422 @code{mmap}'d file itself as backing store for the mmap segment, since
 423 the changes to the @code{mmap} segment will eventually be written to
 424 the file.  (In fact, you may choose to implement executable mappings
 425 as a special case of file mappings.)
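
The argument checks described above might look like the following
sketch.  It is an illustration only: the @code{mm_} names are
hypothetical, the process's existing segments are assumed to be
available as a plain array, and checks on the file descriptor are
omitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MM_PGSIZE 4096u   /* Assumed page size in bytes. */

/* An existing segment in the process's address space. */
struct mm_segment
  {
    uint32_t start;    /* First address of the segment. */
    uint32_t length;   /* Size in bytes. */
  };

/* Returns true if a mapping of LENGTH bytes at ADDR is acceptable:
   page-aligned, a positive multiple of the page size, and disjoint
   from all N segments in SEGS. */
bool
mm_args_ok (const struct mm_segment *segs, size_t n,
            uint32_t addr, uint32_t length)
{
  if (addr % MM_PGSIZE != 0)
    return false;                      /* Not page-aligned. */
  if (length == 0 || length % MM_PGSIZE != 0)
    return false;                      /* Not a positive multiple. */

  /* Use 64-bit end addresses so the sums cannot wrap around. */
  uint64_t end = (uint64_t) addr + length;
  for (size_t i = 0; i < n; i++)
    {
      uint64_t seg_end = (uint64_t) segs[i].start + segs[i].length;
      if (addr < seg_end && segs[i].start < end)
        return false;                  /* Ranges overlap. */
    }
  return true;
}
```

Two ranges that merely touch, such as one ending exactly where the
next begins, are not treated as overlapping.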
 426
 427 @node Virtual Memory FAQ
 428 @section FAQ
 429
 430 @enumerate 1
 431 @item
 432 @b{Do we need a working HW 2 to implement HW 3?}
 433
 434 Yes.
 435
 436 @item
 437 @b{How do I use the hash table provided in @file{lib/kernel/hash.c}?}
 438
First, you need to embed a @code{hash_elem} object as a member of the
object that the hash table will contain.  Each @code{hash_elem} allows
the object to be a member of at most one hash table at a given time.  All
the hash table functions that deal with hash table items actually use
the address of a @code{hash_elem}.  You can convert a pointer to a
@code{hash_elem} member into a pointer to the structure in which
that member is embedded using the @code{hash_entry} macro.
 446
 447 Second, you need to decide on a key type.  The key should be something
 448 that is unique for each object, because a given hash table may not
 449 contain two objects with equal keys.  Then you need to write two
 450 functions.  The first is a @dfn{hash function} that converts a key
 451 into an integer.  Some sample hash functions that you can use or just
 452 examine are given in @file{lib/kernel/hash.c}.  The second function
needed is a @dfn{comparison function} that compares a pair of elements by key and returns
 454 true if the first is less than the second.  These two functions have
 455 to be compatible with the prototypes for @code{hash_hash_func} and
 456 @code{hash_less_func} in @file{lib/kernel/hash.h}.
 457
 458 Here's a quick example.  Suppose you want to put @code{struct thread}s
 459 in a hash table.  First, add a @code{hash_elem} to the thread
 460 structure by adding a line to its definition:
 461
 462 @example
 463 hash_elem h_elem;               /* Hash table element. */
 464 @end example
 465
 466 We'll choose the @code{tid} member in @code{struct thread} as the key,
 467 and write a hash function and a comparison function:
 468
 469 @example
 470 /* Returns a hash for E. */
 471 unsigned
 472 thread_hash (const hash_elem *e, void *aux UNUSED)
 473 @{
 474   struct thread *t = hash_entry (e, struct thread, h_elem);
 475   return hash_int (t->tid);
 476 @}
 477
 478 /* Returns true if A's tid is less than B's tid. */
 479 bool
 480 thread_less (const hash_elem *a_, const hash_elem *b_, void *aux UNUSED)
 481 @{
 482   struct thread *a = hash_entry (a_, struct thread, h_elem);
 483   struct thread *b = hash_entry (b_, struct thread, h_elem);
 484   return a->tid < b->tid;
 485 @}
 486 @end example
 487
 488 Then we can create a hash table like this:
 489
 490 @example
 491 struct hash threads;
 492
 493 hash_init (&threads, thread_hash, thread_less, NULL);
 494 @end example
 495
 496 Finally, if @code{@var{t}} is a pointer to a @code{struct thread},
 497 then we can insert it into the hash table with:
 498
 499 @example
 500 hash_insert (&threads, &@var{t}->h_elem);
 501 @end example
 502
 503 If you have any other questions about hash tables, the CS109
 504 and CS161 textbooks have good chapters on them, or you can come
to any of the TAs' office hours for further clarification.
 506
 507 @item
 508 @b{The current implementation of the hash table does not do something
 509 that we need it to do. What gives?}
 510
 511 You are welcome to modify it.  It is not used by any of the code we
 512 provided, so modifying it won't affect any code but yours.  Do
 513 whatever it takes to make it work the way you want.
 514
 515 @item
 516 @b{What controls the layout of user programs?}
 517
 518 The linker is responsible for the layout of a user program in
 519 memory. The linker is directed by a ``linker script'' which tells it
 520 the names and locations of the various program segments. The
@file{test/script} and @file{testvm/script} files are the linker scripts for the
 522 multiprogramming and virtual memory assignments respectively. You can
 523 learn more about linker scripts by reading the ``Scripts'' chapter in
 524 the linker manual, accessible via @samp{info ld}.
 525
 526 @item Page Table Management FAQs
 527 @enumerate 1
 528 @item
@b{Do page tables need to be created lazily?}
 530
 531 No.  You can create the page tables at load time (or @code{mmap} time)
 532 if you like.
 533
 534 @item
 535 @b{Our code handles the PageFault exceptions. However, the number of
 536 page faults handled does not show up in the final stats output. Is
 537 there a counter that we must increment to correct this problem?}
 538
 539 FIXME
 540
 541 Yes, you'll need to update kernel->stats->numPageFaults when
 542 you handle a page fault in your code.
 543 @end enumerate
 544
 545 @item Paging FAQs
 546
 547 @enumerate 1
 548 @item
 549 @b{Does the virtual memory system need to support growth of the stack
 550 segment?}
 551
 552 Yes. If a page fault appears just below the last stack segment page,
 553 you must add a new page to the bottom of the stack. It is impossible
 554 to predict how large the stack will grow at compile time, so we must
 555 allocate pages as necessary. You should only allocate additional pages
 556 if they ``appear'' to be stack accesses.
 557
 558 @item
 559 @b{Does the first stack page need to be loaded lazily?}
 560
 561 No, you can initialize the first stack page with the command line at
 562 load time.  There's no need to wait for it to be faulted in.  Even if
 563 you did wait, the very first instruction in the user program is likely
 564 to be one that faults in the page.
 565
 566 @item
 567 @b{Does the virtual memory system need to support growth of the data
 568 segment?}
 569
 570 No.  The size of the data segment is determined by the linker.  We
 571 still have no dynamic allocation in Pintos (although it is possible to
 572 ``fake'' it at the user level by using memory-mapped files).
 573 Implementing @code{sbrk()} has been an extra-credit assignment in
 574 previous years, but adds little additional complexity to a
 575 well-designed system.
 576
 577 @item
 578 @b{But what do you mean by ``appear'' to be stack accesses? How big can a
 579 stack growth be?  Under what circumstances do we grow the stack?}
 580
If it looks like a stack request, then you grow the stack. Yes, that's
ambiguous. You need to make a reasonable decision about what looks
like a stack request. For example, you could decide that an access up to
a page, or two pages, or ten pages, or more, below the current bottom of
the stack is a stack request.  Or, you could use some other
heuristic to figure this out.
 586
 587 Make a reasonable decision and document it in your code and in
 588 your design document.  Please make sure to justify your decision.
 589
 590 @item
 591 @b{How big should the file(s) we're using as a backing store for memory
 592 be?}
 593
 594 These files will need to be able to grow based on the number of pages
 595 you're committed to storing on behalf of the processes currently in
 596 memory.  They should be able to grow to the full size of the disk.
 597 @end enumerate
 598
 599 @item Memory Mapped File FAQs
 600
 601 @enumerate 1
 602 @item
 603 @b{How do we interact with memory-mapped files?}
 604
 605 Let's say you want to map a file called @file{foo} into your address
 606 space at address @t{0x10000000}. You open the file, determine its
length, and then use @code{mmap}:
 608
 609 @example
#include <stdio.h>
#include <syscall.h>

int main (void)
@{
    /* char * rather than void * so that the mapping can be indexed. */
    char *addr = (char *) 0x10000000;
    int fd = open ("foo");
    int length = filesize (fd);
    if (mmap (fd, addr, length))
        printf ("success!\n");
    return 0;
@}
 621 @end example
 622
 623 Suppose @file{foo} is a text file and you want to print the first 64
 624 bytes on the screen (assuming, of course, that the length of the file
 625 is at least 64).  Without @code{mmap}, you'd need to allocate a
 626 buffer, use @code{read} to get the data from the file into the buffer,
 627 and finally use @code{write} to put the buffer out to the display. But
 628 with the file mapped into your address space, you can directly address
 629 it like so:
 630
 631 @example
write (STDOUT_FILENO, addr, 64);
 633 @end example
 634
 635 Similarly, if you wanted to replace the first byte of the file,
 636 all you need to do is:
 637
 638 @example
 639 addr[0] = 'b';
 640 @end example
 641
 642 When you're done using the memory-mapped file, you simply unmap
 643 it:
 644
 645 @example
munmap (addr, length);
 647 @end example
 648
 649 @item
 650 @b{What if two processes memory-map the same file?}
 651
 652 There is no requirement in Pintos that the two processes see
 653 consistent data.  Unix handles this by making the processes share the
 654 same physical page, but the @code{mmap} system call also has an
 655 argument allowing the client to specify whether the page is shared or
 656 private (i.e.@: copy-on-write).
 657
 658 @item
 659 @b{What happens if a user removes a @code{mmap}'d file?}
 660
You should follow the Unix convention and the mapping should still be
 663 valid.  This is similar to the question in the User Programs FAQ about
 664 a process with a file descriptor to a file that has been removed.
 665
 666 @item
 667 @b{What if a process writes to a page that is memory-mapped, but the
 668 location written to in the memory-mapped page is past the end
 669 of the memory-mapped file?}
 670
 671 Can't happen.  @code{mmap} extends the file to the requested length,
 672 and Pintos provides no way to shorten a file.  You can remove a file,
 673 but the mapping remains valid (see the previous question).
 674
 675 @item
 676 @b{Do we have to handle memory mapping @code{stdin} or @code{stdout}?}
 677
 678 No.  Memory mapping implies that a file has a length and that a user
 679 can seek to any location in the file.  Since the console device has
 680 neither of these properties, @code{mmap} should return false when the
 681 user attempts to memory map a file descriptor for the console device.
 682
 683 @item
 684 @b{What happens when a process exits with mmap'd files?}
 685
When a process finishes, each of its @code{mmap}'d files is implicitly
unmapped.  When a process @code{mmap}s a file and then writes into the
area for the file, it is assuming that the changes will be
written to the file.
 690
 691 @item
@b{If a user closes an @code{mmap}'d file, should we automatically unmap it
for them?}

No.  Once created, the mapping is valid until @code{munmap} is called
or the process exits.
 697 @end enumerate
 698 @end enumerate