   1 @node Project 3--Virtual Memory, Project 4--File Systems, Project 2--User Programs, Top
   2 @chapter Project 3: Virtual Memory
   3
   4 By now you should be familiar with the inner workings of Pintos.
   5 You've already come a long way: your OS can properly handle multiple
   6 threads of execution with proper synchronization, and can load
   7 multiple user programs at once.  However, when loading user programs,
   8 your OS is limited by how much main memory the simulated machine has.
   9 In this assignment, you will remove that limitation.
  10
  11 You will be using the @file{vm} directory for this project.  The
  12 @file{vm} directory contains only the @file{Makefile}s.  The only
  13 change from @file{userprog} is that this new @file{Makefile} turns on
the setting @option{-DVM}.  All code you write will go either into new
files (e.g.@: if you choose to implement your paging code in its own
source files) or into modifications of existing code (e.g.@: you will
change the behavior of @file{process.c} significantly).
  19
  20 There are only a couple of source files you will probably be
  21 encountering for the first time:
  22
  23 @table @file
  24 @item devices/disk.h
  25 @itemx devices/disk.c
  26 Provides access to the physical disk, abstracting away the rather
  27 awful IDE interface.
  28 @end table
  29
  30 You will be building this assignment on the last one.  It will benefit
  31 you to get your project 2 in good working order before this assignment
  32 so those bugs don't keep haunting you.
  33
  34 All the test programs from the previous project should also work with
  35 this project.  You should also write programs to test the new features
  36 introduced in this project.
  37
  38 Your submission should define @code{THREAD_JOIN_IMPLEMENTED} in
  39 @file{constants.h} (@pxref{Conditional Compilation}).
  40
  41 @menu
  42 * VM Design::
  43 * Page Faults::
  44 * Disk as Backing Store::
  45 * Memory Mapped Files::
  46 * Stack::
  47 * Problem 3-1 Page Table Management::
  48 * Problem 3-2 Paging To and From Disk::
  49 * Problem 3-3 Memory Mapped Files::
  50 * Virtual Memory FAQ::
  51 @end menu
  52
  53 @node VM Design
  54 @section A Word about Design
  55
  56 It is important for you to note that in addition to getting virtual
  57 memory working, this assignment is also meant to be an open-ended
  58 design problem.  We will expect you to come up with a design that
  59 makes sense.  You will have the freedom to choose how to handle page
  60 faults, how to organize the swap disk, how to implement paging, etc.
  61 In each case, we will expect you to provide a defensible justification
  62 in your design documentation as to why your choices are reasonable.
  63 You should evaluate your design on all the available criteria: speed
  64 of handling a page fault, space overhead in memory, minimizing the
  65 number of page faults, simplicity, etc.
  66
  67 In keeping with this, you will find that we are going to say as little
  68 as possible about how to do things.  Instead we will focus on what end
  69 functionality we require your OS to support.
  70
  71 @node Page Faults
  72 @section Page Faults
  73
  74 For the last assignment, whenever a context switch occurred, the new
  75 process would install its own page table into the machine.  The page
  76 table contained all the virtual-to-physical translations for the
  77 process.  Whenever the processor needed to look up a translation, it
consulted the page table.  As long as the process only accessed
memory that it owned, all was well.  If the process accessed
memory it didn't own, it ``page faulted'' and @func{page_fault}
terminated the process.
  82
  83 When we implement virtual memory, the rules have to change.  A page
  84 fault is no longer necessarily an error, since it might only indicate
  85 that the page must be brought in from a disk file or from swap.  You
  86 will have to implement a more sophisticated page fault handler to
  87 handle these cases.
  88
  89 On the 80@var{x}86, the page table format is fixed by hardware.  We
  90 have provided code for managing page tables for you to use in
  91 @file{userprog/pagedir.c}.  The functions in there should provide an
  92 abstract interface to all the page table functionality that you need
to complete the project.  However, you may still find it worthwhile to
understand a little about the hardware page table format, so this
section describes it in a little detail.
  96
  97 The top-level paging data structure is a 4 kB page called the ``page
  98 directory'' (PD) arranged as an array of 1,024 32-bit page directory
  99 entries (PDEs), each of which represents 4 MB of virtual memory.  Each
 100 PDE may point to the physical address of another 4 kB page called a
 101 ``page table'' (PT) arranged in the same fashion as an array of 1,024
 102 32-bit page table entries (PTEs), each of which translates a single 4
 103 kB virtual page into physical memory.
 104
 105 Thus, translation of a virtual address into a physical address follows
 106 the three-step process illustrated in the diagram
 107 below:@footnote{Actually, virtual to physical translation on the
 108 80@var{x}86 architecture happens via an intermediate ``linear
 109 address,'' but Pintos (and most other 80@var{x}86 OSes) set up the CPU
 110 so that linear and virtual addresses are one and the same, so that you
 111 can effectively ignore this CPU feature.}
 112
 113 @enumerate 1
@item
The top 10 bits of the virtual address (bits 22 through 31) are used to
index into the page directory.  If the PDE is marked ``present,'' the
physical address of a page table is read from the PDE thus obtained.
If the PDE is marked ``not present'' then a page fault occurs.

@item
The next 10 bits of the virtual address (bits 12 through 21) are used to
index into the page table.  If the PTE is marked ``present,'' the
physical address of a data page is read from the PTE thus obtained.  If
the PTE is marked ``not present'' then a page fault occurs.

@item
The bottom 12 bits of the virtual address (bits 0 through 11) are added
to the data page's physical base address, producing the final physical
address.
 131 @end enumerate
 132
 133 @example
 134 @group
32                    22                     12                      0
+--------------------------------------------------------------------+
| Page Directory Index |   Page Table Index   |    Page Offset       |
+--------------------------------------------------------------------+
             |                    |                     |
     _______/             _______/                _____/
    /                    /                       /
   /    Page Directory  /      Page Table       /    Data Page
  /     .____________. /     .____________.    /   .____________.
  |1,023|____________| |1,023|____________|    |   |____________|
  |1,022|____________| |1,022|____________|    |   |____________|
  |1,021|____________| |1,021|____________|    \__\|____________|
  |1,020|____________| |1,020|____________|       /|____________|
  |     |            | |     |            |        |            |
  |     |            | \____\|            |_       |            |
  |     |      .     |      /|      .     | \      |      .     |
  \____\|      .     |_      |      .     |  |     |      .     |
       /|      .     | \     |      .     |  |     |      .     |
        |      .     |  |    |      .     |  |     |      .     |
        |            |  |    |            |  |     |            |
        |____________|  |    |____________|  |     |____________|
       4|____________|  |   4|____________|  |     |____________|
       3|____________|  |   3|____________|  |     |____________|
       2|____________|  |   2|____________|  |     |____________|
       1|____________|  |   1|____________|  |     |____________|
       0|____________|  \__\0|____________|  \____\|____________|
                           /                      /
 162 @end group
 163 @end example
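
The three translation steps can be sketched in software as follows.
This is a simulation under stated assumptions, not Pintos code: the
@code{sim_tables} structure, the @code{sk_translate} function, and the
convention that a PDE's address field holds an index into a small array
of simulated page tables (rather than a real physical address) are all
hypothetical.  Only the 10/10/12 bit split and the ``present'' bit in
bit 0 reflect the hardware format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PRESENT 0x1u          /* Bit 0 of a PDE or PTE: "present". */
#define SK_ADDR_MASK 0xfffff000u /* Bits 12..31: page-aligned address. */
#define SK_TABLE_CNT 4           /* Number of simulated page tables
                                    (assumption for this sketch). */

/* Simulated paging structures.  A PDE here stores the index of a
   page table in PT, shifted into the address field. */
struct sim_tables
  {
    uint32_t pd[1024];               /* Page directory: 1,024 PDEs. */
    uint32_t pt[SK_TABLE_CNT][1024]; /* Page tables: 1,024 PTEs each. */
  };

/* Translates VA using T.  On success stores the physical address in
   *PA and returns true.  Returns false where the hardware would
   raise a page fault. */
bool
sk_translate (const struct sim_tables *t, uint32_t va, uint32_t *pa)
{
  assert (t != NULL && pa != NULL);

  /* Step 1: bits 22..31 index the page directory. */
  uint32_t pde = t->pd[va >> 22];
  if (!(pde & SK_PRESENT))
    return false;
  uint32_t table = pde >> 12;
  if (table >= SK_TABLE_CNT)
    return false;               /* Outside the simulated memory. */

  /* Step 2: bits 12..21 index the page table. */
  uint32_t pte = t->pt[table][(va >> 12) & 0x3ff];
  if (!(pte & SK_PRESENT))
    return false;

  /* Step 3: bits 0..11 are the offset within the data page. */
  *pa = (pte & SK_ADDR_MASK) + (va & 0xfff);
  return true;
}
```

Both ``not present'' cases return false at the point where a real CPU
would fault, which is where your page fault handler takes over.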
 164
 165 Header @file{threads/mmu.h} has useful functions for various
 166 operations on virtual addresses.  You should look over the header
 167 yourself, but its most important functions include these:
 168
 169 @table @code
 170 @item pd_no(@var{va})
 171 Returns the page directory index in virtual address @var{va}.
 172
 173 @item pt_no(@var{va})
 174 Returns the page table index in virtual address @var{va}.
 175
 176 @item pg_ofs(@var{va})
 177 Returns the page offset in virtual address @var{va}.
 178
 179 @item pg_round_down(@var{va})
 180 Returns @var{va} rounded down to the nearest page boundary, that is,
 181 @var{va} but with its page offset set to 0.
 182
 183 @item pg_round_up(@var{va})
 184 Returns @var{va} rounded up to the nearest page boundary.
 185 @end table
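
To make the arithmetic behind these helpers concrete, here is a
stand-alone sketch.  The @code{sk_} names are hypothetical and the
functions operate on plain 32-bit integers; the real header works on
addresses in the kernel's own types, so treat this as an illustration
of the bit manipulation only.

```c
#include <stdint.h>

#define SK_PGBITS 12u                /* Bits in a page offset. */
#define SK_PGSIZE (1u << SK_PGBITS)  /* Bytes in a page: 4,096. */
#define SK_PGMASK (SK_PGSIZE - 1u)   /* Mask for the page offset. */

/* Page directory index: top 10 bits of VA. */
uint32_t
sk_pd_no (uint32_t va)
{
  return va >> 22;
}

/* Page table index: middle 10 bits of VA. */
uint32_t
sk_pt_no (uint32_t va)
{
  return (va >> SK_PGBITS) & 0x3ffu;
}

/* Page offset: bottom 12 bits of VA. */
uint32_t
sk_pg_ofs (uint32_t va)
{
  return va & SK_PGMASK;
}

/* VA with its page offset cleared. */
uint32_t
sk_pg_round_down (uint32_t va)
{
  return va & ~SK_PGMASK;
}

/* VA rounded up to the next page boundary.  An unaligned address in
   the last page of the address space wraps around to 0. */
uint32_t
sk_pg_round_up (uint32_t va)
{
  return (va + SK_PGMASK) & ~SK_PGMASK;
}
```

For example, the address @t{0x08048123} has page directory index 32,
page table index 72, and page offset @t{0x123}.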
 186
 187 @node Disk as Backing Store
 188 @section Disk as Backing Store
 189
 190 In VM systems, since memory is less plentiful than disk, you will
 191 effectively use memory as a cache for disk.  Looking at it from
 192 another angle, you will use disk as a backing store for memory.  This
 193 provides the abstraction of an (almost) unlimited virtual memory size.
 194 Part of your task in this project is to do this, with the additional
 195 constraint that your performance should be close to that provided by
 196 physical memory.  You will use the page tables' ``dirty'' bits to
 197 denote whether pages need to be written back to disk when they're
 198 evicted from main memory and the ``accessed'' bit for page replacement
 199 algorithms.  Whenever the hardware writes memory, it sets the dirty
 200 bit, and if it reads or writes to the page, it sets the accessed bit.
 201
 202 As with any caching system, performance depends on the policy used to
 203 decide which things are kept in memory and which are only stored on
 204 disk.  On a page fault, the kernel must decide which page to replace.
 205 Ideally, it will throw out a page that will not be referenced for a
 206 long time, keeping in memory those pages that are soon to be
 207 referenced.  Another consideration is that if the replaced page has
 208 been modified, the page must be first saved to disk before the needed
 209 page can be brought in.  Many virtual memory systems avoid this extra
 210 overhead by writing modified pages to disk in advance, so that later
 211 page faults can be completed more quickly.
 212
 213 @node Memory Mapped Files
 214 @section Memory Mapped Files
 215
 216 The traditional way to access the file system is via @code{read} and
 217 @code{write} system calls, but that requires an extra level of copying
 218 between the kernel and the user level.  A secondary interface is
 219 simply to ``map'' the file into the virtual address space.  The
 220 program can then use load and store instructions directly on the file
 221 data.  (An alternative way of viewing the file system is as ``durable
 222 memory.''  Files just store data structures.  If you access data
 223 structures in memory using load and store instructions, why not access
 224 data structures in files the same way?)
 225
 226 Memory mapped files are typically implemented using system calls.  One
 227 system call maps the file to a particular part of the address space.
 228 For example, one might conceptually map the file @file{foo}, which is
 229 1000 bytes
 230 long, starting at address 5000.  Assuming that nothing else is already
 231 at virtual addresses 5000@dots{}6000, any memory accesses to these
 232 locations will access the corresponding bytes of @file{foo}.
 233
 234 A consequence of memory mapped files is that address spaces are
 235 sparsely populated with lots of segments, one for each memory mapped
 236 file (plus one each for code, data, and stack).  You will implement
 237 memory mapped files in problem 3-3.  You should
 238 design your solutions to problems 3-1 and 3-2 to anticipate this.
 239
 240 @node Stack
 241 @section Stack
 242
 243 In project 2, the stack was a single page at the top of the user
 244 virtual address space.  The stack's location does not change in this
 245 project, but your kernel should allocate additional pages to the stack
 246 on demand.  That is, if the stack grows past its current bottom, the
 247 system should allocate additional pages for the stack as necessary
 248 (unless those pages are unavailable because they are in use by another
 249 segment).
 250
 251 It is impossible to predict how large the stack will grow at compile
 252 time, so we must allocate pages as necessary.  You should only
 253 allocate additional pages if they ``appear'' to be stack accesses.
 254 You must devise a heuristic that attempts to distinguish stack
 255 accesses from other accesses.  Document and explain the heuristic in
 256 your design documentation.
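
As an illustration only, one possible heuristic is sketched below.  It
is not a required design.  It assumes that the handler knows the user
stack pointer at the time of the fault, that the stack may grow to at
most 8 MB (an arbitrary limit chosen for this sketch), and that a
legitimate stack access falls no more than 32 bytes below the stack
pointer, since the 80@var{x}86 @code{PUSHA} instruction pushes 32 bytes
at once.  All @code{sk_} names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_PHYS_BASE 0xc0000000u          /* Top of user virtual memory. */
#define SK_STACK_LIMIT (8u * 1024 * 1024) /* Assumed maximum stack size. */
#define SK_PUSH_SLACK 32u                 /* PUSHA writes 32 bytes. */

/* Returns true if a fault at FAULT_ADDR, taken while the user stack
   pointer was ESP, looks like a stack access that should grow the
   stack. */
bool
sk_looks_like_stack_access (uint32_t fault_addr, uint32_t esp)
{
  /* Kernel addresses are never stack growth. */
  if (fault_addr >= SK_PHYS_BASE)
    return false;

  /* Refuse to grow the stack beyond the assumed limit. */
  if (fault_addr < SK_PHYS_BASE - SK_STACK_LIMIT)
    return false;

  /* Accept accesses at or above ESP, or within the slack below it.
     The addition cannot overflow because FAULT_ADDR < SK_PHYS_BASE. */
  return fault_addr + SK_PUSH_SLACK >= esp;
}
```

A stricter or looser slack is a design decision; whichever you choose,
justify it in your design documentation.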
 257
 258 The first stack page need not be loaded lazily.  You can initialize it
 259 with the command line at load time, with no need to wait for it to be
 260 faulted in.  Even if you did wait, the very first instruction in the
 261 user program is likely to be one that faults in the page.
 262
 263 @node Problem 3-1 Page Table Management
 264 @section Problem 3-1: Page Table Management
 265
 266 Implement page directory and page table management to support virtual
 267 memory.  You will need data structures to accomplish the following
 268 tasks:
 269
 270 @itemize @bullet
 271 @item
Some way of translating in software from virtual pages to
physical page frames.  Consider using a hash table (@pxref{Hash
Table}).

@item
Some way of translating from physical page frames back to virtual
pages, so that when you evict a physical page from its frame, you can
invalidate its translation(s).
 280
 281 It is possible to do this translation without adding a new data
 282 structure, by modifying the code in @file{userprog/pagedir.c}.  However,
 283 if you do that you'll need to carefully study and understand section 3.7
 284 in @bibref{IA32-v3}, and in practice it is probably easier to add a new
 285 data structure.
 286
 287 @item
 288 Some way of finding a page on disk if it is not in memory.  You won't
 289 need this data structure until problem 3-2, but planning ahead is a
 290 good idea.
 291 @end itemize
 292
 293 The page fault handler, @func{page_fault} in
 294 @file{threads/exception.c}, needs to do roughly the following:
 295
 296 @enumerate 1
 297 @item
Locate the page backing the virtual
address that faulted.  It might be in the file system, in swap,
or it might be an invalid virtual address.
If you implement sharing, it might even
already be in physical memory and just not set up in the page table.

If the virtual address is invalid, that is, if there's nothing
assigned to go there, or if the virtual address is at or above
@code{PHYS_BASE}, meaning that it belongs to the kernel instead of the
user, then the process's memory access must be disallowed.  You should
terminate the process at this point, being sure to free all of its
resources.
 310
 311 @item
 312 If the page is not in physical memory, fetch it by appropriate means.
 313 If necessary to make room, first evict some other page from memory.
 314 (When you do that you need to first remove references to the page from
 315 any page table that refers to it.)
 316
 317 @item
 318 Point the page table entry for the faulting virtual address to the
 319 physical page.  You can use the functions in @file{userprog/pagedir.c}.
 320 @end enumerate
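
The decision made in the first step of this outline can be sketched as
a pure function.  Everything here is hypothetical: the @code{sk_page}
record stands in for whatever per-page bookkeeping you design, a linear
scan stands in for a hash table lookup, and the returned action is only
a label for what the real handler would go on to do.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_PHYS_BASE 0xc0000000u /* Start of kernel virtual memory. */

/* Where a page's contents currently live. */
enum sk_location { SK_IN_FILE, SK_IN_SWAP, SK_ALL_ZERO, SK_IN_FRAME };

/* What the fault handler should do next. */
enum sk_action
  {
    SK_KILL,       /* Invalid access: terminate the process. */
    SK_READ_FILE,  /* Get a frame, read the page from its file. */
    SK_READ_SWAP,  /* Get a frame, read the page from swap. */
    SK_ZERO_FILL,  /* Get a frame, fill it with zeroes. */
    SK_MAP_ONLY    /* Already in a frame: just install the mapping. */
  };

/* Bookkeeping for one user virtual page. */
struct sk_page
  {
    uint32_t upage;          /* Page-aligned user virtual address. */
    enum sk_location where;  /* Current location of its contents. */
  };

/* Decides how to handle a fault at FAULT_ADDR, given the N pages
   that the process is allowed to touch. */
enum sk_action
sk_fault_action (const struct sk_page *pages, size_t n, uint32_t fault_addr)
{
  if (fault_addr >= SK_PHYS_BASE)
    return SK_KILL;

  uint32_t upage = fault_addr & ~0xfffu;
  for (size_t i = 0; i < n; i++)
    if (pages[i].upage == upage)
      switch (pages[i].where)
        {
        case SK_IN_FILE:  return SK_READ_FILE;
        case SK_IN_SWAP:  return SK_READ_SWAP;
        case SK_ALL_ZERO: return SK_ZERO_FILL;
        case SK_IN_FRAME: return SK_MAP_ONLY;
        }

  /* Nothing is assigned to this address. */
  return SK_KILL;
}
```

Every action other than @code{SK_KILL} and @code{SK_MAP_ONLY} implies
obtaining a frame first, evicting another page if none is free.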
 321
 322 You'll need to modify the ELF loader in @file{userprog/process.c} to
 323 do page table management according to your new design.  As supplied,
 324 it reads all the process's pages from disk and initializes the page
 325 tables for them at the same time.  For testing purposes, you'll
 326 probably want to leave the code that reads the pages from disk, but
 327 use your new page table management code to construct the page tables
 328 only as page faults occur for them.
 329
 330 You should use the @func{palloc_get_page} function to get the page
 331 frames that you use for storing user virtual pages.  Be sure to pass
 332 the @code{PAL_USER} flag to this function when you do so, because that
 333 allocates pages from a ``user pool'' separate from the ``kernel pool''
 334 that other calls to @func{palloc_get_page} make.
 335
 336 There are many possible ways to implement virtual memory.  The above
 337 is simply an outline of our suggested implementation.
 338
 339 @node Problem 3-2 Paging To and From Disk
 340 @section Problem 3-2: Paging To and From Disk
 341
 342 Implement paging to and from files and the swap disk.  You may use the
disk on interface @code{hd1:1} as the swap disk, using the disk
interface prototyped in @file{devices/disk.h}.
 345
 346 You will need routines to move a page from memory to disk and from
 347 disk to memory, where ``disk'' is either a file or the swap disk.  If
 348 you do everything correctly, your VM should still work when you
 349 implement your own file system for the next assignment.
 350
 351 You will need a way to track pages which are used by a process but
 352 which are not in physical memory, to fully handle page faults.  Pages
 353 that you write to swap should not be constrained to be in sequential
 354 order.  You will also need a way to track all of the physical memory
 355 pages, to find an unused one when needed, or to evict a page
 356 when memory is needed but no empty pages are available.  The data
 357 structures that you designed for problem 3-1 should do most of the work for
 358 you.
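
Because swapped pages need not be stored sequentially, the swap disk
can be treated as an array of page-sized slots handed out in any order.
The sketch below shows one way to track them.  It assumes 512-byte disk
sectors (so a 4 kB page occupies 8 consecutive sectors) and a fixed
number of slots; the slot count and all @code{sk_} names are
hypothetical, and a real implementation would size the table from the
disk and protect it with a lock.

```c
#include <stdbool.h>
#include <stddef.h>

#define SK_SECTOR_SIZE 512  /* Assumed bytes per disk sector. */
#define SK_PGSIZE 4096      /* Bytes per page. */
#define SK_SECTORS_PER_PAGE (SK_PGSIZE / SK_SECTOR_SIZE)
#define SK_SLOT_CNT 1024    /* Assumed number of swap slots. */
#define SK_NO_SLOT ((size_t) -1)

/* One flag per page-sized slot on the swap disk. */
struct sk_swap
  {
    bool used[SK_SLOT_CNT];
  };

/* Claims the first free slot and returns its index, or SK_NO_SLOT
   if swap is full. */
size_t
sk_swap_alloc (struct sk_swap *s)
{
  for (size_t i = 0; i < SK_SLOT_CNT; i++)
    if (!s->used[i])
      {
        s->used[i] = true;
        return i;
      }
  return SK_NO_SLOT;
}

/* Releases SLOT so that it can be reused. */
void
sk_swap_free (struct sk_swap *s, size_t slot)
{
  if (slot < SK_SLOT_CNT)
    s->used[slot] = false;
}

/* Returns the first disk sector that belongs to SLOT. */
size_t
sk_swap_first_sector (size_t slot)
{
  return slot * SK_SECTORS_PER_PAGE;
}
```

Writing a page out then means writing its 8 sectors starting at
@code{sk_swap_first_sector (slot)} and recording the slot number in the
page's bookkeeping so that a later fault can find it.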
 359
 360 You will need a page replacement algorithm.  The hardware sets the
 361 accessed and dirty bits when it accesses memory.  You can gain access
 362 to this information using the functions prototyped in
 363 @file{userprog/pagedir.h}.  You should be able to take advantage of
 364 this information to implement some algorithm which attempts to achieve
LRU-type behavior.  We expect your algorithm to perform at least as
well as a reasonable implementation of the second-chance (clock)
algorithm.  In your test cases, you will need to show the value of your
page replacement algorithm by demonstrating that, for some workload, it
incurs fewer page faults than some inferior page replacement policy.
The canonical example of a poor page replacement policy is random
replacement.
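
For reference, the core of the second-chance (clock) algorithm can be
sketched as below.  The @code{sk_frame} structure and
@code{sk_clock_pick} function are hypothetical, and the
@code{accessed} flag is a plain field here; in your kernel it would be
read and cleared through the page table accessor functions instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One physical frame, as seen by the replacement policy. */
struct sk_frame
  {
    bool accessed;  /* Stand-in for the hardware "accessed" bit. */
  };

/* Picks a victim among the N frames in FRAMES.  *HAND is the clock
   hand; it is advanced past every frame examined.  A frame whose
   accessed flag is set gets a second chance: the flag is cleared and
   the hand moves on.  The first frame found with the flag clear is
   the victim.  Terminates within two sweeps. */
size_t
sk_clock_pick (struct sk_frame *frames, size_t n, size_t *hand)
{
  assert (frames != NULL && hand != NULL && n > 0);

  for (;;)
    {
      size_t i = *hand;
      *hand = (*hand + 1) % n;

      if (!frames[i].accessed)
        return i;
      frames[i].accessed = false;
    }
}
```

Comparing the number of page faults under this policy with the number
under random victim selection, on the same workload, is one way to
produce the demonstration requested above.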
 372
 373 Since you will already be paging from disk, you should implement a
 374 ``lazy'' loading scheme for new processes.  When a process is created,
 375 it will not run immediately.  Therefore, it doesn't make sense to load
 376 all its code, data, and stack into memory when the process is created,
 377 since it might incur additional disk accesses to do so (if it gets
 378 paged out before it runs).  When loading a new process, you should
 379 leave most pages on disk, and bring them in as demanded when the
 380 program begins running.  Your VM system should also use the executable
 381 file itself as backing store for read-only segments, since these
 382 segments won't change.
 383
 384 There are a few special cases.  Look at the loop in
 385 @func{load_segment} in @file{userprog/process.c}.  Each time
 386 around the loop, @code{read_bytes} represents the number of bytes to
 387 read from the executable file and @code{zero_bytes} represents the number
 388 of bytes to initialize to zero following the bytes read.  The two
 389 always sum to @code{PGSIZE}.  The page handling depends on these
 390 variables' values:
 391
 392 @itemize @bullet
 393 @item
 394 If @code{read_bytes} equals @code{PGSIZE}, the page should be demand
 395 paged from disk on its first access.
 396
 397 @item
 398 If @code{zero_bytes} equals @code{PGSIZE}, the page does not need to
 399 be read from disk at all because it is all zeroes.  You should handle
 400 such pages by creating a new page consisting of all zeroes at the
 401 first page fault.
 402
 403 @item
 404 If neither @code{read_bytes} nor @code{zero_bytes} equals
 405 @code{PGSIZE}, then part of the page is to be read from disk and the
 406 remainder zeroed.  This is a special case.  You are allowed to handle
 407 it by reading the partial page from disk at executable load time and
 408 zeroing the rest of the page.  This is the only case in which we will
 409 allow you to load a page in a non-``lazy'' fashion.  Many real OSes
 410 such as Linux do not load partial pages lazily.
 411 @end itemize
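
The three cases above amount to a small classification, sketched here
as a stand-alone function.  The enumeration and function names are
hypothetical; only @code{read_bytes}, @code{zero_bytes}, and the page
size come from the loader.

```c
#include <assert.h>
#include <stddef.h>

#define SK_PGSIZE 4096u  /* Bytes per page. */

/* How a page of an executable segment should be populated. */
enum sk_page_kind
  {
    SK_FROM_FILE,  /* Whole page is file data: demand page it. */
    SK_ALL_ZERO,   /* Whole page is zeroes: create it on first fault. */
    SK_PARTIAL     /* Part file data, part zeroes: may be loaded
                      eagerly at executable load time. */
  };

/* Classifies a page given the per-page byte counts, which must sum
   to the page size. */
enum sk_page_kind
sk_classify_page (size_t read_bytes, size_t zero_bytes)
{
  assert (read_bytes + zero_bytes == SK_PGSIZE);

  if (read_bytes == SK_PGSIZE)
    return SK_FROM_FILE;
  if (zero_bytes == SK_PGSIZE)
    return SK_ALL_ZERO;
  return SK_PARTIAL;
}
```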
 412
 413 Incidentally, if you have trouble handling the third case above, you
 414 can eliminate it temporarily by linking the test programs with a
 415 special ``linker script.''  Read @file{tests/userprog/Makefile} for
 416 details.  We will not test your submission with this special linker
 417 script, so the code you turn in must properly handle all cases.
 418
 419 For extra credit, you may implement sharing: when multiple processes
 420 are created that use the same executable file, share read-only pages
 421 among those processes instead of creating separate copies of read-only
 422 segments for each process.  If you carefully designed your data
 423 structures in problem 3-1, sharing of read-only pages should not make this
 424 part significantly harder.
 425
 426 @node Problem 3-3 Memory Mapped Files
 427 @section Problem 3-3: Memory Mapped Files
 428
 429 Implement memory mapped files.
 430
 431 You will need to implement the following system calls:
 432
 433 @table @code
 434 @item SYS_mmap
 435 @itemx bool mmap (int @var{fd}, void *@var{addr}, unsigned @var{length})
 436
 437 Maps the file open as @var{fd} into the process's address space
 438 starting at @var{addr} for @var{length} bytes.  Returns true if
 439 successful, false on failure.  Failure cases include the following:
 440
 441 @itemize @bullet
 442 @item
 443 @var{addr} is not page-aligned.
 444
 445 @item
 446 @var{length} is not positive.
 447
 448 @item
 449 The range of pages mapped overlaps any existing set of mapped pages,
 450 including the stack or pages mapped at executable load time.
 451 @end itemize
 452
 453 @var{length} is treated as if it were rounded up to the nearest
 454 multiple of the page size, that is, as if the first statement in the
 455 system call's implementation were
 456 @example
 457 length = ROUND_UP (length, PGSIZE);
 458 @end example
 459 (The @code{ROUND_UP} macro is defined in @file{<round.h>}.)
 460 The remainder of this description assumes that this has been done.
 461
If @var{length} is less than @var{fd}'s length, you should only map
the first @var{length} bytes of the file.  If @var{length} is greater
than @var{fd}'s length, even after the file's length is also rounded up
to a page multiple, the call should fail.  Ideally it would extend the
file, but our file system does not yet support growing files.
 467
 468 If @var{length} is greater than @var{fd}'s (unrounded) length, then some
 469 bytes in the final mapped page ``stick out'' beyond the end of the
 470 file.  Set these bytes to zero when the page is faulted in from
 471 disk, and discard them when the page is written back to disk.
 472
Your VM system should use the @code{mmap}'d file itself as
backing store for the mapped segment.  That is, to evict a page mapped by
@code{mmap}, write it to the file it was mapped from.
(In fact, you may choose to implement executable mappings as a special
case of file mappings.)
 478
 479 @item SYS_munmap
@itemx bool munmap (void *@var{addr}, unsigned @var{length})
 481
 482 Unmaps @var{length} bytes starting at @var{addr}.  Returns true on
 483 success, false on failure.  Failure cases include the following:
 484
 485 @itemize @bullet
 486 @item
 487 @var{addr} is not page-aligned.
 488
 489 @item
 490 @var{length} is not positive.
 491
 492 @item
 493 One or more pages within the range to be unmapped were not mapped
 494 using the @code{mmap} system call.
 495 @end itemize
 496
 497 As with @code{mmap}, @var{length} is treated as if it were rounded up
 498 to the nearest multiple of the page size.
 499
 500 It is valid to unmap only some of the pages that were mapped in a
 501 previous system call.
 502 @end table
 503
 504 All mappings are implicitly unmapped when a process exits, whether via
 505 @code{exit} or by any other means.  When a file is unmapped, whether
 506 implicitly or explicitly, all outstanding changes are written to the
 507 file, and the pages are removed from the process's list of used
 508 virtual pages.
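
The arithmetic parts of the @code{mmap} rules can be sketched as
follows.  This is an illustration with hypothetical @code{sk_} names:
it checks only alignment, length, and the file size limit, and it
computes how many bytes of each mapped page come from the file.  The
overlap check is omitted because it depends on your own page
bookkeeping, and lengths large enough to overflow when rounded are not
handled.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_PGSIZE 4096u
#define SK_ROUND_UP(X, STEP) (((X) + (STEP) - 1) / (STEP) * (STEP))

/* Returns true if ADDR and LENGTH pass the arithmetic checks for
   mapping a file that is FILE_LENGTH bytes long. */
bool
sk_mmap_args_ok (uint32_t addr, unsigned length, unsigned file_length)
{
  if (addr % SK_PGSIZE != 0)
    return false;               /* Not page-aligned. */
  if (length == 0)
    return false;               /* Not positive. */

  /* Compare page-rounded lengths: the mapping may not extend past
     the file's last page. */
  length = SK_ROUND_UP (length, SK_PGSIZE);
  return length <= SK_ROUND_UP (file_length, SK_PGSIZE);
}

/* Returns the number of bytes to read from the file for the mapped
   page that starts PAGE_OFS bytes into the mapping.  The remaining
   bytes of the page are zeroed on fault-in and discarded on
   write-back. */
unsigned
sk_mmap_page_read_bytes (unsigned file_length, unsigned page_ofs)
{
  if (page_ofs >= file_length)
    return 0;
  unsigned left = file_length - page_ofs;
  return left < SK_PGSIZE ? left : SK_PGSIZE;
}
```

For a 5,000-byte file, the first mapped page reads 4,096 bytes and the
second reads 904 bytes, leaving 3,192 bytes that ``stick out'' beyond
the end of the file.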
 509
 510 @node Virtual Memory FAQ
 511 @section FAQ
 512
 513 @enumerate 1
 514 @item
 515 @b{Do we need a working HW 2 to implement HW 3?}
 516
 517 Yes.
 518
 519 @item
 520 @anchor{Hash Table}
 521 @b{How do I use the hash table provided in @file{lib/kernel/hash.c}?}
 522
First, you need to embed a @code{hash_elem} object as a member of the
object that the hash table will contain.  Each @code{hash_elem} allows
the object to be a member of at most one hash table at a given time.  All
the hash table functions that deal with hash table items actually use
the address of a @code{hash_elem}.  You can convert a pointer to a
@code{hash_elem} member into a pointer to the structure in which
the member is embedded using the @code{hash_entry} macro.
 530
 531 Second, you need to decide on a key type.  The key should be something
 532 that is unique for each object, because a given hash table may not
 533 contain two objects with equal keys.  Then you need to write two
 534 functions.  The first is a @dfn{hash function} that converts a key
 535 into an integer.  Some sample hash functions that you can use or just
examine are given in @file{lib/kernel/hash.c}.  The second function
needed is a @dfn{comparison function} that compares a pair of elements
and returns true if the first is less than the second.  These two
functions have to be compatible with the prototypes for
@code{hash_hash_func} and @code{hash_less_func} in
@file{lib/kernel/hash.h}.
 541
 542 Here's a quick example.  Suppose you want to put @struct{thread}s
 543 in a hash table.  First, add a @code{hash_elem} to the thread
 544 structure by adding a line to its definition:
 545
 546 @example
 547 hash_elem h_elem;               /* Hash table element. */
 548 @end example
 549
 550 We'll choose the @code{tid} member in @struct{thread} as the key,
 551 and write a hash function and a comparison function:
 552
 553 @example
/* Returns a hash for E. */
unsigned
thread_hash (const hash_elem *e, void *aux UNUSED)
@{
  struct thread *t = hash_entry (e, struct thread, h_elem);
  return hash_int (t->tid);
@}

/* Returns true if A's tid is less than B's tid. */
bool
thread_less (const hash_elem *a_, const hash_elem *b_,
             void *aux UNUSED)
@{
  struct thread *a = hash_entry (a_, struct thread, h_elem);
  struct thread *b = hash_entry (b_, struct thread, h_elem);
  return a->tid < b->tid;
@}
 571 @end example
 572
 573 Then we can create a hash table like this:
 574
 575 @example
 576 struct hash threads;
 577
 578 hash_init (&threads, thread_hash, thread_less, NULL);
 579 @end example
 580
 581 Finally, if @code{@var{t}} is a pointer to a @struct{thread},
 582 then we can insert it into the hash table with:
 583
 584 @example
 585 hash_insert (&threads, &@var{t}->h_elem);
 586 @end example
 587
If you have any other questions about hash tables, the CS109
and CS161 textbooks have good chapters on them, or you can come
to any of the TAs' office hours for further clarification.
 591
 592 @item
 593 @b{What are the @var{aux} parameters to the hash table functions good
 594 for?}
 595
 596 In simple cases you won't have any need for the @var{aux} parameters.
 597 In these cases you can just pass a null pointer to @func{hash_init}
 598 for @var{aux} and ignore the values passed to the hash function and
 599 comparison functions.  (You'll get a compiler warning if you don't use
 600 the @var{aux} parameter, but you can turn that off with the
 601 @code{UNUSED} macro, as shown above, or you can just ignore it.)
 602
 603 @var{aux} is useful when you have some property of the data in the
 604 hash table that's both constant and needed for hashing or comparisons,
 605 but which is not stored in the data items themselves.  For example, if
 606 the items in a hash table contain fixed-length strings, but the items
 607 themselves don't indicate what that fixed length is, you could pass
 608 the length as an @var{aux} parameter.
 609
 610 @item
 611 @b{The current implementation of the hash table does not do something
 612 that we need it to do. What gives?}
 613
 614 You are welcome to modify it.  It is not used by any of the code we
 615 provided, so modifying it won't affect any code but yours.  Do
 616 whatever it takes to make it work the way you want.
 617
 618 @item
 619 @b{What controls the layout of user programs?}
 620
 621 The linker is responsible for the layout of a user program in
 622 memory. The linker is directed by a ``linker script'' which tells it
 623 the names and locations of the various program segments.  You can
 624 learn more about linker scripts by reading the ``Scripts'' chapter in
 625 the linker manual, accessible via @samp{info ld}.
 626 @end enumerate
 627
 628 @menu
 629 * Problem 3-1 and 3-2 FAQ::
 630 * Problem 3-3 Memory Mapped File FAQ::
 631 @end menu
 632
 633 @node Problem 3-1 and 3-2 FAQ
 634 @subsection Problem 3-1 and 3-2 FAQ
 635
 636 @enumerate 1
 637 @item
 638 @b{Does the virtual memory system need to support growth of the data
 639 segment?}
 640
 641 No.  The size of the data segment is determined by the linker.  We
 642 still have no dynamic allocation in Pintos (although it is possible to
 643 ``fake'' it at the user level by using memory-mapped files).  However,
 644 implementing it would add little additional complexity to a
 645 well-designed system.
 646
 647 @item
 648 @b{Why do I need to pass @code{PAL_USER} to @func{palloc_get_page}
 649 when I allocate physical page frames?}@anchor{Why PAL_USER?}
 650
 651 You can layer some other allocator on top of @func{palloc_get_page}
 652 if you like, but it should be the underlying mechanism, directly or
 653 indirectly, for two reasons.  First, running out of pages in the user
 654 pool just causes user programs to page, but running out of pages in
 655 the kernel pool will cause all kinds of problems, because many kernel
 656 functions depend on being able to allocate memory.  Second, you can
 657 use the @option{-ul} option to @command{pintos} to limit the size of
 658 the user pool, which makes it easy to test your VM implementation with
 659 various user memory sizes.
 660 @end enumerate
 661
 662 @node Problem 3-3 Memory Mapped File FAQ
 663 @subsection Problem 3-3: Memory Mapped File FAQ
 664
 665 @enumerate 1
 666 @item
 667 @b{How do we interact with memory-mapped files?}
 668
 669 Let's say you want to map a file called @file{foo} into your address
 670 space at address @t{0x10000000}. You open the file, determine its
 671 length, and then use @code{mmap}:
 672
 673 @example
#include <stdio.h>
#include <syscall.h>

int main (void)
@{
    void *addr = (void *) 0x10000000;
    int fd = open ("foo");
    int length = filesize (fd);
    if (mmap (fd, addr, length))
        printf ("success!\n");
@}
 685 @end example
 686
 687 Suppose @file{foo} is a text file and you want to print the first 64
 688 bytes on the screen (assuming, of course, that the length of the file
 689 is at least 64).  Without @code{mmap}, you'd need to allocate a
 690 buffer, use @code{read} to get the data from the file into the buffer,
 691 and finally use @code{write} to put the buffer out to the display. But
 692 with the file mapped into your address space, you can directly address
 693 it like so:
 694
 695 @example
write (STDOUT_FILENO, addr, 64);
 697 @end example
 698
 699 Similarly, if you wanted to replace the first byte of the file,
 700 all you need to do is:
 701
 702 @example
((char *) addr)[0] = 'b';
 704 @end example
 705
 706 When you're done using the memory-mapped file, you simply unmap
 707 it:
 708
 709 @example
 710 munmap (addr, length);
 711 @end example
 712
 713 @item
 714 @b{What if two processes memory-map the same file?}
 715
 716 There is no requirement in Pintos that the two processes see
 717 consistent data.  Unix handles this by making the processes share the
 718 same physical page, but the @code{mmap} system call also has an
 719 argument allowing the client to specify whether the page is shared or
 720 private (i.e.@: copy-on-write).
 721
 722 @item
 723 @b{What happens if a user removes a @code{mmap}'d file?}
 724
 725 You should follow the Unix convention and the mapping should still be
 726 valid.  @xref{Removing an Open File}, for more information.
 727
 728 @item
 729 @b{What if a process writes to a page that is memory-mapped, but the
 730 location written to in the memory-mapped page is past the end
 731 of the memory-mapped file?}
 732
 733 Can't happen.  @code{mmap} checks that the mapped region is within the
 734 file's length and Pintos provides no way to shorten a file.  (Until
 735 project 4, there's no way to extend a file either.)  You can remove a
 736 file, but the mapping remains valid (see the previous question).
 737
 738 @item
 739 @b{Do we have to handle memory mapping @code{stdin} or @code{stdout}?}
 740
 741 No.  Memory mapping implies that a file has a length and that a user
 742 can seek to any location in the file.  Since the console device has
 743 neither of these properties, @code{mmap} should return false when the
 744 user attempts to memory map a file descriptor for the console device.
 745
 746 @item
 747 @b{What happens when a process exits with mapped files?}
 748
When a process finishes, each of its mapped files is implicitly
unmapped.  A process that @code{mmap}s a file and then writes into the
mapped area assumes that its changes will reach the file, so all
outstanding changes must be written back at that point.
 753
 754 @item
 755 @b{If a user closes a mapped file, should it be automatically
 756 unmapped?}
 757
No.  Once created, the mapping remains valid until @code{munmap} is
called or the process exits.
 760 @end enumerate