2 @appendix Reference Guide
4 This chapter is a reference for the Pintos code. The reference guide
5 does not cover all of the code in Pintos, but it does cover those
6 pieces that students most often find troublesome. You may find that
7 you want to read each part of the reference guide as you work on the
8 project where it becomes important.
10 We recommend using ``tags'' to follow along with references to function
11 and variable names (@pxref{Tags}).
17 * Interrupt Handling::
27 This section covers the Pintos loader and basic kernel
32 * Low-Level Kernel Initialization::
33 * High-Level Kernel Initialization::
34 * Physical Memory Map::
38 @subsection The Loader
40 The first part of Pintos that runs is the loader, in
41 @file{threads/loader.S}. The PC BIOS loads the loader into memory.
42 The loader, in turn, is responsible for initializing the CPU, loading
43 the rest of Pintos into memory, and then jumping to its start. It's
44 not important to understand exactly what the loader does, but if
45 you're interested, read on. You should probably read along with the
46 loader's source. You should also understand the basics of the
47 80@var{x}86 architecture as described by chapter 3, ``Basic Execution
48 Environment,'' of @bibref{IA32-v1}.
50 Because the PC BIOS loads the loader, the loader has to play by the
51 BIOS's rules. In particular, the BIOS only loads 512 bytes (one disk
52 sector) into memory. This is a severe restriction and it means that,
53 practically speaking, the loader has to be written in assembly
56 The Pintos loader first initializes the CPU. The first important part of
57 this is to enable the A20 line, that is, the CPU's address line
58 numbered 20. For historical reasons, PCs boot with this address
59 line fixed at 0, which means that attempts to access memory beyond the
60 first 1 MB (2 raised to the 20th power) will fail. Pintos wants to
61 access more memory than this, so we have to enable it.
63 Next, the loader asks the BIOS for the PC's memory size. Again for
64 historical reasons, the function that we call in the BIOS to do this
65 can only detect up to 64 MB of RAM, so that's the practical limit that
66 Pintos can support. The memory size is stashed away in a location in
67 the loader that the kernel can read after it boots.
69 Third, the loader creates a basic page table. This page table maps
70 the 64 MB at the base of virtual memory (starting at virtual address
71 0) directly to the identical physical addresses. It also maps the
72 same physical memory starting at virtual address
73 @code{LOADER_PHYS_BASE}, which defaults to @t{0xc0000000} (3 GB). The
74 Pintos kernel only wants the latter mapping, but there's a
75 chicken-and-egg problem if we don't include the former: our current
76 virtual address is roughly @t{0x7c00}, the location where the BIOS
77 loaded us, and we can't jump to @t{0xc0007c00} until we turn on the
78 page table, but if we turn on the page table without jumping there,
79 then we've just pulled the rug out from under ourselves.
81 After the page table is initialized, we load the CPU's control
82 registers to turn on protected mode and paging, and then we set up the
83 segment registers. We aren't yet equipped to handle interrupts in
84 protected mode, so we disable interrupts.
86 Finally it's time to load the kernel from disk. We use a simple but
87 inflexible method to do this: we program the IDE disk
88 controller directly. We assume that the kernel is stored starting
89 from the second sector of the first IDE disk (the first sector normally
90 contains the boot loader). We also assume that the BIOS has
91 already set up the IDE controller for us. We read
92 @code{KERNEL_LOAD_PAGES} pages of data (4 kB per page) from the disk directly
93 into virtual memory, starting @code{LOADER_KERN_BASE} bytes past
94 @code{LOADER_PHYS_BASE}, which by default means that we load the
95 kernel starting 1 MB into physical memory.
97 Then we jump to the start of the compiled kernel image. Using the
98 ``linker script'' in @file{threads/kernel.lds.S}, the kernel has
99 arranged to begin with the assembly module
100 @file{threads/start.S}. This assembly module just calls
101 @func{main}, which never returns.
103 There's one more trick: the Pintos kernel command line
104 is stored in the boot loader. The @command{pintos} program actually
105 modifies a copy of the boot loader on disk each time it runs the kernel,
107 in whatever command line arguments the user supplies to the kernel,
108 and then the kernel at boot time reads those arguments out of the boot
109 loader in memory. This is not an elegant solution, but it is simple
112 @node Kernel Initialization
113 @subsection Kernel Initialization
115 The kernel proper starts with the @func{main} function. The
116 @func{main} function is written in C, as will be most of the code we
117 encounter in Pintos from here on out.
119 When @func{main} starts, the system is in a pretty raw state. We're
120 in 32-bit protected mode with paging enabled, but hardly anything else is
121 ready. Thus, the @func{main} function consists primarily of calls
122 into other Pintos modules' initialization functions.
123 These are usually named @func{@var{module}_init}, where
124 @var{module} is the module's name, @file{@var{module}.c} is the
125 module's source code, and @file{@var{module}.h} is the module's
128 The first step in @func{main} is to call @func{bss_init}, which clears
129 out the kernel's ``BSS'', which is the traditional name for a
130 segment that should be initialized to all zeros. In most C
131 implementations, whenever you
132 declare a variable outside a function without providing an
133 initializer, that variable goes into the BSS. Because it's all zeros, the
134 BSS isn't stored in the image that the loader brought into memory. We
135 just use @func{memset} to zero it out.
137 Next, @func{main} calls @func{read_command_line} to break the kernel command
138 line into arguments, then @func{parse_options} to read any options at
139 the beginning of the command line. (Actions specified on the
140 command line execute later.)
142 @func{thread_init} initializes the thread system. We will defer full
143 discussion to our discussion of Pintos threads below. It is called so
144 early in initialization because a valid thread structure is a
145 prerequisite for acquiring a lock, and lock acquisition in turn is
146 important to other Pintos subsystems. Then we initialize the console
147 and print a startup message to the console.
149 The next block of functions we call initializes the kernel's memory
150 system. @func{palloc_init} sets up the kernel page allocator, which
151 doles out memory one or more pages at a time (@pxref{Page Allocator}).
152 @func{malloc_init} sets
153 up the allocator that handles allocations of arbitrary-size blocks of
154 memory (@pxref{Block Allocator}).
155 @func{paging_init} sets up a page table for the kernel (@pxref{Page
158 In projects 2 and later, @func{main} also calls @func{tss_init} and
161 The next set of calls initializes the interrupt system.
162 @func{intr_init} sets up the CPU's @dfn{interrupt descriptor table}
163 (IDT) to ready it for interrupt handling (@pxref{Interrupt
164 Infrastructure}), then @func{timer_init} and @func{kbd_init} prepare for
165 handling timer interrupts and keyboard interrupts, respectively.
166 @func{input_init} sets up to merge serial and keyboard input into one
168 projects 2 and later, we also prepare to handle interrupts caused by
169 user programs using @func{exception_init} and @func{syscall_init}.
171 Now that interrupts are set up, we can start the scheduler
172 with @func{thread_start}, which creates the idle thread and enables
174 With interrupts enabled, interrupt-driven serial port I/O becomes
176 @func{serial_init_queue} to switch to that mode. Finally,
177 @func{timer_calibrate} calibrates the timer for accurate short delays.
179 If the file system is compiled in, as it will starting in project 2, we
180 initialize the IDE disks with @func{ide_init}, then the
181 file system with @func{filesys_init}.
183 Boot is complete, so we print a message.
185 Function @func{run_actions} now parses and executes actions specified on
186 the kernel command line, such as @command{run} to run a test (in project
187 1) or a user program (in later projects).
189 Finally, if @option{-q} was specified on the kernel command line, we
190 call @func{power_off} to terminate the machine simulator. Otherwise,
191 @func{main} calls @func{thread_exit}, which allows any other running
192 threads to continue running.
194 @node Physical Memory Map
195 @subsection Physical Memory Map
197 @multitable {@t{00000000}--@t{00000000}} {Hardware} {Some much longer explanatory text}
198 @headitem Memory Range
202 @item @t{00000000}--@t{000003ff} @tab CPU @tab Real mode interrupt table.
203 @item @t{00000400}--@t{000005ff} @tab BIOS @tab Miscellaneous data area.
204 @item @t{00000600}--@t{00007bff} @tab --- @tab ---
205 @item @t{00007c00}--@t{00007dff} @tab Pintos @tab Loader.
206 @item @t{0000e000}--@t{0000efff} @tab Pintos
207 @tab Stack for loader; kernel stack and @struct{thread} for initial
209 @item @t{0000f000}--@t{0000ffff} @tab Pintos
210 @tab Page directory for startup code.
211 @item @t{00010000}--@t{00020000} @tab Pintos
212 @tab Page tables for startup code.
213 @item @t{00020000}--@t{0009ffff} @tab Pintos
214 @tab Kernel code, data, and uninitialized data segments.
215 @item @t{000a0000}--@t{000bffff} @tab Video @tab VGA display memory.
216 @item @t{000c0000}--@t{000effff} @tab Hardware
217 @tab Reserved for expansion card RAM and ROM.
218 @item @t{000f0000}--@t{000fffff} @tab BIOS @tab ROM BIOS.
219 @item @t{00100000}--@t{03ffffff} @tab Pintos @tab Dynamic memory allocation.
232 @subsection @code{struct thread}
234 The main Pintos data structure for threads is @struct{thread},
235 declared in @file{threads/thread.h}.
237 @deftp {Structure} {struct thread}
238 Represents a thread or a user process. In the projects, you will have
239 to add your own members to @struct{thread}. You may also change or
240 delete the definitions of existing members.
242 Every @struct{thread} occupies the beginning of its own page of
243 memory. The rest of the page is used for the thread's stack, which
244 grows downward from the end of the page. It looks like this:
248 4 kB +---------------------------------+
262 sizeof (struct thread) +---------------------------------+
268 0 kB +---------------------------------+
272 This has two consequences. First, @struct{thread} must not be allowed
273 to grow too big. If it does, then there will not be enough room for the
274 kernel stack. The base @struct{thread} is only a few bytes in size. It
275 probably should stay well under 1 kB.
277 Second, kernel stacks must not be allowed to grow too large. If a stack
278 overflows, it will corrupt the thread state. Thus, kernel functions
279 should not allocate large structures or arrays as non-static local
280 variables. Use dynamic allocation with @func{malloc} or
281 @func{palloc_get_page} instead (@pxref{Memory Allocation}).
284 @deftypecv {Member} {@struct{thread}} {tid_t} tid
285 The thread's thread identifier or @dfn{tid}. Every thread must have a
286 tid that is unique over the entire lifetime of the kernel. By
287 default, @code{tid_t} is a @code{typedef} for @code{int} and each new
288 thread receives the numerically next higher tid, starting from 1 for
289 the initial process. You can change the type and the numbering scheme
293 @deftypecv {Member} {@struct{thread}} {enum thread_status} status
294 @anchor{Thread States}
295 The thread's state, one of the following:
297 @defvr {Thread State} @code{THREAD_RUNNING}
298 The thread is running. Exactly one thread is running at a given time.
299 @func{thread_current} returns the running thread.
302 @defvr {Thread State} @code{THREAD_READY}
303 The thread is ready to run, but it's not running right now. The
304 thread could be selected to run the next time the scheduler is
305 invoked. Ready threads are kept in a doubly linked list called
309 @defvr {Thread State} @code{THREAD_BLOCKED}
310 The thread is waiting for something, e.g.@: a lock to become
311 available, an interrupt to be invoked. The thread won't be scheduled
312 again until it transitions to the @code{THREAD_READY} state with a
313 call to @func{thread_unblock}. This is most conveniently done
314 indirectly, using one of the Pintos synchronization primitives that
315 block and unblock threads automatically (@pxref{Synchronization}).
317 There is no @i{a priori} way to tell what a blocked thread is waiting
318 for, but a backtrace can help (@pxref{Backtraces}).
321 @defvr {Thread State} @code{THREAD_DYING}
322 The thread will be destroyed by the scheduler after switching to the
327 @deftypecv {Member} {@struct{thread}} {char} name[16]
328 The thread's name as a string, or at least the first few characters of
332 @deftypecv {Member} {@struct{thread}} {uint8_t *} stack
333 Every thread has its own stack to keep track of its state. When the
334 thread is running, the CPU's stack pointer register tracks the top of
335 the stack and this member is unused. But when the CPU switches to
336 another thread, this member saves the thread's stack pointer. No
337 other members are needed to save the thread's registers, because the
338 other registers that must be saved are saved on the stack.
340 When an interrupt occurs, whether in the kernel or a user program, an
341 @struct{intr_frame} is pushed onto the stack. When the interrupt occurs
342 in a user program, the @struct{intr_frame} is always at the very top of
343 the page. @xref{Interrupt Handling}, for more information.
346 @deftypecv {Member} {@struct{thread}} {int} priority
347 A thread priority, ranging from @code{PRI_MIN} (0) to @code{PRI_MAX}
348 (63). Lower numbers correspond to lower priorities, so that
349 priority 0 is the lowest priority and priority 63 is the highest.
350 Pintos as provided ignores thread priorities, but you will implement
351 priority scheduling in project 1 (@pxref{Priority Scheduling}).
354 @deftypecv {Member} {@struct{thread}} {@struct{list_elem}} elem
355 A ``list element'' used to put the thread into doubly linked lists,
356 either @code{ready_list} (the list of threads ready to run) or a list of
357 threads waiting on a semaphore in @func{sema_down}. It can do double
358 duty because a thread waiting on a semaphore is not ready, and vice
362 @deftypecv {Member} {@struct{thread}} {uint32_t *} pagedir
363 Only present in project 2 and later. @xref{Page Tables}.
366 @deftypecv {Member} {@struct{thread}} {unsigned} magic
367 Always set to @code{THREAD_MAGIC}, which is just an arbitrary number defined
368 in @file{threads/thread.c}, and used to detect stack overflow.
369 @func{thread_current} checks that the @code{magic} member of the running
370 thread's @struct{thread} is set to @code{THREAD_MAGIC}. Stack overflow
371 tends to change this value, triggering the assertion. For greatest
372 benefit, as you add members to @struct{thread}, leave @code{magic} at
376 @node Thread Functions
377 @subsection Thread Functions
379 @file{threads/thread.c} implements several public functions for thread
380 support. Let's take a look at the most useful:
382 @deftypefun void thread_init (void)
383 Called by @func{main} to initialize the thread system. Its main
384 purpose is to create a @struct{thread} for Pintos's initial thread.
385 This is possible because the Pintos loader puts the initial
386 thread's stack at the top of a page, in the same position as any other
389 Before @func{thread_init} runs,
390 @func{thread_current} will fail because the running thread's
391 @code{magic} value is incorrect. Lots of functions call
392 @func{thread_current} directly or indirectly, including
393 @func{lock_acquire} for locking a lock, so @func{thread_init} is
394 called early in Pintos initialization.
397 @deftypefun void thread_start (void)
398 Called by @func{main} to start the scheduler. Creates the idle
399 thread, that is, the thread that is scheduled when no other thread is
400 ready. Then enables interrupts, which as a side effect enables the
401 scheduler because the scheduler runs on return from the timer interrupt, using
402 @func{intr_yield_on_return} (@pxref{External Interrupt Handling}).
405 @deftypefun void thread_tick (void)
406 Called by the timer interrupt at each timer tick. It keeps track of
407 thread statistics and triggers the scheduler when a time slice expires.
410 @deftypefun void thread_print_stats (void)
411 Called during Pintos shutdown to print thread statistics.
414 @deftypefun tid_t thread_create (const char *@var{name}, int @var{priority}, thread_func *@var{func}, void *@var{aux})
415 Creates and starts a new thread named @var{name} with the given
416 @var{priority}, returning the new thread's tid. The thread executes
417 @var{func}, passing @var{aux} as the function's single argument.
419 @func{thread_create} allocates a page for the thread's
420 @struct{thread} and stack and initializes its members, then it sets
421 up a set of fake stack frames for it (@pxref{Thread Switching}). The
422 thread is initialized in the blocked state, then unblocked just before
423 returning, which allows the new thread to
424 be scheduled (@pxref{Thread States}).
426 @deftp {Type} {void thread_func (void *@var{aux})}
427 This is the type of the function passed to @func{thread_create}, whose
428 @var{aux} argument is passed along as the function's argument.
432 @deftypefun void thread_block (void)
433 Transitions the running thread from the running state to the blocked
434 state (@pxref{Thread States}). The thread will not run again until
435 @func{thread_unblock} is
436 called on it, so you'd better have some way arranged for that to happen.
437 Because @func{thread_block} is so low-level, you should prefer to use
438 one of the synchronization primitives instead (@pxref{Synchronization}).
441 @deftypefun void thread_unblock (struct thread *@var{thread})
442 Transitions @var{thread}, which must be in the blocked state, to the
443 ready state, allowing it to resume running (@pxref{Thread States}).
444 This is called when the event that the thread is waiting for occurs,
445 e.g.@: when the lock that
446 the thread is waiting on becomes available.
449 @deftypefun {struct thread *} thread_current (void)
450 Returns the running thread.
453 @deftypefun {tid_t} thread_tid (void)
454 Returns the running thread's thread id. Equivalent to
455 @code{thread_current ()->tid}.
458 @deftypefun {const char *} thread_name (void)
459 Returns the running thread's name. Equivalent to @code{thread_current
463 @deftypefun void thread_exit (void) @code{NO_RETURN}
464 Causes the current thread to exit. Never returns, hence
465 @code{NO_RETURN} (@pxref{Function and Parameter Attributes}).
468 @deftypefun void thread_yield (void)
469 Yields the CPU to the scheduler, which picks a new thread to run. The
470 new thread might be the current thread, so you can't depend on this
471 function to keep this thread from running for any particular length of
475 @deftypefun int thread_get_priority (void)
476 @deftypefunx void thread_set_priority (int @var{new_priority})
477 Stub to set and get thread priority. @xref{Priority Scheduling}.
480 @deftypefun int thread_get_nice (void)
481 @deftypefunx void thread_set_nice (int @var{new_nice})
482 @deftypefunx int thread_get_recent_cpu (void)
483 @deftypefunx int thread_get_load_avg (void)
484 Stubs for the advanced scheduler. @xref{4.4BSD Scheduler}.
487 @node Thread Switching
488 @subsection Thread Switching
490 @func{schedule} is responsible for switching threads. It
491 is internal to @file{threads/thread.c} and called only by the three
492 public thread functions that need to switch threads:
493 @func{thread_block}, @func{thread_exit}, and @func{thread_yield}.
494 Before any of these functions call @func{schedule}, they disable
495 interrupts (or ensure that they are already disabled) and then change
496 the running thread's state to something other than running.
498 @func{schedule} is short but tricky. It records the
499 current thread in local variable @var{cur}, determines the next thread
500 to run as local variable @var{next} (by calling
501 @func{next_thread_to_run}), and then calls @func{switch_threads} to do
502 the actual thread switch. The thread we switched to was also running
503 inside @func{switch_threads}, as are all the threads not currently
504 running, so the new thread now returns out of
505 @func{switch_threads}, returning the previously running thread.
507 @func{switch_threads} is an assembly language routine in
508 @file{threads/switch.S}. It saves registers on the stack, saves the
509 CPU's current stack pointer in the current @struct{thread}'s @code{stack}
510 member, restores the new thread's @code{stack} into the CPU's stack
511 pointer, restores registers from the stack, and returns.
513 The rest of the scheduler is implemented in @func{schedule_tail}. It
514 marks the new thread as running. If the thread we just switched from
515 is in the dying state, then it also frees the page that contained the
516 dying thread's @struct{thread} and stack. These couldn't be freed
517 prior to the thread switch because the switch needed to use it.
519 Running a thread for the first time is a special case. When
520 @func{thread_create} creates a new thread, it goes through a fair
521 amount of trouble to get it started properly. In particular, the new
522 thread hasn't started running yet, so there's no way for it to be
523 running inside @func{switch_threads} as the scheduler expects. To
524 solve the problem, @func{thread_create} creates some fake stack frames
525 in the new thread's stack:
529 The topmost fake stack frame is for @func{switch_threads}, represented
530 by @struct{switch_threads_frame}. The important part of this frame is
531 its @code{eip} member, the return address. We point @code{eip} to
532 @func{switch_entry}, indicating it to be the function that called
536 The next fake stack frame is for @func{switch_entry}, an assembly
537 language routine in @file{threads/switch.S} that adjusts the stack
538 pointer,@footnote{This is because @func{switch_threads} takes
539 arguments on the stack and the 80@var{x}86 SVR4 calling convention
540 requires the caller, not the called function, to remove them when the
541 call is complete. See @bibref{SysV-i386} chapter 3 for details.}
542 calls @func{schedule_tail} (this special case is why
543 @func{schedule_tail} is separate from @func{schedule}), and returns.
544 We fill in its stack frame so that it returns into
545 @func{kernel_thread}, a function in @file{threads/thread.c}.
548 The final stack frame is for @func{kernel_thread}, which enables
549 interrupts and calls the thread's function (the function passed to
550 @func{thread_create}). If the thread's function returns, it calls
551 @func{thread_exit} to terminate the thread.
554 @node Synchronization
555 @section Synchronization
557 If sharing of resources between threads is not handled in a careful,
558 controlled fashion, the result is usually a big mess.
559 This is especially the case in operating system kernels, where
560 faulty sharing can crash the entire machine. Pintos provides several
561 synchronization primitives to help out.
564 * Disabling Interrupts::
568 * Optimization Barriers::
571 @node Disabling Interrupts
572 @subsection Disabling Interrupts
574 The crudest way to do synchronization is to disable interrupts, that
575 is, to temporarily prevent the CPU from responding to interrupts. If
576 interrupts are off, no other thread will preempt the running thread,
577 because thread preemption is driven by the timer interrupt. If
578 interrupts are on, as they normally are, then the running thread may
579 be preempted by another at any time, whether between two C statements
580 or even within the execution of one.
582 Incidentally, this means that Pintos is a ``preemptible kernel,'' that
583 is, kernel threads can be preempted at any time. Traditional Unix
584 systems are ``nonpreemptible,'' that is, kernel threads can only be
585 preempted at points where they explicitly call into the scheduler.
586 (User programs can be preempted at any time in both models.) As you
587 might imagine, preemptible kernels require more explicit
590 You should have little need to set the interrupt state directly. Most
591 of the time you should use the other synchronization primitives
592 described in the following sections. The main reason to disable
593 interrupts is to synchronize kernel threads with external interrupt
594 handlers, which cannot sleep and thus cannot use most other forms of
595 synchronization (@pxref{External Interrupt Handling}).
597 Some external interrupts cannot be postponed, even by disabling
598 interrupts. These interrupts, called @dfn{non-maskable interrupts}
599 (NMIs), are supposed to be used only in emergencies, e.g.@: when the
600 computer is on fire. Pintos does not handle non-maskable interrupts.
602 Types and functions for disabling and enabling interrupts are in
603 @file{threads/interrupt.h}.
605 @deftp Type {enum intr_level}
606 One of @code{INTR_OFF} or @code{INTR_ON}, denoting that interrupts are
607 disabled or enabled, respectively.
610 @deftypefun {enum intr_level} intr_get_level (void)
611 Returns the current interrupt state.
614 @deftypefun {enum intr_level} intr_set_level (enum intr_level @var{level})
615 Turns interrupts on or off according to @var{level}. Returns the
616 previous interrupt state.
619 @deftypefun {enum intr_level} intr_enable (void)
620 Turns interrupts on. Returns the previous interrupt state.
623 @deftypefun {enum intr_level} intr_disable (void)
624 Turns interrupts off. Returns the previous interrupt state.
628 @subsection Semaphores
630 A @dfn{semaphore} is a nonnegative integer together with two operators
631 that manipulate it atomically, which are:
635 ``Down'' or ``P'': wait for the value to become positive, then
639 ``Up'' or ``V'': increment the value (and wake up one waiting thread,
643 A semaphore initialized to 0 may be used to wait for an event
644 that will happen exactly once. For example, suppose thread @var{A}
645 starts another thread @var{B} and wants to wait for @var{B} to signal
646 that some activity is complete. @var{A} can create a semaphore
647 initialized to 0, pass it to @var{B} as it starts it, and then
648 ``down'' the semaphore. When @var{B} finishes its activity, it
649 ``ups'' the semaphore. This works regardless of whether @var{A}
650 ``downs'' the semaphore or @var{B} ``ups'' it first.
652 A semaphore initialized to 1 is typically used for controlling access
653 to a resource. Before a block of code starts using the resource, it
654 ``downs'' the semaphore, then after it is done with the resource it
655 ``ups'' the resource. In such a case a lock, described below, may be
658 Semaphores can also be initialized to values larger than 1. These are
661 Semaphores were invented by Edsger Dijkstra and first used in the THE
662 operating system (@bibref{Dijkstra}).
664 Pintos' semaphore type and operations are declared in
665 @file{threads/synch.h}.
667 @deftp {Type} {struct semaphore}
668 Represents a semaphore.
671 @deftypefun void sema_init (struct semaphore *@var{sema}, unsigned @var{value})
672 Initializes @var{sema} as a new semaphore with the given initial
676 @deftypefun void sema_down (struct semaphore *@var{sema})
677 Executes the ``down'' or ``P'' operation on @var{sema}, waiting for
678 its value to become positive and then decrementing it by one.
681 @deftypefun bool sema_try_down (struct semaphore *@var{sema})
682 Tries to execute the ``down'' or ``P'' operation on @var{sema},
683 without waiting. Returns true if @var{sema}
684 was successfully decremented, or false if it was already
685 zero and thus could not be decremented without waiting. Calling this
687 tight loop wastes CPU time, so use @func{sema_down} or find a
688 different approach instead.
691 @deftypefun void sema_up (struct semaphore *@var{sema})
692 Executes the ``up'' or ``V'' operation on @var{sema},
693 incrementing its value. If any threads are waiting on
694 @var{sema}, wakes one of them up.
696 Unlike most synchronization primitives, @func{sema_up} may be called
697 inside an external interrupt handler (@pxref{External Interrupt
701 Semaphores are internally built out of disabling interrupt
702 (@pxref{Disabling Interrupts}) and thread blocking and unblocking
703 (@func{thread_block} and @func{thread_unblock}). Each semaphore maintains
704 a list of waiting threads, using the linked list
705 implementation in @file{lib/kernel/list.c}.
710 A @dfn{lock} is like a semaphore with an initial value of 1
711 (@pxref{Semaphores}). A lock's equivalent of ``up'' is called
712 ``release'', and the ``down'' operation is called ``acquire''.
714 Compared to a semaphore, a lock has one added restriction: only the
715 thread that acquires a lock, called the lock's ``owner'', is allowed to
716 release it. If this restriction is a problem, it's a good sign that a
717 semaphore should be used, instead of a lock.
719 Locks in Pintos are not ``recursive,'' that is, it is an error for the
720 thread currently holding a lock to try to acquire that lock.
722 Lock types and functions are declared in @file{threads/synch.h}.
724 @deftp {Type} {struct lock}
728 @deftypefun void lock_init (struct lock *@var{lock})
729 Initializes @var{lock} as a new lock.
730 The lock is not initially owned by any thread.
733 @deftypefun void lock_acquire (struct lock *@var{lock})
734 Acquires @var{lock} for the current thread, first waiting for
735 any current owner to release it if necessary.
738 @deftypefun bool lock_try_acquire (struct lock *@var{lock})
739 Tries to acquire @var{lock} for use by the current thread, without
740 waiting. Returns true if successful, false if the lock is already
741 owned. Calling this function in a tight loop is a bad idea because it
742 wastes CPU time, so use @func{lock_acquire} instead.
745 @deftypefun void lock_release (struct lock *@var{lock})
746 Releases @var{lock}, which the current thread must own.
749 @deftypefun bool lock_held_by_current_thread (const struct lock *@var{lock})
750 Returns true if the running thread owns @var{lock},
752 There is no function to test whether an arbitrary thread owns a lock,
753 because the answer could change before the caller could act on it.
759 A @dfn{monitor} is a higher-level form of synchronization than a
760 semaphore or a lock. A monitor consists of data being synchronized,
761 plus a lock, called the @dfn{monitor lock}, and one or more
762 @dfn{condition variables}. Before it accesses the protected data, a
763 thread first acquires the monitor lock. It is then said to be ``in the
764 monitor''. While in the monitor, the thread has control over all the
765 protected data, which it may freely examine or modify. When access to
766 the protected data is complete, it releases the monitor lock.
768 Condition variables allow code in the monitor to wait for a condition to
769 become true. Each condition variable is associated with an abstract
770 condition, e.g.@: ``some data has arrived for processing'' or ``over 10
771 seconds has passed since the user's last keystroke''. When code in the
772 monitor needs to wait for a condition to become true, it ``waits'' on
773 the associated condition variable, which releases the lock and waits for
774 the condition to be signaled. If, on the other hand, it has caused one
775 of these conditions to become true, it ``signals'' the condition to wake
776 up one waiter, or ``broadcasts'' the condition to wake all of them.
778 The theoretical framework for monitors was laid out by C.@: A.@: R.@:
779 Hoare (@bibref{Hoare}). Their practical usage was later elaborated in a
780 paper on the Mesa operating system (@bibref{Lampson}).
782 Condition variable types and functions are declared in
783 @file{threads/synch.h}.
785 @deftp {Type} {struct condition}
786 Represents a condition variable.
789 @deftypefun void cond_init (struct condition *@var{cond})
790 Initializes @var{cond} as a new condition variable.
793 @deftypefun void cond_wait (struct condition *@var{cond}, struct lock *@var{lock})
794 Atomically releases @var{lock} (the monitor lock) and waits for
795 @var{cond} to be signaled by some other piece of code. After
796 @var{cond} is signaled, reacquires @var{lock} before returning.
797 @var{lock} must be held before calling this function.
799 Sending a signal and waking up from a wait are not an atomic operation.
800 Thus, typically @func{cond_wait}'s caller must recheck the condition
801 after the wait completes and, if necessary, wait again. See the next
802 section for an example.
805 @deftypefun void cond_signal (struct condition *@var{cond}, struct lock *@var{lock})
806 If any threads are waiting on @var{cond} (protected by monitor lock
807 @var{lock}), then this function wakes up one of them. If no threads are
808 waiting, returns without performing any action.
809 @var{lock} must be held before calling this function.
812 @deftypefun void cond_broadcast (struct condition *@var{cond}, struct lock *@var{lock})
813 Wakes up all threads, if any, waiting on @var{cond} (protected by
814 monitor lock @var{lock}). @var{lock} must be held before calling this
818 @subsubsection Monitor Example
820 The classical example of a monitor is handling a buffer into which one
822 ``producer'' threads write characters and out of which one or more
823 ``consumer'' threads read characters. To implement this we need,
824 besides the monitor lock, two condition variables which we will call
825 @var{not_full} and @var{not_empty}:
828 char buf[BUF_SIZE]; /* @r{Buffer.} */
829 size_t n = 0; /* @r{0 <= n <= @var{BUF_SIZE}: # of characters in buffer.} */
830 size_t head = 0; /* @r{@var{buf} index of next char to write (mod @var{BUF_SIZE}).} */
831 size_t tail = 0; /* @r{@var{buf} index of next char to read (mod @var{BUF_SIZE}).} */
832 struct lock lock; /* @r{Monitor lock.} */
833 struct condition not_empty; /* @r{Signaled when the buffer is not empty.} */
834 struct condition not_full; /* @r{Signaled when the buffer is not full.} */
836 @dots{}@r{initialize the locks and condition variables}@dots{}
838 void put (char ch) @{
839 lock_acquire (&lock);
840 while (n == BUF_SIZE) /* @r{Can't add to @var{buf} as long as it's full.} */
841 cond_wait (¬_full, &lock);
842 buf[head++ % BUF_SIZE] = ch; /* @r{Add @var{ch} to @var{buf}.} */
844 cond_signal (¬_empty, &lock); /* @r{@var{buf} can't be empty anymore.} */
845 lock_release (&lock);
850 lock_acquire (&lock);
851 while (n == 0) /* @r{Can't read @var{buf} as long as it's empty.} */
852 cond_wait (¬_empty, &lock);
853 ch = buf[tail++ % BUF_SIZE]; /* @r{Get @var{ch} from @var{buf}.} */
855 cond_signal (¬_full, &lock); /* @r{@var{buf} can't be full anymore.} */
856 lock_release (&lock);
860 Note that @code{BUF_SIZE} must divide evenly into @code{SIZE_MAX + 1}
861 for the above code to be completely correct. Otherwise, it will fail
862 the first time @code{head} wraps around to 0. In practice,
863 @code{BUF_SIZE} would ordinarily be a power of 2.
865 @node Optimization Barriers
866 @subsection Optimization Barriers
868 @c We should try to come up with a better example.
869 @c Perhaps something with a linked list?
871 An @dfn{optimization barrier} is a special statement that prevents the
872 compiler from making assumptions about the state of memory across the
873 barrier. The compiler will not reorder reads or writes of variables
874 across the barrier or assume that a variable's value is unmodified
875 across the barrier, except for local variables whose address is never
876 taken. In Pintos, @file{threads/synch.h} defines the @code{barrier()}
877 macro as an optimization barrier.
879 One reason to use an optimization barrier is when data can change
880 asynchronously, without the compiler's knowledge, e.g.@: by another
881 thread or an interrupt handler. The @func{too_many_loops} function in
882 @file{devices/timer.c} is an example. This function starts out by
883 busy-waiting in a loop until a timer tick occurs:
886 /* Wait for a timer tick. */
887 int64_t start = ticks;
888 while (ticks == start)
893 Without an optimization barrier in the loop, the compiler could
894 conclude that the loop would never terminate, because @code{start} and
895 @code{ticks} start out equal and the loop itself never changes them.
896 It could then ``optimize'' the function into an infinite loop, which
897 would definitely be undesirable.
899 Optimization barriers can be used to avoid other compiler
900 optimizations. The @func{busy_wait} function, also in
901 @file{devices/timer.c}, is an example. It contains this loop:
909 The goal of this loop is to busy-wait by counting @code{loops} down
910 from its original value to 0. Without the barrier, the compiler could
911 delete the loop entirely, because it produces no useful output and has
912 no side effects. The barrier forces the compiler to pretend that the
913 loop body has an important effect.
915 Finally, optimization barriers can be used to force the ordering of
916 memory reads or writes. For example, suppose we add a ``feature''
917 that, whenever a timer interrupt occurs, the character in global
918 variable @code{timer_put_char} is printed on the console, but only if
919 global Boolean variable @code{timer_do_put} is true. The best way to
920 set up @samp{x} to be printed is then to use an optimization barrier,
924 timer_put_char = 'x';
929 Without the barrier, the code is buggy because the compiler is free to
930 reorder operations when it doesn't see a reason to keep them in the
931 same order. In this case, the compiler doesn't know that the order of
932 assignments is important, so its optimizer is permitted to exchange
933 their order. There's no telling whether it will actually do this, and
934 it is possible that passing the compiler different optimization flags
935 or using a different version of the compiler will produce different
938 Another solution is to disable interrupts around the assignments.
939 This does not prevent reordering, but it prevents the interrupt
940 handler from intervening between the assignments. It also has the
941 extra runtime cost of disabling and re-enabling interrupts:
944 enum intr_level old_level = intr_disable ();
945 timer_put_char = 'x';
947 intr_set_level (old_level);
950 A second solution is to mark the declarations of
951 @code{timer_put_char} and @code{timer_do_put} as @samp{volatile}. This
952 keyword tells the compiler that the variables are externally observable
953 and restricts its latitude for optimization. However, the semantics of
954 @samp{volatile} are not well-defined, so it is not a good general
955 solution. The base Pintos code does not use @samp{volatile} at all.
957 The following is @emph{not} a solution, because locks neither prevent
958 interrupts nor prevent the compiler from reordering the code within the
959 region where the lock is held:
962 lock_acquire (&timer_lock); /* INCORRECT CODE */
963 timer_put_char = 'x';
965 lock_release (&timer_lock);
968 The compiler treats invocation of any function defined externally,
969 that is, in another source file, as a limited form of optimization
970 barrier. Specifically, the compiler assumes that any externally
971 defined function may access any statically or dynamically allocated
972 data and any local variable whose address is taken. This often means
973 that explicit barriers can be omitted. It is one reason that Pintos
974 contains few explicit barriers.
976 A function defined in the same source file, or in a header included by
977 the source file, cannot be relied upon as a optimization barrier.
978 This applies even to invocation of a function before its
979 definition, because the compiler may read and parse the entire source
980 file before performing optimization.
982 @node Interrupt Handling
983 @section Interrupt Handling
985 An @dfn{interrupt} notifies the CPU of some event. Much of the work
986 of an operating system relates to interrupts in one way or another.
987 For our purposes, we classify interrupts into two broad categories:
991 @dfn{Internal interrupts}, that is, interrupts caused directly by CPU
992 instructions. System calls, attempts at invalid memory access
993 (@dfn{page faults}), and attempts to divide by zero are some activities
994 that cause internal interrupts. Because they are caused by CPU
995 instructions, internal interrupts are @dfn{synchronous} or synchronized
996 with CPU instructions. @func{intr_disable} does not disable internal
1000 @dfn{External interrupts}, that is, interrupts originating outside the
1001 CPU. These interrupts come from hardware devices such as the system
1002 timer, keyboard, serial ports, and disks. External interrupts are
1003 @dfn{asynchronous}, meaning that their delivery is not
1004 synchronized with instruction execution. Handling of external interrupts
1005 can be postponed with @func{intr_disable} and related functions
1006 (@pxref{Disabling Interrupts}).
1009 The CPU treats both classes of interrupts largely the same way,
1010 so Pintos has common infrastructure to handle both classes.
1011 The following section describes this
1012 common infrastructure. The sections after that give the specifics of
1013 external and internal interrupts.
1015 If you haven't already read chapter 3, ``Basic Execution Environment,''
1016 in @bibref{IA32-v1}, it is recommended that you do so now. You might
1017 also want to skim chapter 5, ``Interrupt and Exception Handling,'' in
1021 * Interrupt Infrastructure::
1022 * Internal Interrupt Handling::
1023 * External Interrupt Handling::
1026 @node Interrupt Infrastructure
1027 @subsection Interrupt Infrastructure
1029 When an interrupt occurs, the CPU saves
1030 its most essential state on a stack and jumps to an interrupt
1031 handler routine. The 80@var{x}86 architecture supports 256
1032 interrupts, numbered 0 through 255, each with an independent
1033 handler defined in an array called the @dfn{interrupt
1034 descriptor table} or IDT.
1036 In Pintos, @func{intr_init} in @file{threads/interrupt.c} sets up the
1037 IDT so that each entry points to a unique entry point in
1038 @file{threads/intr-stubs.S} named @func{intr@var{NN}_stub}, where
1039 @var{NN} is the interrupt number in
1040 hexadecimal. Because the CPU doesn't give
1041 us any other way to find out the interrupt number, this entry point
1042 pushes the interrupt number on the stack. Then it jumps to
1043 @func{intr_entry}, which pushes all the registers that the processor
1044 didn't already push for us, and then calls @func{intr_handler}, which
1045 brings us back into C in @file{threads/interrupt.c}.
1047 The main job of @func{intr_handler} is to call the function
1048 registered for handling the particular interrupt. (If no
1049 function is registered, it dumps some information to the console and
1050 panics.) It also does some extra processing for external
1051 interrupts (@pxref{External Interrupt Handling}).
1053 When @func{intr_handler} returns, the assembly code in
1054 @file{threads/intr-stubs.S} restores all the CPU registers saved
1055 earlier and directs the CPU to return from the interrupt.
1057 The following types and functions are common to all
1060 @deftp {Type} {void intr_handler_func (struct intr_frame *@var{frame})}
1061 This is how an interrupt handler function must be declared. Its @var{frame}
1062 argument (see below) allows it to determine the cause of the interrupt
1063 and the state of the thread that was interrupted.
1066 @deftp {Type} {struct intr_frame}
1067 The stack frame of an interrupt handler, as saved by the CPU, the interrupt
1068 stubs, and @func{intr_entry}. Its most interesting members are described
1072 @deftypecv {Member} {@struct{intr_frame}} uint32_t edi
1073 @deftypecvx {Member} {@struct{intr_frame}} uint32_t esi
1074 @deftypecvx {Member} {@struct{intr_frame}} uint32_t ebp
1075 @deftypecvx {Member} {@struct{intr_frame}} uint32_t esp_dummy
1076 @deftypecvx {Member} {@struct{intr_frame}} uint32_t ebx
1077 @deftypecvx {Member} {@struct{intr_frame}} uint32_t edx
1078 @deftypecvx {Member} {@struct{intr_frame}} uint32_t ecx
1079 @deftypecvx {Member} {@struct{intr_frame}} uint32_t eax
1080 @deftypecvx {Member} {@struct{intr_frame}} uint16_t es
1081 @deftypecvx {Member} {@struct{intr_frame}} uint16_t ds
1082 Register values in the interrupted thread, pushed by @func{intr_entry}.
1083 The @code{esp_dummy} value isn't actually used (refer to the
1084 description of @code{PUSHA} in @bibref{IA32-v2b} for details).
1087 @deftypecv {Member} {@struct{intr_frame}} uint32_t vec_no
1088 The interrupt vector number, ranging from 0 to 255.
1091 @deftypecv {Member} {@struct{intr_frame}} uint32_t error_code
1092 The ``error code'' pushed on the stack by the CPU for some internal
1096 @deftypecv {Member} {@struct{intr_frame}} void (*eip) (void)
1097 The address of the next instruction to be executed by the interrupted
1101 @deftypecv {Member} {@struct{intr_frame}} {void *} esp
1102 The interrupted thread's stack pointer.
1105 @deftypefun {const char *} intr_name (uint8_t @var{vec})
1106 Returns the name of the interrupt numbered @var{vec}, or
1107 @code{"unknown"} if the interrupt has no registered name.
1110 @node Internal Interrupt Handling
1111 @subsection Internal Interrupt Handling
1113 Internal interrupts are caused directly by CPU instructions executed by
1114 the running kernel thread or user process (from project 2 onward). An
1115 internal interrupt is therefore said to arise in a ``process context.''
1117 In an internal interrupt's handler, it can make sense to examine the
1118 @struct{intr_frame} passed to the interrupt handler, or even to modify
1119 it. When the interrupt returns, modifications in @struct{intr_frame}
1120 become changes to the calling thread or process's state. For example,
1121 the Pintos system call handler returns a value to the user program by
1122 modifying the saved EAX register (@pxref{System Call Details}).
1124 There are no special restrictions on what an internal interrupt
1125 handler can or can't do. Generally they should run with interrupts
1126 enabled, just like other code, and so they can be preempted by other
1127 kernel threads. Thus, they do need to synchronize with other threads
1128 on shared data and other resources (@pxref{Synchronization}).
1130 Internal interrupt handlers can be invoked recursively. For example,
1131 the system call handler might cause a page fault while attempting to
1132 read user memory. Deep recursion would risk overflowing the limited
1133 kernel stack (@pxref{struct thread}), but should be unnecessary.
1135 @deftypefun void intr_register_int (uint8_t @var{vec}, int @var{dpl}, enum intr_level @var{level}, intr_handler_func *@var{handler}, const char *@var{name})
1136 Registers @var{handler} to be called when internal interrupt numbered
1137 @var{vec} is triggered. Names the interrupt @var{name} for debugging
1140 If @var{level} is @code{INTR_ON}, external interrupts will be processed
1141 normally during the interrupt handler's execution, which is normally
1142 desirable. Specifying @code{INTR_OFF} will cause the CPU to disable
1143 external interrupts when it invokes the interrupt handler. The effect
1144 is slightly different from calling @func{intr_disable} inside the
1145 handler, because that leaves a window of one or more CPU instructions in
1146 which external interrupts are still enabled. This is important for the
1147 page fault handler; refer to the comments in @file{userprog/exception.c}
1150 @var{dpl} determines how the interrupt can be invoked. If @var{dpl} is
1151 0, then the interrupt can be invoked only by kernel threads. Otherwise
1152 @var{dpl} should be 3, which allows user processes to invoke the
1153 interrupt with an explicit INT instruction. The value of @var{dpl}
1154 doesn't affect user processes' ability to invoke the interrupt
1155 indirectly, e.g.@: an invalid memory reference will cause a page fault
1156 regardless of @var{dpl}.
1159 @node External Interrupt Handling
1160 @subsection External Interrupt Handling
1162 External interrupts are caused by events outside the CPU.
1163 They are asynchronous, so they can be invoked at any time that
1164 interrupts have not been disabled. We say that an external interrupt
1165 runs in an ``interrupt context.''
1167 In an external interrupt, the @struct{intr_frame} passed to the
1168 handler is not very meaningful. It describes the state of the thread
1169 or process that was interrupted, but there is no way to predict which
1170 one that is. It is possible, although rarely useful, to examine it, but
1171 modifying it is a recipe for disaster.
1173 Only one external interrupt may be processed at a time. Neither
1174 internal nor external interrupt may nest within an external interrupt
1175 handler. Thus, an external interrupt's handler must run with interrupts
1176 disabled (@pxref{Disabling Interrupts}).
1178 An external interrupt handler must not sleep or yield, which rules out
1179 calling @func{lock_acquire}, @func{thread_yield}, and many other
1180 functions. Sleeping in interrupt context would effectively put the
1181 interrupted thread to sleep, too, until the interrupt handler was again
1182 scheduled and returned. This would be unfair to the unlucky thread, and
1183 it would deadlock if the handler were waiting for the sleeping thread
1184 to, e.g., release a lock.
1186 An external interrupt handler
1187 effectively monopolizes the machine and delays all other activities.
1188 Therefore, external interrupt handlers should complete as quickly as
1189 they can. Anything that require much CPU time should instead run in a
1190 kernel thread, possibly one that the interrupt triggers using a
1191 synchronization primitive.
1193 External interrupts are controlled by a
1194 pair of devices outside the CPU called @dfn{programmable interrupt
1195 controllers}, @dfn{PICs} for short. When @func{intr_init} sets up the
1196 CPU's IDT, it also initializes the PICs for interrupt handling. The
1197 PICs also must be ``acknowledged'' at the end of processing for each
1198 external interrupt. @func{intr_handler} takes care of that by calling
1199 @func{pic_end_of_interrupt}, which properly signals the PICs.
1201 The following functions relate to external
1204 @deftypefun void intr_register_ext (uint8_t @var{vec}, intr_handler_func *@var{handler}, const char *@var{name})
1205 Registers @var{handler} to be called when external interrupt numbered
1206 @var{vec} is triggered. Names the interrupt @var{name} for debugging
1207 purposes. The handler will run with interrupts disabled.
1210 @deftypefun bool intr_context (void)
1211 Returns true if we are running in an interrupt context, otherwise
1212 false. Mainly used in functions that might sleep
1213 or that otherwise should not be called from interrupt context, in this
1216 ASSERT (!intr_context ());
1220 @deftypefun void intr_yield_on_return (void)
1221 When called in an interrupt context, causes @func{thread_yield} to be
1222 called just before the interrupt returns. Used
1223 in the timer interrupt handler when a thread's time slice expires, to
1224 cause a new thread to be scheduled.
1227 @node Memory Allocation
1228 @section Memory Allocation
1230 Pintos contains two memory allocators, one that allocates memory in
1231 units of a page, and one that can allocate blocks of any size.
1238 @node Page Allocator
1239 @subsection Page Allocator
1241 The page allocator declared in @file{threads/palloc.h} allocates
1242 memory in units of a page. It is most often used to allocate memory
1243 one page at a time, but it can also allocate multiple contiguous pages
1246 The page allocator divides the memory it allocates into two pools,
1247 called the kernel and user pools. By default, each pool gets half of
1248 system memory above @w{1 MB}, but the division can be changed with the
1251 option (@pxref{Why PAL_USER?}). An allocation request draws from one
1252 pool or the other. If one pool becomes empty, the other may still
1253 have free pages. The user pool should be used for allocating memory
1254 for user processes and the kernel pool for all other allocations.
1255 This will only become important starting with project 3. Until then,
1256 all allocations should be made from the kernel pool.
1258 Each pool's usage is tracked with a bitmap, one bit per page in
1259 the pool. A request to allocate @var{n} pages scans the bitmap
1260 for @var{n} consecutive bits set to
1261 false, indicating that those pages are free, and then sets those bits
1262 to true to mark them as used. This is a ``first fit'' allocation
1263 strategy (@pxref{Wilson}).
1265 The page allocator is subject to fragmentation. That is, it may not
1266 be possible to allocate @var{n} contiguous pages even though @var{n}
1267 or more pages are free, because the free pages are separated by used
1268 pages. In fact, in pathological cases it may be impossible to
1269 allocate 2 contiguous pages even though half of the pool's pages are free.
1270 Single-page requests can't fail due to fragmentation, so
1271 requests for multiple contiguous pages should be limited as much as
1274 Pages may not be allocated from interrupt context, but they may be
1277 When a page is freed, all of its bytes are cleared to @t{0xcc}, as
1278 a debugging aid (@pxref{Debugging Tips}).
1280 Page allocator types and functions are described below.
1282 @deftypefun {void *} palloc_get_page (enum palloc_flags @var{flags})
1283 @deftypefunx {void *} palloc_get_multiple (enum palloc_flags @var{flags}, size_t @var{page_cnt})
1284 Obtains and returns one page, or @var{page_cnt} contiguous pages,
1285 respectively. Returns a null pointer if the pages cannot be allocated.
1287 The @var{flags} argument may be any combination of the following flags:
1289 @defvr {Page Allocator Flag} @code{PAL_ASSERT}
1290 If the pages cannot be allocated, panic the kernel. This is only
1291 appropriate during kernel initialization. User processes
1292 should never be permitted to panic the kernel.
1295 @defvr {Page Allocator Flag} @code{PAL_ZERO}
1296 Zero all the bytes in the allocated pages before returning them. If not
1297 set, the contents of newly allocated pages are unpredictable.
1300 @defvr {Page Allocator Flag} @code{PAL_USER}
1301 Obtain the pages from the user pool. If not set, pages are allocated
1302 from the kernel pool.
1306 @deftypefun void palloc_free_page (void *@var{page})
1307 @deftypefunx void palloc_free_multiple (void *@var{pages}, size_t @var{page_cnt})
1308 Frees one page, or @var{page_cnt} contiguous pages, respectively,
1309 starting at @var{pages}. All of the pages must have been obtained using
1310 @func{palloc_get_page} or @func{palloc_get_multiple}.
1313 @node Block Allocator
1314 @subsection Block Allocator
1316 The block allocator, declared in @file{threads/malloc.h}, can allocate
1317 blocks of any size. It is layered on top of the page allocator
1318 described in the previous section. Blocks returned by the block
1319 allocator are obtained from the kernel pool.
1321 The block allocator uses two different strategies for allocating memory.
1322 The first strategy applies to blocks that are 1 kB or smaller
1323 (one-fourth of the page size). These allocations are rounded up to the
1324 nearest power of 2, or 16 bytes, whichever is larger. Then they are
1325 grouped into a page used only for allocations of that size.
1327 The second strategy applies to blocks larger than 1 kB.
1328 These allocations (plus a small amount of overhead) are rounded up to
1329 the nearest page in size, and then the block allocator requests that
1330 number of contiguous pages from the page allocator.
1332 In either case, the difference between the allocation requested size
1333 and the actual block size is wasted. A real operating system would
1334 carefully tune its allocator to minimize this waste, but this is
1335 unimportant in an instructional system like Pintos.
1337 As long as a page can be obtained from the page allocator, small
1338 allocations always succeed. Most small allocations do not require a
1339 new page from the page allocator at all, because they are satisfied
1340 using part of a page already allocated. However, large allocations
1341 always require calling into the page allocator, and any allocation
1342 that needs more than one contiguous page can fail due to fragmentation,
1343 as already discussed in the previous section. Thus, you should
1344 minimize the number of large allocations in your code, especially
1345 those over approximately 4 kB each.
1347 When a block is freed, all of its bytes are cleared to @t{0xcc}, as
1348 a debugging aid (@pxref{Debugging Tips}).
1350 The block allocator may not be called from interrupt context.
1352 The block allocator functions are described below. Their interfaces are
1353 the same as the standard C library functions of the same names.
1355 @deftypefun {void *} malloc (size_t @var{size})
1356 Obtains and returns a new block, from the kernel pool, at least
1357 @var{size} bytes long. Returns a null pointer if @var{size} is zero or
1358 if memory is not available.
1361 @deftypefun {void *} calloc (size_t @var{a}, size_t @var{b})
1362 Obtains a returns a new block, from the kernel pool, at least
1363 @code{@var{a} * @var{b}} bytes long. The block's contents will be
1364 cleared to zeros. Returns a null pointer if @var{a} or @var{b} is zero
1365 or if insufficient memory is available.
1368 @deftypefun {void *} realloc (void *@var{block}, size_t @var{new_size})
1369 Attempts to resize @var{block} to @var{new_size} bytes, possibly moving
1370 it in the process. If successful, returns the new block, in which case
1371 the old block must no longer be accessed. On failure, returns a null
1372 pointer, and the old block remains valid.
1374 A call with @var{block} null is equivalent to @func{malloc}. A call
1375 with @var{new_size} zero is equivalent to @func{free}.
1378 @deftypefun void free (void *@var{block})
1379 Frees @var{block}, which must have been previously returned by
1380 @func{malloc}, @func{calloc}, or @func{realloc} (and not yet freed).
1383 @node Virtual Addresses
1384 @section Virtual Addresses
1386 A 32-bit virtual address can be divided into a 20-bit @dfn{page number}
1387 and a 12-bit @dfn{page offset} (or just @dfn{offset}), like this:
1392 +-------------------+-----------+
1393 | Page Number | Offset |
1394 +-------------------+-----------+
1399 Header @file{threads/vaddr.h} defines these functions and macros for
1400 working with virtual addresses:
1404 The bit index (0) and number of bits (12) of the offset part of a
1405 virtual address, respectively.
1409 A bit mask with the bits in the page offset set to 1, the rest set to 0
1414 The page size in bytes (4,096).
1417 @deftypefun unsigned pg_ofs (const void *@var{va})
1418 Extracts and returns the page offset in virtual address @var{va}.
1421 @deftypefun uintptr_t pg_no (const void *@var{va})
1422 Extracts and returns the page number in virtual address @var{va}.
1425 @deftypefun {void *} pg_round_down (const void *@var{va})
1426 Returns the start of the virtual page that @var{va} points within, that
1427 is, @var{va} with the page offset set to 0.
1430 @deftypefun {void *} pg_round_up (const void *@var{va})
1431 Returns @var{va} rounded up to the nearest page boundary.
1434 Virtual memory in Pintos is divided into two regions: user virtual
1435 memory and kernel virtual memory (@pxref{Virtual Memory Layout}). The
1436 boundary between them is @code{PHYS_BASE}:
1439 Base address of kernel virtual memory. It defaults to @t{0xc0000000} (3
1440 GB), but it may be changed to any multiple of @t{0x10000000} from
1441 @t{0x80000000} to @t{0xf0000000}.
1443 User virtual memory ranges from virtual address 0 up to
1444 @code{PHYS_BASE}. Kernel virtual memory occupies the rest of the
1445 virtual address space, from @code{PHYS_BASE} up to 4 GB.
1448 @deftypefun {bool} is_user_vaddr (const void *@var{va})
1449 @deftypefunx {bool} is_kernel_vaddr (const void *@var{va})
1450 Returns true if @var{va} is a user or kernel virtual address,
1451 respectively, false otherwise.
1454 The 80@var{x}86 doesn't provide any way to directly access memory given
1455 a physical address. This ability is often necessary in an operating
1456 system kernel, so Pintos works around it by mapping kernel virtual
1457 memory one-to-one to physical memory. That is, virtual address
1458 @code{PHYS_BASE} accesses physical address 0, virtual address
1459 @code{PHYS_BASE} + @t{0x1234} accesses physical address @t{0x1234}, and
1460 so on up to the size of the machine's physical memory. Thus, adding
1461 @code{PHYS_BASE} to a physical address obtains a kernel virtual address
1462 that accesses that address; conversely, subtracting @code{PHYS_BASE}
1463 from a kernel virtual address obtains the corresponding physical
1464 address. Header @file{threads/vaddr.h} provides a pair of functions to
1465 do these translations:
1467 @deftypefun {void *} ptov (uintptr_t @var{pa})
1468 Returns the kernel virtual address corresponding to physical address
1469 @var{pa}, which should be between 0 and the number of bytes of physical
1473 @deftypefun {uintptr_t} vtop (void *@var{va})
1474 Returns the physical address corresponding to @var{va}, which must be a
1475 kernel virtual address.
1481 The code in @file{pagedir.c} is an abstract interface to the 80@var{x}86
1482 hardware page table, also called a ``page directory'' by Intel processor
1483 documentation. The page table interface uses a @code{uint32_t *} to
1484 represent a page table because this is convenient for accessing their
1487 The sections below describe the page table interface and internals.
1490 * Page Table Creation Destruction Activation::
1491 * Page Tables Inspection and Updates::
1492 * Page Table Accessed and Dirty Bits::
1493 * Page Table Details::
1496 @node Page Table Creation Destruction Activation
1497 @subsection Creation, Destruction, and Activation
1499 These functions create, destroy, and activate page tables. The base
1500 Pintos code already calls these functions where necessary, so it should
1501 not be necessary to call them yourself.
1503 @deftypefun {uint32_t *} pagedir_create (void)
1504 Creates and returns a new page table. The new page table contains
1505 Pintos's normal kernel virtual page mappings, but no user virtual
1508 Returns a null pointer if memory cannot be obtained.
1511 @deftypefun void pagedir_destroy (uint32_t *@var{pd})
1512 Frees all of the resources held by @var{pd}, including the page table
1513 itself and the frames that it maps.
1516 @deftypefun void pagedir_activate (uint32_t *@var{pd})
1517 Activates @var{pd}. The active page table is the one used by the CPU to
1518 translate memory references.
1521 @node Page Tables Inspection and Updates
1522 @subsection Inspection and Updates
1524 These functions examine or update the mappings from pages to frames
1525 encapsulated by a page table. They work on both active and inactive
1526 page tables (that is, those for running and suspended processes),
1527 flushing the TLB as necessary.
1529 @deftypefun bool pagedir_set_page (uint32_t *@var{pd}, void *@var{upage}, void *@var{kpage}, bool @var{writable})
1530 Adds to @var{pd} a mapping from user page @var{upage} to the frame identified
1531 by kernel virtual address @var{kpage}. If @var{writable} is true, the
1532 page is mapped read/write; otherwise, it is mapped read-only.
1534 User page @var{upage} must not already be mapped in @var{pd}.
1536 Kernel page @var{kpage} should be a kernel virtual address obtained from
1537 the user pool with @code{palloc_get_page(PAL_USER)} (@pxref{Why
1540 Returns true if successful, false on failure. Failure will occur if
1541 additional memory required for the page table cannot be obtained.
1544 @deftypefun {void *} pagedir_get_page (uint32_t *@var{pd}, const void *@var{uaddr})
1545 Looks up the frame mapped to @var{uaddr} in @var{pd}. Returns the
1546 kernel virtual address for that frame, if @var{uaddr} is mapped, or a
1547 null pointer if it is not.
1550 @deftypefun void pagedir_clear_page (uint32_t *@var{pd}, void *@var{page})
1551 Marks @var{page} ``not present'' in @var{pd}. Later accesses to
1552 the page will fault.
1554 Other bits in the page table for @var{page} are preserved, permitting
1555 the accessed and dirty bits (see the next section) to be checked.
1557 This function has no effect if @var{page} is not mapped.
1560 @node Page Table Accessed and Dirty Bits
1561 @subsection Accessed and Dirty Bits
1563 80@var{x}86 hardware provides some assistance for implementing page
1564 replacement algorithms, through a pair of bits in the page table entry
1565 (PTE) for each page. On any read or write to a page, the CPU sets the
1566 @dfn{accessed bit} to 1 in the page's PTE, and on any write, the CPU
1567 sets the @dfn{dirty bit} to 1. The CPU never resets these bits to 0,
1568 but the OS may do so.
1570 Proper interpretation of these bits requires understanding of
1571 @dfn{aliases}, that is, two (or more) pages that refer to the same
1572 frame. When an aliased frame is accessed, the accessed and dirty bits
1573 are updated in only one page table entry (the one for the page used for
1574 access). The accessed and dirty bits for the other aliases are not
1577 @xref{Accessed and Dirty Bits}, on applying these bits in implementing
1578 page replacement algorithms.
1580 @deftypefun bool pagedir_is_dirty (uint32_t *@var{pd}, const void *@var{page})
1581 @deftypefunx bool pagedir_is_accessed (uint32_t *@var{pd}, const void *@var{page})
1582 Returns true if page directory @var{pd} contains a page table entry for
1583 @var{page} that is marked dirty (or accessed). Otherwise,
1587 @deftypefun void pagedir_set_dirty (uint32_t *@var{pd}, const void *@var{page}, bool @var{value})
1588 @deftypefunx void pagedir_set_accessed (uint32_t *@var{pd}, const void *@var{page}, bool @var{value})
1589 If page directory @var{pd} has a page table entry for @var{page}, then
1590 its dirty (or accessed) bit is set to @var{value}.
1593 @node Page Table Details
1594 @subsection Page Table Details
1596 The functions provided with Pintos are sufficient to implement the
1597 projects. However, you may still find it worthwhile to understand the
1598 hardware page table format, so we'll go into a little detail in this
1602 * Page Table Structure::
1603 * Page Table Entry Format::
1604 * Page Directory Entry Format::
1607 @node Page Table Structure
1608 @subsubsection Structure
1610 The top-level paging data structure is a page called the ``page
1611 directory'' (PD) arranged as an array of 1,024 32-bit page directory
1612 entries (PDEs), each of which represents 4 MB of virtual memory. Each
1613 PDE may point to the physical address of another page called a
1614 ``page table'' (PT) arranged, similarly, as an array of 1,024
1615 32-bit page table entries (PTEs), each of which translates a single 4
1616 kB virtual page to a physical page.
1618 Translation of a virtual address into a physical address follows
1619 the three-step process illustrated in the diagram
1620 below:@footnote{Actually, virtual to physical translation on the
1621 80@var{x}86 architecture occurs via an intermediate ``linear
1622 address,'' but Pintos (and most modern 80@var{x}86 OSes) set up the CPU
1623 so that linear and virtual addresses are one and the same. Thus, you
1624 can effectively ignore this CPU feature.}
1628 The most-significant 10 bits of the virtual address (bits 22@dots{}31)
1629 index the page directory. If the PDE is marked ``present,'' the
1630 physical address of a page table is read from the PDE thus obtained.
1631 If the PDE is marked ``not present'' then a page fault occurs.
1634 The next 10 bits of the virtual address (bits 12@dots{}21) index
1635 the page table. If the PTE is marked ``present,'' the physical
1636 address of a data page is read from the PTE thus obtained. If the PTE
1637 is marked ``not present'' then a page fault occurs.
1640 The least-significant 12 bits of the virtual address (bits 0@dots{}11)
1641 are added to the data page's physical base address, yielding the final
1648 +----------------------+----------------------+----------------------+
1649 | Page Directory Index | Page Table Index | Page Offset |
1650 +----------------------+----------------------+----------------------+
1652 _______/ _______/ _____/
1654 / Page Directory / Page Table / Data Page
1655 / .____________. / .____________. / .____________.
1656 |1,023|____________| |1,023|____________| | |____________|
1657 |1,022|____________| |1,022|____________| | |____________|
1658 |1,021|____________| |1,021|____________| \__\|____________|
1659 |1,020|____________| |1,020|____________| /|____________|
1661 | | | \____\| |_ | |
1662 | | . | /| . | \ | . |
1663 \____\| . |_ | . | | | . |
1664 /| . | \ | . | | | . |
1665 | . | | | . | | | . |
1667 |____________| | |____________| | |____________|
1668 4|____________| | 4|____________| | |____________|
1669 3|____________| | 3|____________| | |____________|
1670 2|____________| | 2|____________| | |____________|
1671 1|____________| | 1|____________| | |____________|
1672 0|____________| \__\0|____________| \____\|____________|
1677 Pintos provides some macros and functions that are useful for working
1678 with raw page tables:
1682 The starting bit index (12) and number of bits (10), respectively, in a
1687 A bit mask with the bits in the page table index set to 1 and the rest
1688 set to 0 (@t{0x3ff000}).
1692 The number of bytes of virtual address space that a single page table
1693 page covers (4,194,304 bytes, or 4 MB).
1698 The starting bit index (22) and number of bits (10), respectively, in a
1699 page directory index.
1703 A bit mask with the bits in the page directory index set to 1 and other
1704 bits set to 0 (@t{0xffc00000}).
1707 @deftypefun uintptr_t pd_no (const void *@var{va})
1708 @deftypefunx uintptr_t pt_no (const void *@var{va})
1709 Returns the page directory index or page table index, respectively, for
1710 virtual address @var{va}. These functions are defined in
1711 @file{threads/pte.h}.
1714 @deftypefun unsigned pg_ofs (const void *@var{va})
1715 Returns the page offset for virtual address @var{va}. This function is
1716 defined in @file{threads/vaddr.h}.
1719 @node Page Table Entry Format
1720 @subsubsection Page Table Entry Format
1722 You do not need to understand the PTE format to do the Pintos
1723 projects, unless you wish to incorporate the page table into your
1724 supplemental page table (@pxref{Managing the Supplemental Page Table}).
1726 The actual format of a page table entry is summarized below. For
1727 complete information, refer to section 3.7, ``Page Translation Using
1728 32-Bit Physical Addressing,'' in @bibref{IA32-v3a}.
1732 31 12 11 9 6 5 2 1 0
1733 +---------------------------------------+----+----+-+-+---+-+-+-+
1734 | Physical Address | AVL| |D|A| |U|W|P|
1735 +---------------------------------------+----+----+-+-+---+-+-+-+
1739 Some more information on each bit is given below. The names are
1740 @file{threads/pte.h} macros that represent the bits' values:
1743 Bit 0, the ``present'' bit. When this bit is 1, the
1744 other bits are interpreted as described below. When this bit is 0, any
1745 attempt to access the page will page fault. The remaining bits are then
1746 not used by the CPU and may be used by the OS for any purpose.
1750 Bit 1, the ``read/write'' bit. When it is 1, the page
1751 is writable. When it is 0, write attempts will page fault.
1755 Bit 2, the ``user/supervisor'' bit. When it is 1, user
1756 processes may access the page. When it is 0, only the kernel may access
1757 the page (user accesses will page fault).
1759 Pintos clears this bit in PTEs for kernel virtual memory, to prevent
1760 user processes from accessing them.
1764 Bit 5, the ``accessed'' bit. @xref{Page Table Accessed
1769 Bit 6, the ``dirty'' bit. @xref{Page Table Accessed and
1774 Bits 9@dots{}11, available for operating system use.
1775 Pintos, as provided, does not use them and sets them to 0.
1779 Bits 12@dots{}31, the top 20 bits of the physical address of a frame.
1780 The low 12 bits of the frame's address are always 0.
1783 Other bits are either reserved or uninteresting in a Pintos context and
1784 should be set to@tie{}0.
1786 Header @file{threads/pte.h} defines three functions for working with
1789 @deftypefun uint32_t pte_create_kernel (uint32_t *@var{page}, bool @var{writable})
1790 Returns a page table entry that points to @var{page}, which should be a
1791 kernel virtual address. The PTE's present bit will be set. It will be
1792 marked for kernel-only access. If @var{writable} is true, the PTE will
1793 also be marked read/write; otherwise, it will be read-only.
1796 @deftypefun uint32_t pte_create_user (uint32_t *@var{page}, bool @var{writable})
1797 Returns a page table entry that points to @var{page}, which should be
1798 the kernel virtual address of a frame in the user pool (@pxref{Why
1799 PAL_USER?}). The PTE's present bit will be set and it will be marked to
1800 allow user-mode access. If @var{writable} is true, the PTE will also be
1801 marked read/write; otherwise, it will be read-only.
1804 @deftypefun {void *} pte_get_page (uint32_t @var{pte})
1805 Returns the kernel virtual address for the frame that @var{pte} points
1806 to. The @var{pte} may be present or not-present; if it is not-present
1807 then the pointer returned is only meaningful if the address bits in the PTE
1808 actually represent a physical address.
1811 @node Page Directory Entry Format
1812 @subsubsection Page Directory Entry Format
1814 Page directory entries have the same format as PTEs, except that the
1815 physical address points to a page table page instead of a frame. Header
1816 @file{threads/pte.h} defines two functions for working with page
1819 @deftypefun uint32_t pde_create (uint32_t *@var{pt})
1820 Returns a page directory that points to @var{page}, which should be the
1821 kernel virtual address of a page table page. The PDE's present bit will
1822 be set, it will be marked to allow user-mode access, and it will be
1826 @deftypefun {uint32_t *} pde_get_pt (uint32_t @var{pde})
1827 Returns the kernel virtual address for the page table page that
1828 @var{pde}, which must be marked present, points to.
1834 Pintos provides a hash table data structure in @file{lib/kernel/hash.c}.
1835 To use it you will need to include its header file,
1836 @file{lib/kernel/hash.h}, with @code{#include <hash.h>}.
1837 No code provided with Pintos uses the hash table, which means that you
1838 are free to use it as is, modify its implementation for your own
1839 purposes, or ignore it, as you wish.
1841 Most implementations of the virtual memory project use a hash table to
1842 translate pages to frames. You may find other uses for hash tables as
1847 * Basic Hash Functions::
1848 * Hash Search Functions::
1849 * Hash Iteration Functions::
1850 * Hash Table Example::
1851 * Hash Auxiliary Data::
1852 * Hash Synchronization::
1855 @node Hash Data Types
1856 @subsection Data Types
1858 A hash table is represented by @struct{hash}.
1860 @deftp {Type} {struct hash}
1861 Represents an entire hash table. The actual members of @struct{hash}
1862 are ``opaque.'' That is, code that uses a hash table should not access
1863 @struct{hash} members directly, nor should it need to. Instead, use
1864 hash table functions and macros.
1867 The hash table operates on elements of type @struct{hash_elem}.
1869 @deftp {Type} {struct hash_elem}
1870 Embed a @struct{hash_elem} member in the structure you want to include
1871 in a hash table. Like @struct{hash}, @struct{hash_elem} is opaque.
1872 All functions for operating on hash table elements actually take and
1873 return pointers to @struct{hash_elem}, not pointers to your hash table's
1877 You will often need to obtain a @struct{hash_elem} given a real element
1878 of the hash table, and vice versa. Given a real element of the hash
1879 table, you may use the @samp{&} operator to obtain a pointer to its
1880 @struct{hash_elem}. Use the @code{hash_entry()} macro to go the other
1883 @deftypefn {Macro} {@var{type} *} hash_entry (struct hash_elem *@var{elem}, @var{type}, @var{member})
1884 Returns a pointer to the structure that @var{elem}, a pointer to a
1885 @struct{hash_elem}, is embedded within. You must provide @var{type},
1886 the name of the structure that @var{elem} is inside, and @var{member},
1887 the name of the member in @var{type} that @var{elem} points to.
1889 For example, suppose @code{h} is a @code{struct hash_elem *} variable
1890 that points to a @struct{thread} member (of type @struct{hash_elem})
1891 named @code{h_elem}. Then, @code{hash_entry@tie{}(h, struct thread, h_elem)}
1892 yields the address of the @struct{thread} that @code{h} points within.
1895 @xref{Hash Table Example}, for an example.
1897 Each hash table element must contain a key, that is, data that
1898 identifies and distinguishes elements, which must be unique
1899 among elements in the hash table. (Elements may
1900 also contain non-key data that need not be unique.) While an element is
1901 in a hash table, its key data must not be changed. Instead, if need be,
1902 remove the element from the hash table, modify its key, then reinsert
1905 For each hash table, you must write two functions that act on keys: a
1906 hash function and a comparison function. These functions must match the
1907 following prototypes:
1909 @deftp {Type} {unsigned hash_hash_func (const struct hash_elem *@var{element}, void *@var{aux})}
1910 Returns a hash of @var{element}'s data, as a value anywhere in the range
1911 of @code{unsigned int}. The hash of an element should be a
1912 pseudo-random function of the element's key. It must not depend on
1913 non-key data in the element or on any non-constant data other than the
1914 key. Pintos provides the following functions as a suitable basis for
1917 @deftypefun unsigned hash_bytes (const void *@var{buf}, size_t *@var{size})
1918 Returns a hash of the @var{size} bytes starting at @var{buf}. The
1919 implementation is the general-purpose
1920 @uref{http://en.wikipedia.org/wiki/Fowler_Noll_Vo_hash, Fowler-Noll-Vo
1921 hash} for 32-bit words.
1924 @deftypefun unsigned hash_string (const char *@var{s})
1925 Returns a hash of null-terminated string @var{s}.
1928 @deftypefun unsigned hash_int (int @var{i})
1929 Returns a hash of integer @var{i}.
1932 If your key is a single piece of data of an appropriate type, it is
1933 sensible for your hash function to directly return the output of one of
1934 these functions. For multiple pieces of data, you may wish to combine
1935 the output of more than one call to them using, e.g., the @samp{^}
1937 operator. Finally, you may entirely ignore these functions and write
1938 your own hash function from scratch, but remember that your goal is to
1939 build an operating system kernel, not to design a hash function.
1941 @xref{Hash Auxiliary Data}, for an explanation of @var{aux}.
1944 @deftp {Type} {bool hash_less_func (const struct hash_elem *@var{a}, const struct hash_elem *@var{b}, void *@var{aux})}
1945 Compares the keys stored in elements @var{a} and @var{b}. Returns
1946 true if @var{a} is less than @var{b}, false if @var{a} is greater than
1947 or equal to @var{b}.
1949 If two elements compare equal, then they must hash to equal values.
1951 @xref{Hash Auxiliary Data}, for an explanation of @var{aux}.
1954 @xref{Hash Table Example}, for hash and comparison function examples.
1956 A few functions accept a pointer to a third kind of
1957 function as an argument:
1959 @deftp {Type} {void hash_action_func (struct hash_elem *@var{element}, void *@var{aux})}
1960 Performs some kind of action, chosen by the caller, on @var{element}.
1962 @xref{Hash Auxiliary Data}, for an explanation of @var{aux}.
1965 @node Basic Hash Functions
1966 @subsection Basic Functions
1968 These functions create, destroy, and inspect hash tables.
1970 @deftypefun bool hash_init (struct hash *@var{hash}, hash_hash_func *@var{hash_func}, hash_less_func *@var{less_func}, void *@var{aux})
1971 Initializes @var{hash} as a hash table with @var{hash_func} as hash
1972 function, @var{less_func} as comparison function, and @var{aux} as
1974 Returns true if successful, false on failure. @func{hash_init} calls
1975 @func{malloc} and fails if memory cannot be allocated.
1977 @xref{Hash Auxiliary Data}, for an explanation of @var{aux}, which is
1978 most often a null pointer.
1981 @deftypefun void hash_clear (struct hash *@var{hash}, hash_action_func *@var{action})
1982 Removes all the elements from @var{hash}, which must have been
1983 previously initialized with @func{hash_init}.
1985 If @var{action} is non-null, then it is called once for each element in
1986 the hash table, which gives the caller an opportunity to deallocate any
1987 memory or other resources used by the element. For example, if the hash
1988 table elements are dynamically allocated using @func{malloc}, then
1989 @var{action} could @func{free} the element. This is safe because
1990 @func{hash_clear} will not access the memory in a given hash element
1991 after calling @var{action} on it. However, @var{action} must not call
1992 any function that may modify the hash table, such as @func{hash_insert}
1993 or @func{hash_delete}.
1996 @deftypefun void hash_destroy (struct hash *@var{hash}, hash_action_func *@var{action})
1997 If @var{action} is non-null, calls it for each element in the hash, with
1998 the same semantics as a call to @func{hash_clear}. Then, frees the
1999 memory held by @var{hash}. Afterward, @var{hash} must not be passed to
2000 any hash table function, absent an intervening call to @func{hash_init}.
2003 @deftypefun size_t hash_size (struct hash *@var{hash})
2004 Returns the number of elements currently stored in @var{hash}.
2007 @deftypefun bool hash_empty (struct hash *@var{hash})
2008 Returns true if @var{hash} currently contains no elements,
2009 false if @var{hash} contains at least one element.
2012 @node Hash Search Functions
2013 @subsection Search Functions
2015 Each of these functions searches a hash table for an element that
2016 compares equal to one provided. Based on the success of the search,
2017 they perform some action, such as inserting a new element into the hash
2018 table, or simply return the result of the search.
2020 @deftypefun {struct hash_elem *} hash_insert (struct hash *@var{hash}, struct hash_elem *@var{element})
2021 Searches @var{hash} for an element equal to @var{element}. If none is
2022 found, inserts @var{element} into @var{hash} and returns a null pointer.
2023 If the table already contains an element equal to @var{element}, it is
2024 returned without modifying @var{hash}.
2027 @deftypefun {struct hash_elem *} hash_replace (struct hash *@var{hash}, struct hash_elem *@var{element})
2028 Inserts @var{element} into @var{hash}. Any element equal to
2029 @var{element} already in @var{hash} is removed. Returns the element
2030 removed, or a null pointer if @var{hash} did not contain an element
2031 equal to @var{element}.
2033 The caller is responsible for deallocating any resources associated with
2034 the returned element, as appropriate. For example, if the hash table
2035 elements are dynamically allocated using @func{malloc}, then the caller
2036 must @func{free} the element after it is no longer needed.
2039 The element passed to the following functions is only used for hashing
2040 and comparison purposes. It is never actually inserted into the hash
2041 table. Thus, only key data in the element needs to be initialized, and
2042 other data in the element will not be used. It often makes sense to
2043 declare an instance of the element type as a local variable, initialize
2044 the key data, and then pass the address of its @struct{hash_elem} to
2045 @func{hash_find} or @func{hash_delete}. @xref{Hash Table Example}, for
2046 an example. (Large structures should not be
2047 allocated as local variables. @xref{struct thread}, for more
2050 @deftypefun {struct hash_elem *} hash_find (struct hash *@var{hash}, struct hash_elem *@var{element})
2051 Searches @var{hash} for an element equal to @var{element}. Returns the
2052 element found, if any, or a null pointer otherwise.
2055 @deftypefun {struct hash_elem *} hash_delete (struct hash *@var{hash}, struct hash_elem *@var{element})
2056 Searches @var{hash} for an element equal to @var{element}. If one is
2057 found, it is removed from @var{hash} and returned. Otherwise, a null
2058 pointer is returned and @var{hash} is unchanged.
2060 The caller is responsible for deallocating any resources associated with
2061 the returned element, as appropriate. For example, if the hash table
2062 elements are dynamically allocated using @func{malloc}, then the caller
2063 must @func{free} the element after it is no longer needed.
2066 @node Hash Iteration Functions
2067 @subsection Iteration Functions
2069 These functions allow iterating through the elements in a hash table.
2070 Two interfaces are supplied. The first requires writing and supplying a
2071 @var{hash_action_func} to act on each element (@pxref{Hash Data Types}).
2073 @deftypefun void hash_apply (struct hash *@var{hash}, hash_action_func *@var{action})
2074 Calls @var{action} once for each element in @var{hash}, in arbitrary
2075 order. @var{action} must not call any function that may modify the hash
2076 table, such as @func{hash_insert} or @func{hash_delete}. @var{action}
2077 must not modify key data in elements, although it may modify any other
2081 The second interface is based on an ``iterator'' data type.
2082 Idiomatically, iterators are used as follows:
2085 struct hash_iterator i;
2088 while (hash_next (&i))
2090 struct foo *f = hash_entry (hash_cur (&i), struct foo, elem);
2091 @r{@dots{}do something with @i{f}@dots{}}
2095 @deftp {Type} {struct hash_iterator}
2096 Represents a position within a hash table. Calling any function that
2097 may modify a hash table, such as @func{hash_insert} or
2098 @func{hash_delete}, invalidates all iterators within that hash table.
2100 Like @struct{hash} and @struct{hash_elem}, @struct{hash_elem} is opaque.
2103 @deftypefun void hash_first (struct hash_iterator *@var{iterator}, struct hash *@var{hash})
2104 Initializes @var{iterator} to just before the first element in
2108 @deftypefun {struct hash_elem *} hash_next (struct hash_iterator *@var{iterator})
2109 Advances @var{iterator} to the next element in @var{hash}, and returns
2110 that element. Returns a null pointer if no elements remain. After
2111 @func{hash_next} returns null for @var{iterator}, calling it again
2112 yields undefined behavior.
2115 @deftypefun {struct hash_elem *} hash_cur (struct hash_iterator *@var{iterator})
2116 Returns the value most recently returned by @func{hash_next} for
2117 @var{iterator}. Yields undefined behavior after @func{hash_first} has
2118 been called on @var{iterator} but before @func{hash_next} has been
2119 called for the first time.
2122 @node Hash Table Example
2123 @subsection Hash Table Example
2125 Suppose you have a structure, called @struct{page}, that you
2126 want to put into a hash table. First, define @struct{page} to include a
2127 @struct{hash_elem} member:
2132 struct hash_elem hash_elem; /* @r{Hash table element.} */
2133 void *addr; /* @r{Virtual address.} */
2134 /* @r{@dots{}other members@dots{}} */
2138 We write a hash function and a comparison function using @var{addr} as
2139 the key. A pointer can be hashed based on its bytes, and the @samp{<}
2140 operator works fine for comparing pointers:
2143 /* @r{Returns a hash value for page @var{p}.} */
2145 page_hash (const struct hash_elem *p_, void *aux UNUSED)
2147 const struct page *p = hash_entry (p_, struct page, hash_elem);
2148 return hash_bytes (&p->addr, sizeof p->addr);
2151 /* @r{Returns true if page @var{a} precedes page @var{b}.} */
2153 page_less (const struct hash_elem *a_, const struct hash_elem *b_,
2156 const struct page *a = hash_entry (a_, struct page, hash_elem);
2157 const struct page *b = hash_entry (b_, struct page, hash_elem);
2159 return a->addr < b->addr;
2164 (The use of @code{UNUSED} in these functions' prototypes suppresses a
2165 warning that @var{aux} is unused. @xref{Function and Parameter
2166 Attributes}, for information about @code{UNUSED}. @xref{Hash Auxiliary
2167 Data}, for an explanation of @var{aux}.)
2169 Then, we can create a hash table like this:
2174 hash_init (&pages, page_hash, page_less, NULL);
2177 Now we can manipulate the hash table we've created. If @code{@var{p}}
2178 is a pointer to a @struct{page}, we can insert it into the hash table
2182 hash_insert (&pages, &p->hash_elem);
2185 @noindent If there's a chance that @var{pages} might already contain a
2186 page with the same @var{addr}, then we should check @func{hash_insert}'s
2189 To search for an element in the hash table, use @func{hash_find}. This
2190 takes a little setup, because @func{hash_find} takes an element to
2191 compare against. Here's a function that will find and return a page
2192 based on a virtual address, assuming that @var{pages} is defined at file
2196 /* @r{Returns the page containing the given virtual @var{address},}
2197 @r{or a null pointer if no such page exists.} */
2199 page_lookup (const void *address)
2202 struct hash_elem *e;
2205 e = hash_find (&pages, &p.hash_elem);
2206 return e != NULL ? hash_entry (e, struct page, hash_elem) : NULL;
2211 @struct{page} is allocated as a local variable here on the assumption
2212 that it is fairly small. Large structures should not be allocated as
2213 local variables. @xref{struct thread}, for more information.
2215 A similar function could delete a page by address using
2218 @node Hash Auxiliary Data
2219 @subsection Auxiliary Data
2221 In simple cases like the example above, there's no need for the
2222 @var{aux} parameters. In these cases, just pass a null pointer to
2223 @func{hash_init} for @var{aux} and ignore the values passed to the hash
2224 function and comparison functions. (You'll get a compiler warning if
2225 you don't use the @var{aux} parameter, but you can turn that off with
2226 the @code{UNUSED} macro, as shown in the example, or you can just ignore
2229 @var{aux} is useful when you have some property of the data in the
2230 hash table is both constant and needed for hashing or comparison,
2231 but not stored in the data items themselves. For example, if
2232 the items in a hash table are fixed-length strings, but the items
2233 themselves don't indicate what that fixed length is, you could pass
2234 the length as an @var{aux} parameter.
2236 @node Hash Synchronization
2237 @subsection Synchronization
2239 The hash table does not do any internal synchronization. It is the
2240 caller's responsibility to synchronize calls to hash table functions.
2241 In general, any number of functions that examine but do not modify the
2242 hash table, such as @func{hash_find} or @func{hash_next}, may execute
2243 simultaneously. However, these function cannot safely execute at the
2244 same time as any function that may modify a given hash table, such as
2245 @func{hash_insert} or @func{hash_delete}, nor may more than one function
2246 that can modify a given hash table execute safely at once.
2248 It is also the caller's responsibility to synchronize access to data in
2249 hash table elements. How to synchronize access to this data depends on
2250 how it is designed and organized, as with any other data structure.