pintos-os.org Git - pintos-anon/blob - sigcse2009/assignments.tex

   1 \section{Pintos Projects}
   2 \label{sec:assignments}
   3
   4 The Pintos instructional operating system is split into four projects.
   5 \pintostestcounttable{}
   6
   7 %
   8 % Not sure if we need that.
   9 %
  10 % \subsection{Project 0}
  11 % If Pintos is used in a semester-long course, project 0 serves as a ``warm-up'' project.
  12 % In this project, students will gain familiarity with the Pintos source tree and some
  13 % supporting classes, in particular its implementation of doubly-linked lists.
  14 % In OS, doubly-linked lists are frequently used because they allow $O(1)$ insertion and
  15 % removal operations.  Moreover, they are often used in a style in which the list cell
  16 % containing the next and prev pointers is embedded in some larger structure, such as
  17 % a thread control block, rather than having separately allocated list cells.
  18 % In project 0, students use Pintos's list implementation to implement a simple, first-fit
  19 % memory allocator.
  20
  21 \subsection{Project 1 -- Threads}
  22 % intro
  23 Project 1 revolves around threads.  The baseline Pintos code boots into a kernel that
  24 supports multiple in-kernel threads.  It provides code for initialization, thread creation and
  25 destruction, context switches, thread blocking and unblocking as well as a simple but
  26 preemptive round-robin scheduler.
  27 Students study the existing, barebones threading system (about 600 lines of C code) to
  28 understand how threads are created and destroyed, and to understand the transitioning of
  29 threads between the READY, RUNNING, and BLOCKED states.  They also study how a thread's
  30 internal memory is managed, which is used to store its runtime stack and thread control block.
  31 Students can examine the context switch code, but the projects do not involve any modifications
  32 to it.
  33
  34 After reading the baseline code, the project asks students to implement several features
  35 that exercise thread state transitions.  The first part of this project includes a simple
  36 alarm clock, which requires maintaining a timer queue of sleeping threads and changing
  37 the timer interrupt handler to unblock those threads whose wakeup time has arrived.
  38 Students learn how to protect data structures that are shared between a thread and an
  39 interrupt handler.  The second part of the project constitutes a strict priority-based
  40 uniprocessor scheduler; students learn about the different ways in which
  41 such a scheduler must react to thread state changes.
  42 % cut for length
  43 %Project 1 also introduces synchronization primitives such as semaphores, locks,
  44 %and condition variables and explores the interaction between such primitives,
  45 %thread states, and the scheduler.
  46
  47 Based on the priority scheduler, students implement priority inheritance,
  48 which deepens their understanding of the interaction of threads and locks.
  49 We use the example of the near-failure of the Mars PathFinder mission to motivate
  50 the need for priority inheritance.
  51 Separately, students build a multi-level feedback queue scheduler on top of the strict
  52 priority scheduler.  This scheduler adjusts threads' priorities based on a sampling
  53 of how much CPU time a thread has received recently.
  54
  55 \paragraph{Testing and Grading}
  56 Project 1 is accompanied by 27 tests as shown in Table~\ref{table:tests}, which are
  57 run by a grading script using the Bochs simulator.
  58 Most of the tests are designed to produce deterministic output;
  59 the grading script will point out differences between expected and actual output.
  60 Usually, a test failure leads students to study the documented source code of the test
  61 and understand how the expected output derives from it.
  62
  63 The MLFQS scheduler tests require a different approach.  Since those tests rely on estimating CPU
  64 usage, they depend on how much CPU time a specific implementation uses, which in turn depends on how
  65 efficient it is.  We compute the expected CPU consumption values by simulation and provide an
  66 envelope within which the output is accepted.  The envelope is large enough to allow for minor
  67 inefficiencies, but major inefficiencies will usually lead to test failures.  Such failures
  68 convey the importance of using efficient algorithms and data structures within an OS kernel.
  69 % cfl: because wasting CPU cycles in the kernel reduces the amount available to applications.
  70
  71 % intro
  72 \paragraph{Learning Objectives}
  73 Project 1 has three learning objectives.  First, students will understand how
  74 the illusion that ``computers can do multiple things at once'' is created by a sequence
  75 of thread state transitions and context switches.  Second, they will understand how
  76 sophisticated scheduling policies can be built on top of a simple priority-based scheduler.
  77 Third, having seen the mechanisms a preemptive scheduler uses to create apparent
  78 concurrency, students gain a better intuition of the non-determinism inherent
  79 in concurrent systems.
  80
  81 %
  82 %
  83 %
  84 \subsection{Project 2 -- User Programs}
  85 The second project illustrates how an OS implements protection and isolation between user processes,
  86 how user processes access kernel services, and how user processes lay out the virtual
  87 address space in which their program and data is contained.
  88 Students first add support to Pintos to load and execute user programs.
  89 We kept the provided code purposefully minimal to
  90 only a library that reads the text and data segments of ELF binaries.   These binaries
  91 are loaded into a new address space; the baseline code includes functionality to allocate
  92 physical memory and set up a page directory to establish the process's virtual address
  93 mappings.
  94
  95 Students implement support for a small set of system calls that allow processes to perform
  96 I/O, start new processes, wait for their termination, and exit.  Both the Pintos user
  97 process model and system call API are modeled after traditional Unix, with the exception
  98 that Pintos does not separate process creation (i.e., fork()) from program loading
  99 (i.e., exec) - instead, Pintos's exec() system call combines these two pieces of functionality
 100 into one.  The implementation of these calls requires the students to keep track of
 101 per-process resources such as file descriptors, which deepens their understanding of how
 102 an operating system provides the abstraction of a process.
 103
 104 Like most OS, Pintos exploits dual-mode operation in which user processes run
 105 in a nonprivileged mode.
 106 % cfl: that prevents access to kernel resources and to resources allocated to other processes.
 107 Processes attempting to bypass these restrictions are terminated.
 108 Students implement the system call handling code, a key aspect of which includes the
 109 careful examination of arguments passed by user processes.
 110
 111 Project 2 also illustrates concurrent programming techniques, notably fork/join
 112 parallelism, which students implement using rendezvous-style synchronization
 113 % based on semaphores
 114 when providing support for the exec()/wait()/exit() system calls.
 115
 116 \paragraph{Testing and Grading}
 117 All tests for Project 2 are user programs written in C.
 118 They are divided into functionality and robustness tests.  Functionality tests check that
 119 the operating system provides the specified set of services when it is used as
 120 expected.  Robustness tests check that the OS rejects all attempts at passing
 121 invalid input to system calls.  To pass those tests, the student's kernel must be
 122 ``bullet-proof.'' We include a stress test in which we artificially induce low memory
 123 conditions by creating a large number of processes and pseudo-randomly introducing
 124 failures in some of them.  We expect the kernel to fully recover from such situations.
 125
 126 \paragraph{Learning Objectives}
 127 Students learn how the thread abstraction introduced in Project 1 is
 128 extended into the process abstraction, which combines a thread, a virtual address space,
 129 and its associated resources.
 130 Project 2 enables students to understand how operating systems employ dual-mode
 131 operation to implement isolation between processes and to protect system resources
 132 even in the presence of failing or misbehaving processes.
 133 Students understand how processes transition into the kernel to access its services,
 134 and how kernels implement such services in a robust way.
 135 The principles learned in this exercise carry over to all scenarios
 136 in which applications must be robust in the face of input coming from untrusted
 137 sources and uncertain resource availability, as is the case in many server systems.
 138
 139 \subsection{Project 3 -- Virtual Memory}
 140 Project 3 asks students to implement several virtual memory techniques, including
 141 on-demand paging of programs, stack growth, page replacement, and memory-mapped files.
 142 This functionality is primarily implemented in a page fault handler routine.
 143 % and during the program loading process.
 144 We provide supporting code to create and maintain page directories, which hide
 145 the x86-specifics of how to program the memory management unit (MMU) and how
 146 to ensure consistency with the CPU's translation look-aside buffer (TLB).
 147 As a result, students can treat the MMU as an abstract device in which to
 148 install, update, or remove virtual-to-physical address mappings.
 149 Consequently, they are free to choose any design for the data structures needed to
 150 keep track of the state of each page or region in a process's virtual address
 151 space.
 152
 153 In early offerings, this significant creative freedom came at the cost that
 154 some students were lost as to how to accomplish set goals.  We added an intermediate
 155 design review stage to this project using a structured questionnaire in which students
 156 outline their planned design.  We also provided a suggested order of implementation.
 157
 158 Like Project 2, Project 3 requires the use of concurrent programming techniques.
 159 Since the Pintos kernel is fully preemptive, students must consider which data structures
 160 require locking, and they must design a locking strategy that both avoids deadlock
 161 and unnecessary serialization.
 162
 163 \paragraph{Testing and Grading}
 164 Project 3 relies on Project 2, therefore, we include all tests provided with Project 2
 165 as regression tests to ensure that system call functionality does not break in the
 166 presence of virtual memory.  Furthermore, we provide tests for the
 167 added functionality that lends itself to such testing, namely, memory-mapped files
 168 and stack growth.  Some of the project tasks, such as on-demand paging, are
 169 performance-enhancing techniques that do not directly add functionality that is
 170 apparent to user programs; these tasks are graded by inspection.  We test the students
 171 page replacement code by varying the amount of physical memory available to
 172 the kernel when run under emulation, relative to the amount of memory that is
 173 accessed by our test programs.
 174 Grossly inefficient page replacement schemes result test failures due to time-outs.
 175
 176 \paragraph{Learning Objectives}
 177 Students learn how an OS creates the environment in which a user
 178 program executes as it relates to the program's code and variables.
 179 It provides a thorough understanding of how OS use resumption after page faults
 180 to virtualize a process's use of physical memory.
 181 Students gain hands-on experience with page replacement algorithms
 182 and have the opportunity to directly observe their performance impact.
 183
 184 %
 185 %
 186 %
 187 \subsection{Project 4 -- Filesystems}
 188 Project 4 asks students to design and implement a hierarchical, multi-threaded
 189 filesystem and buffer cache.  In Projects 2 and 3, students use a basic filesystem
 190 we provide to access the disk, which supports only fixed-size files, no subdirectories,
 191 and which lacks a buffer cache.
 192 Although we suggest a traditional, Unix-like filesystem design, which stores file
 193 metadata in inodes and treats directories as files, students have
 194 complete freedom in designing the layout of their filesystem's metadata.
 195 % as long as their design does not suffer from external fragmentation.
 196 Since our host tools will not know how to interpret the student's filesystems,
 197 we attach an intermediate ``scratch'' disk or partition to the
 198 physical or virtual computer on which Pintos runs, and use the student's kernel
 199 to copy files into and out of their filesystems.
 200 Similarly, we encourage students to experiment with different replacement
 201 strategies for their buffer cache, although we require that their algorithm
 202 behaves at least as well as a least-recently-used (LRU) strategy.
 203
 204 As with all projects, this assignment includes additional concurrent programming
 205 tasks: in this project, we suggest that students implement
 206 a multiple-reader, single-writer access scheme for individual buffer cache blocks
 207 to allow shared access when reading file data and metadata.
 208
 209 \paragraph{Testing and Grading}
 210 Project 4 adds a new set of test cases for the extended functionality, including
 211 stress tests that check correct behavior for deeply nested directories and
 212 for nearly full disks.
 213 For each functionality test, we provide a sibling persistence test that verifies
 214 that the changes done to the filesystem survive a shutdown and restart.
 215 Since the project does not require the virtual memory functionality, it can be built
 216 on either Project 2 or 3, depending on the instructor's judgment.
 217
 218 \paragraph{Learning Objectives}
 219 This project provides a deep understanding of how OS's manage secondary storage
 220 while avoiding fragmentation and providing efficiency for commonly occurring
 221 disk access patterns.
 222 Students learn how the use of a buffer cache helps absorb disk requests and
 223 improves performance.
 224 They also gain insight into filesystem semantics in the presence of
 225 simultaneously occurring requests.
 226