1 @node Project 4--File Systems, Multilevel Feedback Scheduling, Project 3--Virtual Memory, Top
2 @chapter Project 4: File Systems
4 In the previous two assignments, you made extensive use of a
5 filesystem without actually worrying about how it was implemented
6 underneath. For this last assignment, you will fill in the
7 implementation of the filesystem. You will be working primarily in
8 the @file{filesys} directory.
10 You should build on the code you wrote for the previous assignments.
11 However, if you wish, you may turn off your VM features, as they are
12 not vital to making the filesystem work. (You will need to edit
13 @file{filesys/Makefile.vars} to fully disable VM.) All of the
14 functionality needed for project 2 (argument passing, syscalls and
15 multiprogramming) must work in your filesys submission.
17 On the other hand, one of the particular charms of working on
18 operating systems is being able to use what you build, and building
19 full-featured systems. Therefore, you should strive to make all the
20 parts work together so that you can run VM and your filesystem at the
21 same time. Plus, keeping VM is a great way to stress-test your
22 filesystem implementation.
25 The first step is to understand the default filesystem provided by the
26 base code. The first things you should look at are
27 @file{threads/init.c} and @file{filesys/fsutil.c}: there are special
28 command line arguments to Pintos which are used to format and
29 manipulate the emulated disk. Specifically, @option{-f} formats the
30 file system disk for use with Pintos, and @option{-cp @var{src}
31 @var{dst}} copies @var{src} on the Unix filesystem to @var{dst} on the
32 file system disk. With this information, you should be able to
33 compile the base code and try out the command: @samp{pintos -f -cp
34 ./small small} to copy the file @file{small} from your working
35 directory into the Pintos file system.
38 One thing you should realize immediately is that, until you use the
39 above operation to copy the shell (or whatever your initial program
40 is) to the emulated disk, Pintos will be unable to do very much useful
41 work (it'll try to open the shell and fail, thereby quitting out). You
42 will also find that you won't be able to do interesting things until
43 you copy a variety of programs to the disk. A useful technique, once
44 your inode formats are finalized, is to create a clean reference disk
45 and copy that over whenever you trash your disk beyond a useful state
46 (which will probably happen quite often while debugging).
49 * File System New Code::
50 * Problem 4-1 Large Files::
51 * Problem 4-2 File Growth::
52 * Problem 4-3 Subdirectories::
53 * Problem 4-4 Buffer Cache::
54 * File System Design Document Requirements::
58 @node File System New Code
61 Here are some files that are probably new to you. These are in the
62 @file{filesys} directory except where indicated:
66 Simple utilities for the filesystem that are accessible from the
71 Top-level interface to the file system.
75 Translates file names to disk file headers; the
76 directory data structure is stored as a file.
80 Manages the data structure representing the layout of a
85 Translates file reads and writes to disk sector reads
90 Provides access to the physical disk, abstracting away the rather
93 @item lib/kernel/bitmap.h
94 @itemx lib/kernel/bitmap.c
95 A bitmap data structure along with routines for reading and writing
96 the bitmap to disk files.
99 Our file system has a Unix-like interface, so you may also wish to
100 read the Unix man pages for @code{creat}, @code{open}, @code{close},
101 @code{read}, @code{write}, @code{lseek}, and @code{unlink}. Our file
102 system has calls that are similar, but not identical, to these. The
103 file system translates these calls into physical disk operations.
105 All the basic functionality is there in the code above, so that the
106 filesystem is usable right off the bat. In fact, you've been using it
107 in the previous two projects. However, it has severe limitations
108 which you will remove.
110 While most of your work will be in @file{filesys}, you should be
111 prepared for interactions with all previous parts (as usual).
113 @node Problem 4-1 Large Files
114 @section Problem 4-1: Large Files
116 Modify the file system to allow the maximum size of a file to be as
117 large as the disk. You can assume that the disk will not be larger
118 than 8 MB. In the basic file system, each file is limited to a file
119 size of just under 64 kB. Each file has a header (@code{struct
120 filehdr}) that is a table of direct pointers to the disk blocks for
121 that file. Since the header is stored in one disk sector, the maximum
122 size of a file is limited by the number of pointers that will fit in
123 one disk sector. Increasing the limit to 8 MB will require you to
124 implement doubly-indirect blocks.
126 @node Problem 4-2 File Growth
127 @section Problem 4-2: File Growth
129 Implement extensible files. In the basic file system, the file size
130 is specified when the file is created. One advantage of this is that
131 the FileHeader data structure, once created, never changes. In UNIX
132 and most other file systems, a file is initially created with size 0
133 and is then expanded every time a write is made off the end of the
134 file. Modify the file system to allow this. As one test case, allow
135 the root directory file to expand beyond its current limit of ten
136 files. Make sure that concurrent accesses to the file header remain
137 properly synchronized.
139 @node Problem 4-3 Subdirectories
140 @section Problem 4-3: Subdirectories
142 Implement a hierarchical name space. In the basic file system, all
143 files live in a single directory. Modify this to allow directories to
144 point to either files or other directories. To do this, you will need
145 to implement routines that parse path names into a sequence of
146 directories, as well as routines that change the current working
147 directory and that list the contents of the current directory. For
148 performance, allow concurrent updates to different directories, but
149 use mutual exclusion to ensure that updates to the same directory are
150 performed atomically (for example, to ensure that a file is deleted
153 Make sure that directories can expand beyond their original size just
154 as any other file can.
156 To take advantage of hierarchical name spaces in user programs,
157 provide the following syscalls:
161 @itemx bool chdir (const char *@var{dir})
162 Attempts to change the current working directory of the process to
163 @var{dir}, which may be either relative or absolute. Returns true if
164 successful, false on failure.
167 @itemx bool mkdir (const char *dir)
168 Attempts to create the directory named @var{dir}, which may be either
169 relative or absolute. Returns true if successful, false on failure.
172 @itemx void lsdir (void)
173 Prints a list of files in the current directory to @code{stdout}, one
177 Also write the @command{ls} and @command{mkdir} user programs. This
178 is straightforward once the above syscalls are implemented. If Unix,
179 these are programs rather than built-in shell commands, but
180 @command{cd} is a shell command. (Why?)
182 @node Problem 4-4 Buffer Cache
183 @section Problem 4-4: Buffer Cache
185 Modify the file system to keep a cache of file blocks. When a request
186 is made to read or write a block, check to see if it is stored in the
187 cache, and if so, fetch it immediately from the cache without going to
188 disk. (Otherwise, fetch the block from disk into cache, evicting an
189 older entry if necessary.) You are limited to a cache no greater than
190 64 sectors in size. Be sure to choose an intelligent cache
191 replacement algorithm. Experiment to see what combination of use,
192 dirty, and other information results in the best performance, as
193 measured by the number of disk accesses. (For example, metadata is
194 generally more valuable to cache than data.) Document your
195 replacement algoritm in your design document.
197 In addition to the basic file caching scheme, your implementation
198 should also include the following enhancements:
202 Instead of always immediately writing modified data to disk, dirty
203 blocks can be kept in the cache and written out sometime later. Your
204 buffer cache should write behind whenever a block is evicted from the
208 Your buffer cache should automatically fetch the next block of a file
209 into the cache when one block of a file is read, in case that block is
213 For each of these three optimizations, design a file I/O workload that
214 is likely to benefit from the enhancement, explain why you expect it
215 to perform better than on the original file system implementation, and
216 demonstrate the performance improvement.
218 Note that write-behind makes your filesystem more fragile in the face
219 of crashes. Therefore, you should implement some manner to
220 periodically write all cached blocks to disk. If you have
221 @code{timer_sleep()} from the first project working, this is an
222 excellent application for it.
224 Likewise, read-ahead is only really useful when done asynchronously.
225 That is, if a process wants disk block 1 from the file, it needs to
226 block until disk block 1 is read in, but once that read is complete,
227 control should return to the process immediately while the read
228 request for disk block 2 is handled asynchronously. In other words,
229 the process will block to wait for disk block 1, but should not block
230 waiting for disk block 2.
233 When you're implementing this, please make sure you have a scheme for
234 making any read-ahead and write-behind threads halt when Pintos is
235 ``done'' (when the user program has completed, etc), so that Pintos
236 will halt normally and print its various statistics.
238 @node File System Design Document Requirements
239 @section Design Document Requirements
241 As always, submit a design document file summarizing your design. Be
242 sure to cover the following points :
246 How did you structure your inodes? How many blocks did you access
247 directly, via single-indirection, and/or via double-indirection? Why?
250 How did you structure your buffer cache? How did you perform a lookup
251 in the cache? How did you choose elements to evict from the cache?
254 How and when did you flush the cache?
257 @node File System FAQs
262 @b{What extra credit opportunities are available for this assignment?}
266 We'll give out extra credit to groups that implement Unix-style
267 support for @file{.} and @file{..} in relative paths in their projects.
270 We'll give some extra credit if you submit with VM enabled. If you do
271 this, make sure you show us that you can run multiple programs
272 concurrently. A particularly good demonstration is running
273 @file{capitalize} (with a reduced words file that fits comfortably on
274 your disk, of course). So submit a file system disk that contains a
275 VM-heavy program like @file{capitalize}, so we can try it out. And also
276 include the results in your test case file.
278 We feel that you will be much more satisfied with your cs140 ``final
279 product'' if you can get your VM working with your file system. It's
280 also a great stress test for your FS, but obviously you have to be
281 pretty confident with your VM if you're going to submit this extra
282 credit, since you'll still lose points for failing FS-related tests,
283 even if the problem is in your VM code.
286 A point of extra credit can be assigned if a user can recursively
287 remove directories from the shell command prompt. Note that the
288 typical semantic is to just fail if a directory is not empty.
291 Make sure that you discuss any extra credit in your @file{README}
292 file. We're likely to miss it if it gets buried in your design
296 @b{What exec modes for running Pintos do I absolutely need to
300 The most standard mode is to run your Pintos with all the command
301 flags on one command line, like this: @samp{pintos -f -cp shell
302 shell -ex "shell"}. However, you also need to support these flags
303 individually---especially since that's how the grader tests your
304 program. Thus, you should be able to run the above instead as:
309 pintos -cp shell shell
313 Note that this also provides a way for you to validate that your disks
314 are really persistent. This is a common problem with a write behind
315 cache: if you don't shut down properly it will leave the disk in an
319 @b{Will you test our file system with a different @code{DISK_SECTOR_SIZE}?}
321 No, @code{DISK_SECTOR_SIZE} will not change.
324 @b{Will the @code{struct filehdr} take up space on the disk too?}
326 Yes. Anything stored in @code{struct filehdr} takes up space on disk,
327 so you must include this in your calculation of how many entires will
328 fit in a single disk sector.
335 @b{What is the largest file size that we are supposed to support?}
337 The disk we create will be 8 MB or smaller. However, individual files
338 will have to be smaller than the disk to accommodate the metadata.
339 You'll need to consider this when deciding your @code{struct filehdr}
340 (inode) organization.
348 @b{What's the answer to the question in the spec about why
349 @command{ls} and @command{mkdir} are user programs, while @command{cd}
352 Each process maintains its own current working directory, so it's much
353 easier to change the current working directory of the shell process if
354 @command{cd} is implemented as a shell command rather than as another
355 user process. In fact, Unix-like systems don't provide any way for
356 one process to change another process's current working directory.
359 @b{When the spec states that directories should be able to grow beyond
360 ten files, does this mean that there can still be a set maximum number
361 of files per directory that is greater than ten, or should directories
362 now support unlimited growth (bounded by the maximum supported file
365 We're looking for directories that can support arbitrarily large
366 numbers of files. Now that directories can grow, we want you to
367 remove the concept of a preset maximum file limit.
370 @b{When should the @code{lsdir} system call return?}
372 The @code{lsdir} system call should not return until after the
373 directory has been printed. Here's a code fragment, and the desired
377 printf ("Start of directory\n");
379 printf ("End of directory\n");
382 This code should create the following output:
386 ... directory contents ...
391 @b{Do we have to implement both absolute and relative pathnames?}
393 Yes. Implementing @file{.} and @file{..} is extra credit, though.
396 @b{Should @code{remove()} also be able to remove directories?}
398 Yes. The @code{remove} system call should handle removal of both
399 regular files and directories. You may assume that directories can
400 only be deleted if they are empty, as in Unix.
408 @b{We're limited to a 64-block cache, but can we also keep a copy of
409 each @code{struct filehdr} for an open file inside @code{struct file},
410 the way the stub code does?}
412 No, you shouldn't keep any disk sectors stored anywhere outside the
413 cache. That means you'll have to change the way the file
414 implementation accesses its corresponding inode right now, since it
415 currently just creates a new @code{struct filehdr} in its constructor
416 and reads the corresponding sector in from disk when it's created.
418 There are two reasons for not storing inodes in @code{struct file}.
419 First, keeping extra copies of inodes would be cheating the 64-block
420 limitation that we place on your cache. Second, if two processes have
421 the same file open, you will create a huge synchronization headache
422 for yourself if each @code{struct file} has its own copy of the inode.
424 Note that you can store pointers to inodes in @code{struct file} if
425 you want, and you can store some other small amount of information to
426 help you find the inode when you need it.
428 Similarly, if you want to store one block of data plus some small
429 amount of metadata for each of your 64 cache entries, that's fine.
432 @b{But why can't we store copies of inodes in @code{struct file}? We
433 don't understand the answer to the previous question.}
435 The issue regarding storing @code{struct filehdr}s has to do with
436 implementation of the buffer cache. Basically, you can't store a
437 @code{struct filehdr *} in @code{struct filehdr}. Each time you need
438 to read a @code{struct filehdr}, you'll have to get it either from the
439 buffer cache or from disk.
441 If you look at @code{file_read_at()}, it uses @code{hdr} directly
442 without having first read in that sector from wherever it was in the
443 storage hierarchy. You are no longer allowed to do this. You will
444 need to change @code{file_read_at} (and similar functions) so that it
445 reads @code{hdr} from the storage hierarchy before using it.