pintos-os.org Git - pintos-anon/blob - doc/filesys.texi

   1 @node Project 4--File Systems, Multilevel Feedback Scheduling, Project 3--Virtual Memory, Top
   2 @chapter Project 4: File Systems
   3
   4 In the previous two assignments, you made extensive use of a
   5 filesystem without actually worrying about how it was implemented
   6 underneath.  For this last assignment, you will fill in the
   7 implementation of the filesystem.  You will be working primarily in
   8 the @file{filesys} directory.
   9
  10 You should build on the code you wrote for the previous assignments.
  11 However, if you wish, you may turn off your VM features, as they are
  12 not vital to making the filesystem work.  (You will need to edit
  13 @file{filesys/Makefile.vars} to fully disable VM.)  All of the
  14 functionality needed for project 2 (argument passing, syscalls and
  15 multiprogramming) must work in your filesys submission.
  16
  17 On the other hand, one of the particular charms of working on
  18 operating systems is being able to use what you build, and building
  19 full-featured systems.  Therefore, you should strive to make all the
  20 parts work together so that you can run VM and your filesystem at the
  21 same time.  Plus, keeping VM is a great way to stress-test your
  22 filesystem implementation.
  23
  24 FIXME FIXME FIXME
  25 The first step is to understand the default filesystem provided by the
  26 base code.  The first things you should look at are
  27 @file{threads/init.c} and @file{filesys/fsutil.c}: there are special
  28 command line arguments to Pintos which are used to format and
  29 manipulate the emulated disk.  Specifically, @option{-f} formats the
  30 file system disk for use with Pintos, and @option{-cp @var{src}
  31 @var{dst}} copies @var{src} on the Unix filesystem to @var{dst} on the
  32 file system disk.  With this information, you should be able to
  33 compile the base code and try out the command: @samp{pintos -f -cp
  34 ./small small} to copy the file @file{small} from your working
  35 directory into the Pintos file system.
  36
  37 FIXME FIXME FIXME
  38 One thing you should realize immediately is that, until you use the
  39 above operation to copy the shell (or whatever your initial program
  40 is) to the emulated disk, Pintos will be unable to do very much useful
  41 work (it'll try to open the shell and fail, thereby quitting out).  You
  42 will also find that you won't be able to do interesting things until
  43 you copy a variety of programs to the disk.  A useful technique, once
  44 your inode formats are finalized, is to create a clean reference disk
  45 and copy that over whenever you trash your disk beyond a useful state
  46 (which will probably happen quite often while debugging).
  47
  48 @menu
  49 * File System New Code::
  50 * Problem 4-1 Large Files::
  51 * Problem 4-2 File Growth::
  52 * Problem 4-3 Subdirectories::
  53 * Problem 4-4 Buffer Cache::
  54 * File System Design Document Requirements::
  55 * File System FAQs::
  56 @end menu
  57
  58 @node File System New Code
  59 @section New Code
  60
  61 Here are some files that are probably new to you.  These are in the
  62 @file{filesys} directory except where indicated:
  63
  64 @table @file
  65 @item fsutil.c
  66 Simple utilities for the filesystem that are accessible from the
  67 kernel command line.
  68
  69 @item filesys.h
  70 @itemx filesys.c
  71 Top-level interface to the file system.
  72
  73 @item directory.h
  74 @itemx directory.c
  75 Translates file names to disk file headers; the
  76 directory data structure is stored as a file.
  77
  78 @item filehdr.h
  79 @itemx filehdr.c
  80 Manages the data structure representing the layout of a
  81 file's data on disk.
  82
  83 @item file.h
  84 @itemx file.c
  85 Translates file reads and writes to disk sector reads
  86 and writes.
  87
  88 @item devices/disk.h
  89 @itemx devices/disk.c
  90 Provides access to the physical disk, abstracting away the rather
  91 awful IDE interface.
  92
  93 @item lib/kernel/bitmap.h
  94 @itemx lib/kernel/bitmap.c
  95 A bitmap data structure along with routines for reading and writing
  96 the bitmap to disk files.
  97 @end table
  98
  99 Our file system has a Unix-like interface, so you may also wish to
 100 read the Unix man pages for @code{creat}, @code{open}, @code{close},
 101 @code{read}, @code{write}, @code{lseek}, and @code{unlink}.  Our file
 102 system has calls that are similar, but not identical, to these.  The
 103 file system translates these calls into physical disk operations.
 104
 105 All the basic functionality is there in the code above, so that the
 106 filesystem is usable right off the bat.  In fact, you've been using it
 107 in the previous two projects.  However, it has severe limitations
 108 which you will remove.
 109
 110 While most of your work will be in @file{filesys}, you should be
 111 prepared for interactions with all previous parts (as usual).
 112
 113 @node Problem 4-1 Large Files
 114 @section Problem 4-1: Large Files
 115
 116 Modify the file system to allow the maximum size of a file to be as
 117 large as the disk.  You can assume that the disk will not be larger
 118 than 8 MB.  In the basic file system, each file is limited to a file
 119 size of just under 64 kB.  Each file has a header (@code{struct
 120 filehdr}) that is a table of direct pointers to the disk blocks for
 121 that file.  Since the header is stored in one disk sector, the maximum
 122 size of a file is limited by the number of pointers that will fit in
 123 one disk sector.  Increasing the limit to 8 MB will require you to
 124 implement doubly-indirect blocks.
 125
 126 @node Problem 4-2 File Growth
 127 @section Problem 4-2: File Growth
 128
 129 Implement extensible files.  In the basic file system, the file size
 130 is specified when the file is created.  One advantage of this is that
 131 the FileHeader data structure, once created, never changes.  In UNIX
 132 and most other file systems, a file is initially created with size 0
 133 and is then expanded every time a write is made off the end of the
 134 file.  Modify the file system to allow this.  As one test case, allow
 135 the root directory file to expand beyond its current limit of ten
 136 files.  Make sure that concurrent accesses to the file header remain
 137 properly synchronized.
 138
 139 @node Problem 4-3 Subdirectories
 140 @section Problem 4-3: Subdirectories
 141
 142 Implement a hierarchical name space.  In the basic file system, all
 143 files live in a single directory.  Modify this to allow directories to
 144 point to either files or other directories.  To do this, you will need
 145 to implement routines that parse path names into a sequence of
 146 directories, as well as routines that change the current working
 147 directory and that list the contents of the current directory.  For
 148 performance, allow concurrent updates to different directories, but
 149 use mutual exclusion to ensure that updates to the same directory are
 150 performed atomically (for example, to ensure that a file is deleted
 151 only once).
 152
 153 Make sure that directories can expand beyond their original size just
 154 as any other file can.
 155
 156 To take advantage of hierarchical name spaces in user programs,
 157 provide the following syscalls:
 158
 159 @table @code
 160 @item SYS_chdir
 161 @itemx bool chdir (const char *@var{dir})
 162 Attempts to change the current working directory of the process to
 163 @var{dir}, which may be either relative or absolute.  Returns true if
 164 successful, false on failure.
 165
 166 @item SYS_mkdir
 167 @itemx bool mkdir (const char *dir)
 168 Attempts to create the directory named @var{dir}, which may be either
 169 relative or absolute.  Returns true if successful, false on failure.
 170
 171 @item SYS_lsdir
 172 @itemx void lsdir (void)
 173 Prints a list of files in the current directory to @code{stdout}, one
 174 per line.
 175 @end table
 176
 177 Also write the @command{ls} and @command{mkdir} user programs.  This
 178 is straightforward once the above syscalls are implemented.  If Unix,
 179 these are programs rather than built-in shell commands, but
 180 @command{cd} is a shell command.  (Why?)
 181
 182 @node Problem 4-4 Buffer Cache
 183 @section Problem 4-4: Buffer Cache
 184
 185 Modify the file system to keep a cache of file blocks.  When a request
 186 is made to read or write a block, check to see if it is stored in the
 187 cache, and if so, fetch it immediately from the cache without going to
 188 disk.  (Otherwise, fetch the block from disk into cache, evicting an
 189 older entry if necessary.)  You are limited to a cache no greater than
 190 64 sectors in size.  Be sure to choose an intelligent cache
 191 replacement algorithm.  Experiment to see what combination of use,
 192 dirty, and other information results in the best performance, as
 193 measured by the number of disk accesses.  (For example, metadata is
 194 generally more valuable to cache than data.)  Document your
 195 replacement algoritm in your design document.
 196
 197 In addition to the basic file caching scheme, your implementation
 198 should also include the following enhancements:
 199
 200 @table @b
 201 @item write-behind:
 202 Instead of always immediately writing modified data to disk, dirty
 203 blocks can be kept in the cache and written out sometime later.  Your
 204 buffer cache should write behind whenever a block is evicted from the
 205 cache.
 206
 207 @item read-ahead:
 208 Your buffer cache should automatically fetch the next block of a file
 209 into the cache when one block of a file is read, in case that block is
 210 about to be read.
 211 @end table
 212
 213 For each of these three optimizations, design a file I/O workload that
 214 is likely to benefit from the enhancement, explain why you expect it
 215 to perform better than on the original file system implementation, and
 216 demonstrate the performance improvement.
 217
 218 Note that write-behind makes your filesystem more fragile in the face
 219 of crashes.  Therefore, you should implement some manner to
 220 periodically write all cached blocks to disk.  If you have
 221 @code{timer_sleep()} from the first project working, this is an
 222 excellent application for it.
 223
 224 Likewise, read-ahead is only really useful when done asynchronously.
 225 That is, if a process wants disk block 1 from the file, it needs to
 226 block until disk block 1 is read in, but once that read is complete,
 227 control should return to the process immediately while the read
 228 request for disk block 2 is handled asynchronously.  In other words,
 229 the process will block to wait for disk block 1, but should not block
 230 waiting for disk block 2.
 231
 232 FIXME
 233 When you're implementing this, please make sure you have a scheme for
 234 making any read-ahead and write-behind threads halt when Pintos is
 235 ``done'' (when the user program has completed, etc), so that Pintos
 236 will halt normally and print its various statistics.
 237
 238 @node File System Design Document Requirements
 239 @section Design Document Requirements
 240
 241 As always, submit a design document file summarizing your design.  Be
 242 sure to cover the following points :
 243
 244 @itemize @bullet
 245 @item
 246 How did you structure your inodes? How many blocks did you access
 247 directly, via single-indirection, and/or via double-indirection?  Why?
 248
 249 @item
 250 How did you structure your buffer cache? How did you perform a lookup
 251 in the cache? How did you choose elements to evict from the cache?
 252
 253 @item
 254 How and when did you flush the cache?
 255 @end itemize
 256
 257 @node File System FAQs
 258 @section FAQ
 259
 260 @enumerate 1
 261 @item
 262 @b{What extra credit opportunities are available for this assignment?}
 263
 264 @itemize @bullet
 265 @item
 266 We'll give out extra credit to groups that implement Unix-style
 267 support for @file{.} and @file{..} in relative paths in their projects.
 268
 269 @item
 270 We'll give some extra credit if you submit with VM enabled.  If you do
 271 this, make sure you show us that you can run multiple programs
 272 concurrently.  A particularly good demonstration is running
 273 @file{capitalize} (with a reduced words file that fits comfortably on
 274 your disk, of course).  So submit a file system disk that contains a
 275 VM-heavy program like @file{capitalize}, so we can try it out.  And also
 276 include the results in your test case file.
 277
 278 We feel that you will be much more satisfied with your cs140 ``final
 279 product'' if you can get your VM working with your file system.  It's
 280 also a great stress test for your FS, but obviously you have to be
 281 pretty confident with your VM if you're going to submit this extra
 282 credit, since you'll still lose points for failing FS-related tests,
 283 even if the problem is in your VM code.
 284
 285 @item
 286 A point of extra credit can be assigned if a user can recursively
 287 remove directories from the shell command prompt.  Note that the
 288 typical semantic is to just fail if a directory is not empty.
 289 @end itemize
 290
 291 Make sure that you discuss any extra credit in your @file{README}
 292 file.  We're likely to miss it if it gets buried in your design
 293 document.
 294
 295 @item
 296 @b{What exec modes for running Pintos do I absolutely need to
 297 support?}
 298
 299 FIXME FIXME
 300 The most standard mode is to run your Pintos with all the command
 301 flags on one command line, like this: @samp{pintos -f -cp shell
 302 shell -ex "shell"}.  However, you also need to support these flags
 303 individually---especially since that's how the grader tests your
 304 program.  Thus, you should be able to run the above instead as:
 305
 306 FIXME
 307 @example
 308 pintos -f
 309 pintos -cp shell shell
 310 pintos -ex "shell"
 311 @end example
 312
 313 Note that this also provides a way for you to validate that your disks
 314 are really persistent.  This is a common problem with a write behind
 315 cache: if you don't shut down properly it will leave the disk in an
 316 inconsistent state.
 317
 318 @item
 319 @b{Will you test our file system with a different @code{DISK_SECTOR_SIZE}?}
 320
 321 No, @code{DISK_SECTOR_SIZE} will not change.
 322
 323 @item
 324 @b{Will the @code{struct filehdr} take up space on the disk too?}
 325
 326 Yes.  Anything stored in @code{struct filehdr} takes up space on disk,
 327 so you must include this in your calculation of how many entires will
 328 fit in a single disk sector.
 329
 330 @item
 331 File Growth FAQs
 332
 333 @enumerate 1
 334 @item
 335 @b{What is the largest file size that we are supposed to support?}
 336
 337 The disk we create will be 8 MB or smaller.  However, individual files
 338 will have to be smaller than the disk to accommodate the metadata.
 339 You'll need to consider this when deciding your @code{struct filehdr}
 340 (inode) organization.
 341 @end enumerate
 342
 343 @item
 344 Subdirectory FAQs
 345
 346 @enumerate 1
 347 @item
 348 @b{What's the answer to the question in the spec about why
 349 @command{ls} and @command{mkdir} are user programs, while @command{cd}
 350 is a shell command?}
 351
 352 Each process maintains its own current working directory, so it's much
 353 easier to change the current working directory of the shell process if
 354 @command{cd} is implemented as a shell command rather than as another
 355 user process.  In fact, Unix-like systems don't provide any way for
 356 one process to change another process's current working directory.
 357
 358 @item
 359 @b{When the spec states that directories should be able to grow beyond
 360 ten files, does this mean that there can still be a set maximum number
 361 of files per directory that is greater than ten, or should directories
 362 now support unlimited growth (bounded by the maximum supported file
 363 size)?}
 364
 365 We're looking for directories that can support arbitrarily large
 366 numbers of files.  Now that directories can grow, we want you to
 367 remove the concept of a preset maximum file limit.
 368
 369 @item
 370 @b{When should the @code{lsdir} system call return?}
 371
 372 The @code{lsdir} system call should not return until after the
 373 directory has been printed.  Here's a code fragment, and the desired
 374 output:
 375
 376 @example
 377 printf ("Start of directory\n");
 378 lsdir ();
 379 printf ("End of directory\n");
 380 @end example
 381
 382 This code should create the following output:
 383
 384 @example
 385 Start of directory
 386 ...  directory contents ...
 387 End of directory
 388 @end example
 389
 390 @item
 391 @b{Do we have to implement both absolute and relative pathnames?}
 392
 393 Yes.  Implementing @file{.} and @file{..} is extra credit, though.
 394
 395 @item
 396 @b{Should @code{remove()} also be able to remove directories?}
 397
 398 Yes.  The @code{remove} system call should handle removal of both
 399 regular files and directories.  You may assume that directories can
 400 only be deleted if they are empty, as in Unix.
 401 @end enumerate
 402
 403 @item
 404 Buffer Cache FAQs
 405
 406 @enumerate 1
 407 @item
 408 @b{We're limited to a 64-block cache, but can we also keep a copy of
 409 each @code{struct filehdr} for an open file inside @code{struct file},
 410 the way the stub code does?}
 411
 412 No, you shouldn't keep any disk sectors stored anywhere outside the
 413 cache.  That means you'll have to change the way the file
 414 implementation accesses its corresponding inode right now, since it
 415 currently just creates a new @code{struct filehdr} in its constructor
 416 and reads the corresponding sector in from disk when it's created.
 417
 418 There are two reasons for not storing inodes in @code{struct file}.
 419 First, keeping extra copies of inodes would be cheating the 64-block
 420 limitation that we place on your cache.  Second, if two processes have
 421 the same file open, you will create a huge synchronization headache
 422 for yourself if each @code{struct file} has its own copy of the inode.
 423
 424 Note that you can store pointers to inodes in @code{struct file} if
 425 you want, and you can store some other small amount of information to
 426 help you find the inode when you need it.
 427
 428 Similarly, if you want to store one block of data plus some small
 429 amount of metadata for each of your 64 cache entries, that's fine.
 430
 431 @item
 432 @b{But why can't we store copies of inodes in @code{struct file}? We
 433 don't understand the answer to the previous question.}
 434
 435 The issue regarding storing @code{struct filehdr}s has to do with
 436 implementation of the buffer cache.  Basically, you can't store a
 437 @code{struct filehdr *} in @code{struct filehdr}.  Each time you need
 438 to read a @code{struct filehdr}, you'll have to get it either from the
 439 buffer cache or from disk.
 440
 441 If you look at @code{file_read_at()}, it uses @code{hdr} directly
 442 without having first read in that sector from wherever it was in the
 443 storage hierarchy.  You are no longer allowed to do this.  You will
 444 need to change @code{file_read_at} (and similar functions) so that it
 445 reads @code{hdr} from the storage hierarchy before using it.
 446 @end enumerate
 447 @end enumerate