pintos-os.org Git - pintos-anon/blob - doc/filesys.texi

   1 @node Project 4--File Systems, References, Project 3--Virtual Memory, Top
   2 @chapter Project 4: File Systems
   3
   4 In the previous two assignments, you made extensive use of a
   5 filesystem without actually worrying about how it was implemented
   6 underneath.  For this last assignment, you will fill in the
   7 implementation of the filesystem.  You will be working primarily in
   8 the @file{filesys} directory.
   9
  10 You should build on the code you wrote for the previous assignments.
  11 However, if you wish, you may turn off your VM features, as they are
  12 not vital to making the filesystem work.  (You will need to edit
  13 @file{filesys/Makefile.vars} to fully disable VM.)  All of the
  14 functionality needed for project 2 (argument passing, syscalls and
  15 multiprogramming) must work in your filesys submission.
  16
  17 On the other hand, one of the particular charms of working on
  18 operating systems is being able to use what you build, and building
  19 full-featured systems.  Therefore, you should strive to make all the
  20 parts work together so that you can run VM and your filesystem at the
  21 same time.  Plus, keeping VM is a great way to stress-test your
  22 filesystem implementation.
  23
  24 @menu
  25 * File System New Code::
  26 * File System Synchronization::
  27 * Problem 4-1 Indexed Files::
  28 * Problem 4-2 File Growth::
  29 * Problem 4-3 Subdirectories::
  30 * Problem 4-4 Buffer Cache::
  31 * File System Design Document Requirements::
  32 * File System FAQ::
  33 @end menu
  34
  35 @node File System New Code
  36 @section New Code
  37
  38 Here are some files that are probably new to you.  These are in the
  39 @file{filesys} directory except where indicated:
  40
  41 @table @file
  42 @item fsutil.c
  43 Simple utilities for the filesystem that are accessible from the
  44 kernel command line.
  45
  46 @item filesys.h
  47 @itemx filesys.c
  48 Top-level interface to the file system.  Please read the long comment
  49 near the top of @file{filesys.c}, which introduces some details of the
  50 file system code as provided.
  51
  52 @item directory.h
  53 @itemx directory.c
  54 Translates file names to inodes.  The directory data structure is
  55 stored as a file.
  56
  57 @item inode.h
  58 @itemx inode.c
  59 Manages the data structure representing the layout of a
  60 file's data on disk.
  61
  62 @item file.h
  63 @itemx file.c
  64 Translates file reads and writes to disk sector reads
  65 and writes.
  66
  67 @item lib/kernel/bitmap.h
  68 @itemx lib/kernel/bitmap.c
  69 A bitmap data structure along with routines for reading and writing
  70 the bitmap to disk files.
  71 @end table
  72
  73 Our file system has a Unix-like interface, so you may also wish to
  74 read the Unix man pages for @code{creat}, @code{open}, @code{close},
  75 @code{read}, @code{write}, @code{lseek}, and @code{unlink}.  Our file
  76 system has calls that are similar, but not identical, to these.  The
  77 file system translates these calls into physical disk operations.
  78
  79 All the basic functionality is there in the code above, so that the
  80 filesystem is usable right off the bat.  In fact, you've been using it
  81 in the previous two projects.  However, it has severe limitations
  82 which you will remove.
  83
  84 While most of your work will be in @file{filesys}, you should be
  85 prepared for interactions with all previous parts (as usual).
  86
  87 @node File System Synchronization
  88 @section Synchronization
  89
  90 The provided file system requires external synchronization, that is,
  91 callers must ensure that only one thread can be running in the file
  92 system code at once.  Your submission should use a finer-grained
  93 synchronization strategy.  You will need to consider synchronization
  94 issues for each type of file system object.  The provided code uses the
  95 following strategies:
  96
  97 @itemize @bullet
  98 @item
  99 The free map and root directory are read each time they are needed for
 100 an operation, and if they are modified, they are written back before the
 101 operation completes.  Thus, the free map is always consistent from an
 102 external viewpoint.
 103
 104 @item
 105 Inodes are immutable in the provided file system, that is, their content
 106 never changes between creation and deletion.  Furthermore, only one copy
 107 of an inode's data is maintained in memory at once, even if the file is
 108 open in multiple contexts.
 109
 110 @item
 111 File data doesn't have to be consistent because it's just not part of
 112 the model.  In Unix and many other operating systems, a read of a file
 113 by one process when the file is being written by another process can
 114 show inconsistent results: it can show that none, all, or part of the
 115 write has completed.  (However, after the write system call returns to
 116 its caller, all subsequent readers must see the change.)  Similarly,
 117 when two threads write to the same part of a file at the same time,
 118 their data may be interleaved.  External synchronization of the provided
 119 file system ensures that reads and writes are fully serialized, but your
 120 file system doesn't have to maintain full serialization as long as it
 121 follows the rules above.
 122 @end itemize
 123
 124 @node Problem 4-1 Indexed Files
 125 @section Problem 4-1: Indexed Files
 126
 127 The basic file system allocates files as a single extent, making it
 128 vulnerable to external fragmentation.  Eliminate this problem by
 129 modifying the inode structure.  In practice, this probably means using
 130 an index structure with direct, indirect, and doubly indirect blocks.
 131 (You are welcome to choose a different scheme as long as you explain the
 132 rationale for it in your design documentation, and as long as it does
 133 not suffer from external fragmentation.)
 134
 135 You can assume that the disk will not be larger than 8 MB.  You must
 136 support files as large as the disk (minus metadata).  Each inode is
 137 stored in one disk sector, limiting the number of block pointers that it
 138 can contain.  Supporting 8 MB files will require you to implement
 139 doubly-indirect blocks.
 140
 141 @node Problem 4-2 File Growth
 142 @section Problem 4-2: File Growth
 143
 144 Implement extensible files.  In the basic file system, the file size
 145 is specified when the file is created.  One advantage of this is that
 146 the inode data structure, once created, never changes.  In UNIX and
 147 most other file systems, a file is initially created with size 0 and
 148 is then expanded every time a write is made off the end of the file.
 149 Modify the file system to allow this.
 150 Make sure that concurrent accesses to the inode remain properly
 151 synchronized.
 152
 153 There should be no predetermined limit on the size of a file, except
 154 that a disk cannot exceed the size of the disk (minus metadata).  This
 155 also applies to the root directory file, which should now be allowed
 156 to expand beyond its initial limit of ten files.
 157
 158 The user is allowed to seek beyond the current end-of-file (EOF).  The
 159 seek itself does not extend the file.  Writing at a position past EOF
 160 extends the file to the position being written, and any gap between the
 161 previous EOF and the start of the write must be filled with zeros.  A
 162 read starting from a position past EOF returns no bytes.
 163
 164 Writing far beyond EOF can cause many blocks to be entirely zero.  Some
 165 file systems allocate and write real data blocks for these implicitly
 166 zeroed blocks.  Other file systems do not allocate these blocks at all
 167 until they are explicitly written.  The latter file systems are said to
 168 support ``sparse files.''  You may adopt either allocation strategy in
 169 your file system.
 170
 171 @node Problem 4-3 Subdirectories
 172 @section Problem 4-3: Subdirectories
 173
 174 Implement a hierarchical name space.  In the basic file system, all
 175 files live in a single directory.  Modify this to allow directory
 176 entries to point to files or to other directories.  You will need
 177 routines to parse a path name into a sequence of directories, to
 178 change the current working directory, and to list the contents of the
 179 current directory.  For performance, allow concurrent updates to
 180 different directories, but use mutual exclusion to ensure that updates
 181 to the same directory are performed atomically (for example, to ensure
 182 that a file is deleted only once).
 183
 184 Make sure that directories can expand beyond their original size just
 185 as any other file can.
 186
 187 The basic file system has a 14-character limit on file names.  You may
 188 retain this limit for individual file name components, or may extend
 189 it, at your option.  In any case you must allow full path names to be
 190 much longer than 14 characters.
 191
 192 The current directory is maintained separately for each process.  At
 193 startup, the initial process has the root directory as its current
 194 directory.  When one process starts another with the @code{exec}
 195 system call, the child process inherits its parent's current
 196 directory.  After that, the two processes' current directories are
 197 independent, so that either changing its own current directory has no
 198 effect on the other.
 199
 200 Update the existing system calls so that, anywhere a file name is
 201 provided by the caller, an absolute or relative path name may used.
 202
 203 Update the @code{remove} system call so that it can delete empty
 204 directories in addition to regular files.  Directories can only be
 205 deleted if they do not contain files or subdirectories.
 206
 207 Implement the following new system calls:
 208
 209 @table @code
 210 @item SYS_chdir
 211 @itemx bool chdir (const char *@var{dir})
 212 Attempts to change the current working directory of the process to
 213 @var{dir}, which may be either relative or absolute.  Returns true if
 214 successful, false on failure.
 215
 216 @item SYS_mkdir
 217 @itemx bool mkdir (const char *dir)
 218 Attempts to create the directory named @var{dir}, which may be either
 219 relative or absolute.  Returns true if successful, false on failure.
 220 Fails if @var{dir} already exists or if any directory name in
 221 @var{dir}, besides the last, does not already exist.  That is,
 222 @code{mkdir("/a/b/c")} succeeds only if @file{/a/b} already exists and
 223 @file{/a/b/c} does not.
 224
 225 @item SYS_lsdir
 226 @itemx void lsdir (void)
 227 Prints a list of files in the current directory to @code{stdout}, one
 228 per line, in no particular order.
 229 @end table
 230
 231 We have provided @command{ls} and @command{mkdir} user programs, which
 232 are straightforward once the above syscalls are implemented.  In Unix,
 233 these are programs rather than built-in shell commands, but
 234 @command{cd} is a shell command.  (Why?)
 235
 236 The @code{pintos} @option{put} and @option{get} commands should now
 237 accept full path names, assuming that the directories used in the
 238 paths have already been created.  This should not require any extra
 239 effort on your part.
 240
 241 @node Problem 4-4 Buffer Cache
 242 @section Problem 4-4: Buffer Cache
 243
 244 Modify the file system to keep a cache of file blocks.  When a request
 245 is made to read or write a block, check to see if it is stored in the
 246 cache, and if so, fetch it immediately from the cache without going to
 247 disk.  (Otherwise, fetch the block from disk into cache, evicting an
 248 older entry if necessary.)  You are limited to a cache no greater than
 249 64 sectors in size.  Be sure to choose an intelligent cache
 250 replacement algorithm.  Experiment to see what combination of accessed,
 251 dirty, and other information results in the best performance, as
 252 measured by the number of disk accesses.  (For example, metadata is
 253 generally more valuable to cache than data.)  Document your
 254 replacement algorithm in your design document.
 255
 256 The provided file system code uses a ``bounce buffer'' in @struct{file}
 257 to translate the disk's sector-by-sector interface into the system call
 258 interface's byte-by-byte interface.  It needs per-file buffers because,
 259 without them, there's no other good place to put sector
 260 data.@footnote{The stack is not a good place because large objects
 261 should not be allocated on the stack.  A 512-byte sector is pushing the
 262 limit there.}  As part of implementing the buffer cache, you should get
 263 rid of these bounce buffers.  Instead, copy data into and out of sectors
 264 in the buffer cache directly.  You will probably need some
 265 synchronization to prevent sectors from being evicted from the cache
 266 while you are using them.
 267
 268 In addition to the basic file caching scheme, your implementation
 269 should also include the following enhancements:
 270
 271 @table @b
 272 @item write-behind:
 273 Instead of always immediately writing modified data to disk, dirty
 274 blocks can be kept in the cache and written out sometime later.  Your
 275 buffer cache should write behind whenever a block is evicted from the
 276 cache.
 277
 278 @item read-ahead:
 279 Your buffer cache should automatically fetch the next block of a file
 280 into the cache when one block of a file is read, in case that block is
 281 about to be read.
 282 @end table
 283
 284 For each of these three optimizations, design a file I/O workload that
 285 is likely to benefit from the enhancement, explain why you expect it
 286 to perform better than on the original file system implementation, and
 287 demonstrate the performance improvement.
 288
 289 Note that write-behind makes your filesystem more fragile in the face
 290 of crashes.  Therefore, you should
 291 periodically write all cached blocks to disk.  If you have
 292 @func{timer_sleep} from the first project working, this is an
 293 excellent application for it.  (If you're still using the base
 294 implementation of @func{timer_sleep}, be aware that it busy-waits, which
 295 is not an acceptable solution.) If @func{timer_sleep}'s delays seem too
 296 short or too long, reread the explanation of the @option{-r} option to
 297 @command{pintos} (@pxref{Debugging versus Testing}).
 298
 299 Likewise, read-ahead is only really useful when done asynchronously.
 300 That is, if a process wants disk block 1 from the file, it needs to
 301 block until disk block 1 is read in, but once that read is complete,
 302 control should return to the process immediately while the read
 303 request for disk block 2 is handled asynchronously.  In other words,
 304 the process will block to wait for disk block 1, but should not block
 305 waiting for disk block 2.
 306
 307 When you're implementing this, please make sure you have a scheme for
 308 making any read-ahead and write-behind threads halt when Pintos is
 309 ``done'' (when the user program has completed, etc), so that Pintos
 310 will halt normally and the disk contents will be consistent.
 311
 312 @node File System Design Document Requirements
 313 @section Design Document Requirements
 314
 315 As always, submit a design document file summarizing your design.  Be
 316 sure to cover the following points:
 317
 318 @itemize @bullet
 319 @item
 320 How did you choose to synchronize file system operations?
 321
 322 @item
 323 How did you structure your inodes? How many blocks did you access
 324 directly, via single-indirection, and/or via double-indirection?  Why?
 325
 326 @item
 327 How did you structure your buffer cache? How did you perform a lookup
 328 in the cache? How did you choose elements to evict from the cache?
 329
 330 @item
 331 How and when did you flush the cache?
 332 @end itemize
 333
 334 @node File System FAQ
 335 @section FAQ
 336
 337 @enumerate 1
 338 @item
 339 @b{What extra credit opportunities are available for this assignment?}
 340
 341 @itemize @bullet
 342 @item
 343 We'll give out extra credit to groups that implement Unix-style
 344 support for @file{.} and @file{..} in relative paths in their projects.
 345
 346 @item
 347 We'll give some extra credit if you submit with VM enabled.  If you do
 348 this, make sure you show us that you can run multiple programs
 349 concurrently.  A particularly good demonstration is running
 350 @file{capitalize} (with a reduced words file that fits comfortably on
 351 your disk, of course).  So submit a file system disk that contains a
 352 VM-heavy program like @file{capitalize}, so we can try it out.  And also
 353 include the results in your test case file.
 354
 355 We feel that you will be much more satisfied with your cs140 ``final
 356 product'' if you can get your VM working with your file system.  It's
 357 also a great stress test for your FS, but obviously you have to be
 358 pretty confident with your VM if you're going to submit this extra
 359 credit, since you'll still lose points for failing FS-related tests,
 360 even if the problem is in your VM code.
 361
 362 @item
 363 A point of extra credit can be assigned if a user can recursively
 364 remove directories from the shell command prompt.  Note that the
 365 typical semantic is to just fail if a directory is not empty.
 366 @end itemize
 367
 368 Make sure that you discuss any extra credit in your @file{README}
 369 file.  We're likely to miss it if it gets buried in your design
 370 document.
 371
 372 @item
 373 @b{What exec modes for running Pintos do I absolutely need to
 374 support?}
 375
 376 You also need to support the @option{-f}, @option{-ci}, @option{-co},
 377 and @option{-ex} flags individually, and you need to handle them when
 378 they're combined, like this: @samp{pintos -f -ci shell 12345 -ex
 379 "shell"}.  Thus, you should be able to treat the above as equivalent to:
 380
 381 @example
 382 pintos -f
 383 pintos -ci shell 12345
 384 pintos -ex "shell"
 385 @end example
 386
 387 If you don't change the filesystem interface, none of this should
 388 require any special effort on your part.  They are already implemented
 389 properly in @file{threads/init.c} and @file{filesys/fsutil.c}.
 390
 391 You must also implement the @option{-q} option and make sure that data
 392 gets flushed out to disk properly when it is used.
 393
 394 @item
 395 @b{Will you test our file system with a different @code{DISK_SECTOR_SIZE}?}
 396
 397 No, @code{DISK_SECTOR_SIZE} is fixed at 512.  This is a fixed property
 398 of IDE disk hardware.
 399
 400 @item
 401 @b{Will the @struct{inode} take up space on the disk too?}
 402
 403 Yes.  Anything stored in @struct{inode} takes up space on disk,
 404 so you must include this in your calculation of how many entires will
 405 fit in a single disk sector.
 406
 407 @item
 408 @b{What's the directory separator character?}
 409
 410 Forward slash (@samp{/}).
 411 @end enumerate
 412
 413 @menu
 414 * Problem 4-2 File Growth FAQ::
 415 * Problem 4-3 Subdirectory FAQ::
 416 * Problem 4-4 Buffer Cache FAQ::
 417 @end menu
 418
 419 @node Problem 4-2 File Growth FAQ
 420 @subsection Problem 4-2: File Growth FAQ
 421
 422 @enumerate 1
 423 @item
 424 @b{What is the largest file size that we are supposed to support?}
 425
 426 The disk we create will be 8 MB or smaller.  However, individual files
 427 will have to be smaller than the disk to accommodate the metadata.
 428 You'll need to consider this when deciding your @struct{inode}
 429 organization.
 430 @end enumerate
 431
 432 @node Problem 4-3 Subdirectory FAQ
 433 @subsection Problem 4-3: Subdirectory FAQ
 434
 435 @enumerate 1
 436 @item
 437 @b{What's the answer to the question in the spec about why
 438 @command{ls} and @command{mkdir} are user programs, while @command{cd}
 439 is a shell command?}
 440
 441 Each process maintains its own current working directory, so it's much
 442 easier to change the current working directory of the shell process if
 443 @command{cd} is implemented as a shell command rather than as another
 444 user process.  In fact, Unix-like systems don't provide any way for
 445 one process to change another process's current working directory.
 446
 447 @item
 448 @b{When should the @code{lsdir} system call return?}
 449
 450 The @code{lsdir} system call should not return until after the
 451 directory has been printed.  Here's a code fragment, and the desired
 452 output:
 453
 454 @example
 455 printf ("Start of directory\n");
 456 lsdir ();
 457 printf ("End of directory\n");
 458 @end example
 459
 460 This code should create the following output:
 461
 462 @example
 463 Start of directory
 464 @dots{}directory contents@dots{}
 465 End of directory
 466 @end example
 467
 468 @item
 469 @b{Do we have to implement both absolute and relative pathnames?}
 470
 471 Yes.  Implementing @file{.} and @file{..} is extra credit, though.
 472 @end enumerate
 473
 474 @node Problem 4-4 Buffer Cache FAQ
 475 @subsection Problem 4-4: Buffer Cache FAQ
 476
 477 @enumerate 1
 478 @item
 479 @b{We're limited to a 64-block cache, but can we also keep an
 480 @struct{inode_disk} inside @struct{inode}, the way the provided code
 481 does?}
 482
 483 The goal of the 64-block limit is to bound the amount of cached file
 484 system data.  If you keep a block of disk data---whether file data or
 485 metadata---anywhere in kernel memory then you have to count it against
 486 the 64-block limit.  The same rule applies to anything that's
 487 ``similar'' to a block of disk data, such as a @struct{inode_disk}
 488 without the @code{length} or @code{sector_cnt} members.
 489
 490 You can keep a cached copy of the free map in memory permanently if you
 491 like.  It doesn't have to count against the cache size.
 492
 493 That means you'll have to change the way the inode implementation
 494 accesses its corresponding on-disk inode right now, since it currently
 495 just embeds a @struct{inode_disk} in @struct{inode} and reads the
 496 corresponding sector in from disk when it's created.  Keeping extra
 497 copies of inodes would be cheating the 64-block limitation that we place
 498 on your cache.
 499
 500 You can store pointers to inode data in @struct{inode}, if you want, and
 501 you can store some other small amount of information to help you find
 502 the inode when you need it.  Similarly, if you want to store one block
 503 of data plus some small amount of metadata for each of your 64 cache
 504 entries, that's fine.
 505
 506 If you look at @func{inode_byte_to_sector}, it uses the
 507 @struct{inode_disk} directly without having first read in that sector
 508 from wherever it was in the storage hierarchy.  This will no longer
 509 work.  You will need to change @func{inode_byte_to_sector} so that it
 510 reads the @struct{inode_disk} from the storage hierarchy before using
 511 it.
 512 @end enumerate