pintos-os.org Git - pintos-anon/blob - doc/debug.texi

   1 @node Debugging Tools, Development Tools, Project Documentation, Top
   2 @appendix Debugging Tools
   3
   4 Many tools lie at your disposal for debugging Pintos.  This appendix
   5 introduces you to a few of them.
   6
   7 @menu
   8 * printf::
   9 * ASSERT::
  10 * Function and Parameter Attributes::
  11 * Backtraces::
  12 * Automatic Race Detection with Checkbochs::
  13 * gdb::
  14 * Debugging by Infinite Loop::
  15 * Modifying Bochs::
  16 * Debugging Tips::
  17 @end menu
  18
  19 @node printf
  20 @section @code{printf()}
  21
  22 Don't underestimate the value of @func{printf}.  The way
  23 @func{printf} is implemented in Pintos, you can call it from
  24 practically anywhere in the kernel, whether it's in a kernel thread or
  25 an interrupt handler, almost regardless of what locks are held (but see
  26 @ref{printf Reboots} for a counterexample).
  27
  28 @func{printf} is useful for more than just examining data.
  29 It can also help figure out when and where something goes wrong, even
  30 when the kernel crashes or panics without a useful error message.  The
  31 strategy is to sprinkle calls to @func{print} with different strings
  32 (e.g.@: @code{"<1>"}, @code{"<2>"}, @dots{}) throughout the pieces of
  33 code you suspect are failing.  If you don't even see @code{<1>} printed,
  34 then something bad happened before that point, if you see @code{<1>}
  35 but not @code{<2>}, then something bad happened between those two
  36 points, and so on.  Based on what you learn, you can then insert more
  37 @func{printf} calls in the new, smaller region of code you suspect.
  38 Eventually you can narrow the problem down to a single statement.
  39 @xref{Debugging by Infinite Loop}, for a related technique.
  40
  41 @node ASSERT
  42 @section @code{ASSERT}
  43
  44 Assertions are useful because they can catch problems early, before
  45 they'd otherwise be noticed.  Pintos provides the
  46 @code{ASSERT}, defined in @file{<debug.h>}, for assertions.
  47 Ideally, each function should begin with a set of
  48 assertions that check its arguments for validity.  (Initializers for
  49 functions' local variables are evaluated before assertions are
  50 checked, so be careful not to assume that an argument is valid in an
  51 initializer.)  You can also sprinkle assertions throughout the body of
  52 functions in places where you suspect things are likely to go wrong.
  53 They are especially useful for checking loop invariants.
  54
  55 When an assertion proves untrue, the kernel panics.  The panic message
  56 should help you to find the problem.  See the description of
  57 backtraces below for more information.
  58
  59 @node Function and Parameter Attributes
  60 @section Function and Parameter Attributes
  61
  62 These macros defined in @file{<debug.h>} tell the compiler special
  63 attributes of a function or function parameter.  Their expansions are
  64 GCC-specific.
  65
  66 @defmac UNUSED
  67 Appended to a function parameter to tell the compiler that the
  68 parameter might not be used within the function.  It suppresses the
  69 warning that would otherwise appear.
  70 @end defmac
  71
  72 @defmac NO_RETURN
  73 Appended to a function prototype to tell the compiler that the
  74 function never returns.  It allows the compiler to fine-tune its
  75 warnings and its code generation.
  76 @end defmac
  77
  78 @defmac NO_INLINE
  79 Appended to a function prototype to tell the compiler to never emit
  80 the function in-line.  Occasionally useful to improve the quality of
  81 backtraces (see below).
  82 @end defmac
  83
  84 @defmac PRINTF_FORMAT (@var{format}, @var{first})
  85 Appended to a function prototype to tell the compiler that the function
  86 takes a @func{printf}-like format string as the argument numbered
  87 @var{format} (starting from 1) and that the corresponding value
  88 arguments start at the argument numbered @var{first}.  This lets the
  89 compiler tell you if you pass the wrong argument types.
  90 @end defmac
  91
  92 @node Backtraces
  93 @section Backtraces
  94
  95 When the kernel panics, it prints a ``backtrace,'' that is, a summary
  96 of how your program got where it is, as a list of addresses inside the
  97 functions that were running at the time of the panic.  You can also
  98 insert a call to @func{debug_backtrace}, prototyped in
  99 @file{<debug.h>}, to print a backtrace at any point in your code.
 100
 101 The addresses in a backtrace are listed as raw hexadecimal numbers,
 102 which are meaningless by themselves.  You can translate them into
 103 function names and source file line numbers using a tool called
 104 @command{addr2line} (80@var{x}86) or @command{i386-elf-addr2line}
 105 (SPARC).
 106
 107 The output format of @command{addr2line} is not ideal, so
 108 we've supplied a wrapper for it simply called @command{backtrace}.
 109 Give it the name of your @file{kernel.o} as the first argument and the
 110 hexadecimal numbers composing the backtrace (including the @samp{0x}
 111 prefixes) as the remaining arguments.  It outputs the function name
 112 and source file line numbers that correspond to each address.
 113
 114 If the translated form of a backtrace is garbled, or doesn't make
 115 sense (e.g.@: function A is listed above function B, but B doesn't
 116 call A), then it's a good sign that you're corrupting a kernel
 117 thread's stack, because the backtrace is extracted from the stack.
 118 Alternatively, it could be that the @file{kernel.o} you passed to
 119 @command{backtrace} does not correspond to the kernel that produced
 120 the backtrace.
 121
 122 Sometimes backtraces can be confusing without implying corruption.
 123 Compiler optimizations can cause surprising behavior.  For example, when
 124 a function has called another function as its final action (a @dfn{tail
 125 call}), the calling function may not appear in a backtrace at all.
 126
 127 @menu
 128 * Backtrace Example::
 129 @end menu
 130
 131 @node Backtrace Example
 132 @subsection Example
 133
 134 Here's an example.  Suppose that Pintos printed out this following call
 135 stack, which is taken from an actual Pintos submission for the file
 136 system project:
 137
 138 @example
 139 Call stack: 0xc0106eff 0xc01102fb 0xc010dc22 0xc010cf67 0xc0102319
 140 0xc010325a 0x804812c 0x8048a96 0x8048ac8.
 141 @end example
 142
 143 You would then invoke the @command{backtrace} utility like shown below,
 144 cutting and pasting the backtrace information into the command line.
 145 This assumes that @file{kernel.o} is in the current directory.  You
 146 would of course enter all of the following on a single shell command
 147 line, even though that would overflow our margins here:
 148
 149 @example
 150 backtrace kernel.o 0xc0106eff 0xc01102fb 0xc010dc22 0xc010cf67
 151 0xc0102319 0xc010325a 0x804812c 0x8048a96 0x8048ac8
 152 @end example
 153
 154 The backtrace output would then look something like this:
 155
 156 @example
 157 0xc0106eff: debug_panic (../../lib/debug.c:86)
 158 0xc01102fb: file_seek (../../filesys/file.c:405)
 159 0xc010dc22: seek (../../userprog/syscall.c:744)
 160 0xc010cf67: syscall_handler (../../userprog/syscall.c:444)
 161 0xc0102319: intr_handler (../../threads/interrupt.c:334)
 162 0xc010325a: ?? (threads/intr-stubs.S:1554)
 163 0x804812c: ?? (??:0)
 164 0x8048a96: ?? (??:0)
 165 0x8048ac8: ?? (??:0)
 166 @end example
 167
 168 (You will probably not get the same results if you run the command above
 169 on your own kernel binary, because the source code you compiled from is
 170 different from the source code that panicked.)
 171
 172 The first line in the backtrace refers to @func{debug_panic}, the
 173 function that implements kernel panics.  Because backtraces commonly
 174 result from kernel panics, @func{debug_panic} will often be the first
 175 function shown in a backtrace.
 176
 177 The second line shows @func{file_seek} as the function that panicked,
 178 in this case as the result of an assertion failure.  In the source code
 179 tree used for this example, line 405 of @file{filesys/file.c} is the
 180 assertion
 181
 182 @example
 183 ASSERT (file_ofs >= 0);
 184 @end example
 185
 186 @noindent
 187 (This line was also cited in the assertion failure message.)
 188 Thus, @func{file_seek} panicked because it passed a negative file offset
 189 argument.
 190
 191 The third line indicates that @func{seek} called @func{file_seek},
 192 presumably without validating the offset argument.  In this submission,
 193 @func{seek} implements the @code{seek} system call.
 194
 195 The fourth line shows that @func{syscall_handler}, the system call
 196 handler, invoked @func{seek}.
 197
 198 The fifth and sixth lines are the interrupt handler entry path.
 199
 200 The remaining lines are for addresses below @code{PHYS_BASE}.  This
 201 means that they refer to addresses in the user program, not in the
 202 kernel.  If you know what user program was running when the kernel
 203 panicked, you can re-run @command{backtrace} on the user program, like
 204 so: (typing the command on a single line, of course):
 205
 206 @example
 207 backtrace grow-too-big 0xc0106eff 0xc01102fb 0xc010dc22 0xc010cf67
 208 0xc0102319 0xc010325a 0x804812c 0x8048a96 0x8048ac8
 209 @end example
 210
 211 The results look like this:
 212
 213 @example
 214 0xc0106eff: ?? (??:0)
 215 0xc01102fb: ?? (??:0)
 216 0xc010dc22: ?? (??:0)
 217 0xc010cf67: ?? (??:0)
 218 0xc0102319: ?? (??:0)
 219 0xc010325a: ?? (??:0)
 220 0x804812c: test_main (../../tests/filesys/extended/grow-too-big.c:20)
 221 0x8048a96: main (../../tests/main.c:10)
 222 0x8048ac8: _start (../../lib/user/entry.c:9)
 223 @end example
 224
 225 Here's an extra tip for anyone who read this far: @command{backtrace}
 226 is smart enough to strip the @code{Call stack:} header and @samp{.}
 227 trailer from the command line if you include them.  This can save you
 228 a little bit of trouble in cutting and pasting.  Thus, the following
 229 command prints the same output as the first one we used:
 230
 231 @example
 232 backtrace kernel.o Call stack: 0xc0106eff 0xc01102fb 0xc010dc22
 233 0xc010cf67 0xc0102319 0xc010325a 0x804812c 0x8048a96 0x8048ac8.
 234 @end example
 235
 236 @node Automatic Race Detection with Checkbochs
 237 @section Automatic Race Detection with Checkbochs
 238
 239 In the Pintos projects, especially projects 3 and 4, proper
 240 synchronization is essential for avoiding race conditions.  There is
 241 no substitute for good design, but sometimes tools can help with
 242 debugging an implementation.
 243
 244 We provide a tool called Checkbochs to help find race conditions in
 245 your implementation.  Checkbochs is a version of Bochs which has been
 246 modified by Sorav Bansal to implement the Eraser race-detection
 247 algorithm (@bibref{Eraser}).
 248
 249 Before you can use Checkbochs to to detect data races in your code,
 250 you must first apply a patch to your Pintos source tree.  FIXME
 251
 252 Once you have applied the Checkbochs patch, you can use Checkbochs
 253 simply by adding @option{--checkbochs} to the @command{pintos} command
 254 line.  You can usefully use @option{--checkbochs} in conjunction with
 255 @option{--monitor} or @option{--gdb}, but not @option{--qemu} or
 256 @option{--gsx}.  The only immediate effect of @option{--checkbochs} is
 257 that your run will produce an extra log file, called
 258 @file{checkbochs.log} by default (use @option{--logfile} to change
 259 it).  The log file is a plaintext log of accesses to memory locations
 260 shared by multiple threads.
 261
 262 After you have a log file, use @command{checkbochs-trace} to interpret
 263 it.  First run it without any options, which causes it to report how
 264 many potential race conditions were detected.  If any were detected,
 265 it also tells you the address of the data involved in the race and
 266 writes backtraces for access to data for the first race it detected to
 267 @file{btrace}.  If more than one race was detected, you can request
 268 information about the @var{n}th race, instead, using @option{-n
 269 @var{n}}.
 270
 271 The @file{btrace} file lists a backtrace for each access to the data
 272 item implicated in a race.  Each backtrace is listed across several
 273 lines.  Each backtrace line is divided into several columns containing
 274 a source file name, function name, and line number.  The first line in
 275 each backtrace also identifies the thread (as the physical address of
 276 its @struct{thread}, followed by the state transition of the memory
 277 location in the format @samp{<OldState>-><NewState>}. More
 278 information on the state transition can be found in @bibref{Eraser}.
 279
 280 You can save the trouble of reading the backtraces and looking up
 281 their source file location by hand by using the @option{--cscope}
 282 option to @command{checkbochs-trace}.  This brings up the backtrace
 283 info in @command{cscope}, a program for interactively exploring code,
 284 which lets you quickly view the source lines implicated in each
 285 backtrace.
 286
 287 @menu
 288 * Debugging Race Conditions::
 289 * Checkbochs Limitations::
 290 @end menu
 291
 292 @node Debugging Race Conditions
 293 @subsection Debugging Race Conditions
 294
 295 Once you know that there is a race condition at memory location
 296 @var{addr}, you need to know at least two things to debug it.  First,
 297 you need to know what data corresponds to @var{addr}.  If it is global
 298 or @code{static} data, then @command{checkbochs-trace} will normally
 299 print the variable's name.  Otherwise, for automatic (local) or
 300 heap-allocated data, you will have to figure it out by looking at the
 301 backtrace.
 302
 303 Second, you need to know the various points in time during code
 304 execution at which the data variable was accessed, and which thread
 305 made each access.  You can figure this out from the backtraces:
 306 @file{btrace} contains both the types of accesses made and the tids of
 307 the accessing threads.
 308
 309 @node Checkbochs Limitations
 310 @subsection Limitations
 311
 312 Checkbochs can be useful for debugging race conditions, but it has
 313 several limitations.
 314
 315 Checkbochs is subject to false negatives, that is, it will not find
 316 all potential races.  For example, Checkbochs will not detect races
 317 that result from failing to recheck a condition that might have
 318 changed while a lock was released.  Suppose that @code{lock} is a lock
 319 that controls acquisition and release of a resource, such as a page of
 320 memory.  The following code then has a race, because @code{owner} is
 321 not re-checked after releasing and re-acquiring @code{lock}.
 322 Checkbochs fails to detect this race because @code{lock} is always
 323 held when @code{owner} is accessed:
 324
 325 @example
 326 struct thread *owner = NULL; /* Initially no owner. */
 327 struct lock lock;            /* Controls changes to @var{owner}. */
 328
 329 @dots{}
 330
 331 lock_acquire (&lock);
 332 if (owner == NULL)
 333   @{
 334     lock_release (&lock);
 335     @dots{}
 336     lock_acquire (&lock);
 337     owner = thread_current ();
 338     lock_release (&lock);
 339   @}
 340 @end example
 341
 342 Checkbochs is also subject to false positives, that is, not every race
 343 it reports is really a race.  (This is why they have been called
 344 ``potential'' races throughout this section.)  This will occur when
 345 accesses to data are synchronized in a way not visible to Checkbochs.
 346 For example, Checkbochs does not understand all uses of semaphores,
 347 so synchronization via semaphores may be reported as races.
 348
 349 Checkbochs is a dynamic checker, meaning that it actually runs the
 350 code that it checks.  This also means that code that doesn't get run
 351 doesn't get checked, so you can also get false negatives that way.
 352
 353 @node gdb
 354 @section @command{gdb}
 355
 356 You can run the Pintos kernel under the supervision of the
 357 @command{gdb} (80@var{x}86) or @command{i386-elf-gdb} (SPARC)
 358 debugger.  First,
 359 start Pintos with the @option{--gdb} option, e.g.@: @command{pintos
 360 --gdb -- run mytest}.  Second, in a separate terminal, invoke @command{gdb} (or
 361 @command{i386-elf-gdb}) on
 362 @file{kernel.o}:
 363 @example
 364 gdb kernel.o
 365 @end example
 366 @noindent and issue the following @command{gdb} command:
 367 @example
 368 target remote localhost:1234
 369 @end example
 370
 371 Now @command{gdb} is connected to the simulator over a local
 372 network connection.  You can now issue any normal @command{gdb}
 373 commands.  If you issue the @samp{c} command, the simulated BIOS will take
 374 control, load Pintos, and then Pintos will run in the usual way.  You
 375 can pause the process at any point with @key{Ctrl+C}.  If you want
 376 @command{gdb} to stop when Pintos starts running, set a breakpoint on
 377 @func{main} with the command @code{break main} before @samp{c}.
 378
 379 You can read the @command{gdb} manual by typing @code{info gdb} at a
 380 terminal command prompt, or you can view it in Emacs with the command
 381 @kbd{C-h i}.  Here's a few commonly useful @command{gdb} commands:
 382
 383 @table @code
 384 @item c
 385 Continues execution until @key{Ctrl+C} or the next breakpoint.
 386
 387 @item break @var{function}
 388 @itemx break @var{filename}:@var{linenum}
 389 @itemx break *@var{address}
 390 Sets a breakpoint at the given function, line number, or address.
 391 (Use a @samp{0x} prefix to specify an address in hex.)
 392
 393 @item p @var{expression}
 394 Evaluates the given C expression and prints its value.
 395 If the expression contains a function call, that function will actually
 396 be executed.
 397
 398 @item l *@var{address}
 399 Lists a few lines of code around the given address.
 400 (Use a @samp{0x} prefix to specify an address in hex.)
 401
 402 @item bt
 403 Prints a stack backtrace similar to that output by the
 404 @command{backtrace} program described above.
 405
 406 @item p/a @var{address}
 407 Prints the name of the function or variable that occupies the given
 408 address.
 409 (Use a @samp{0x} prefix to specify an address in hex.)
 410
 411 @item diassemble @var{function}
 412 Disassembles the specified @var{function}.
 413 @end table
 414
 415 If you notice other strange behavior while using @command{gdb}, there
 416 are three possibilities: a bug in your
 417 modified Pintos, a bug in Bochs's
 418 interface to @command{gdb} or in @command{gdb} itself, or
 419 a bug in the original Pintos code.  The first and second
 420 are quite likely, and you should seriously consider both.  We hope
 421 that the third is less likely, but it is also possible.
 422
 423 You can also use @command{gdb} to debug a user program running under
 424 Pintos.  Start by issuing this @command{gdb} command to load the
 425 program's symbol table:
 426 @example
 427 add-symbol-file @var{program}
 428 @end example
 429 @noindent
 430 where @var{program} is the name of the program's executable (in the host
 431 file system, not in the Pintos file system).  After this, you should be
 432 able to debug the user program the same way you would the kernel, by
 433 placing breakpoints, inspecting data, etc.  Your actions apply to every
 434 user program running in Pintos, not just to the one you want to debug,
 435 so be careful in interpreting the results.  Also, a name that appears in
 436 both the kernel and the user program will actually refer to the kernel
 437 name.  (The latter problem can be avoided by giving the user executable
 438 name on the @command{gdb} command line, instead of @file{kernel.o}.)
 439
 440 @node Debugging by Infinite Loop
 441 @section Debugging by Infinite Loop
 442
 443 If you get yourself into a situation where the machine reboots in a
 444 loop, that's probably a ``triple fault.''  In such a situation you
 445 might not be able to use @func{printf} for debugging, because the
 446 reboots might be happening even before everything needed for
 447 @func{printf} is initialized.  In such a situation, you might want to
 448 try what I call ``debugging by infinite loop.''
 449
 450 What you do is pick a place in the Pintos code, insert the statement
 451 @code{for (;;);} there, and recompile and run.  There are two likely
 452 possibilities:
 453
 454 @itemize @bullet
 455 @item
 456 The machine hangs without rebooting.  If this happens, you know that
 457 the infinite loop is running.  That means that whatever caused the
 458 reboot must be @emph{after} the place you inserted the infinite loop.
 459 Now move the infinite loop later in the code sequence.
 460
 461 @item
 462 The machine reboots in a loop.  If this happens, you know that the
 463 machine didn't make it to the infinite loop.  Thus, whatever caused the
 464 reboot must be @emph{before} the place you inserted the infinite loop.
 465 Now move the infinite loop earlier in the code sequence.
 466 @end itemize
 467
 468 If you move around the infinite loop in a ``binary search'' fashion, you
 469 can use this technique to pin down the exact spot that everything goes
 470 wrong.  It should only take a few minutes at most.
 471
 472 @node Modifying Bochs
 473 @section Modifying Bochs
 474
 475 An advanced debugging technique is to modify and recompile the
 476 simulator.  This proves useful when the simulated hardware has more
 477 information than it makes available to the OS.  For example, page
 478 faults have a long list of potential causes, but the hardware does not
 479 report to the OS exactly which one is the particular cause.
 480 Furthermore, a bug in the kernel's handling of page faults can easily
 481 lead to recursive faults, but a ``triple fault'' will cause the CPU to
 482 reset itself, which is hardly conducive to debugging.
 483
 484 In a case like this, you might appreciate being able to make Bochs
 485 print out more debug information, such as the exact type of fault that
 486 occurred.  It's not very hard.  You start by retrieving the source
 487 code for Bochs 2.1.1 from @uref{http://bochs.sourceforge.net} and
 488 extracting it into a directory.  Then read
 489 @file{pintos/src/misc/bochs-2.1.1.patch} and apply the patches needed.
 490 Then run @file{./configure}, supplying the options you want (some
 491 suggestions are in the patch file).  Finally, run @command{make}.
 492 This will compile Bochs and eventually produce a new binary
 493 @file{bochs}.  To use your @file{bochs} binary with @command{pintos},
 494 put it in your @env{PATH}, and make sure that it is earlier than
 495 @file{/usr/class/cs140/`uname -m`/bochs}.
 496
 497 Of course, to get any good out of this you'll have to actually modify
 498 Bochs.  Instructions for doing this are firmly out of the scope of
 499 this document.  However, if you want to debug page faults as suggested
 500 above, a good place to start adding @func{printf}s is
 501 @func{BX_CPU_C::dtranslate_linear} in @file{cpu/paging.cc}.
 502
 503 @node Debugging Tips
 504 @section Tips
 505
 506 The page allocator in @file{threads/palloc.c} and the block allocator in
 507 @file{threads/malloc.c} both clear all the bytes in pages and blocks to
 508 @t{0xcc} when they are freed.  Thus, if you see an attempt to
 509 dereference a pointer like @t{0xcccccccc}, or some other reference to
 510 @t{0xcc}, there's a good chance you're trying to reuse a page that's
 511 already been freed.  Also, byte @t{0xcc} is the CPU opcode for ``invoke
 512 interrupt 3,'' so if you see an error like @code{Interrupt 0x03 (#BP
 513 Breakpoint Exception)}, Pintos tried to execute code in a freed page or
 514 block.
 515
 516 An assertion failure on the expression @code{sec_no < d->capacity}
 517 indicates that Pintos tried to access a file through an inode that has
 518 been closed and freed.  Freeing an inode clears its starting sector
 519 number to @t{0xcccccccc}, which is not a valid sector number for disks
 520 smaller than about 1.6 TB.