   1 @node Debugging Tools
   2 @appendix Debugging Tools
   3
   4 Many tools lie at your disposal for debugging Pintos.  This appendix
   5 introduces you to a few of them.
   6
   7 @menu
   8 * printf::
   9 * ASSERT::
  10 * Function and Parameter Attributes::
  11 * Backtraces::
  12 * GDB::
  13 * Triple Faults::
  14 * Modifying Bochs::
  15 * Debugging Tips::
  16 @end menu
  17
  18 @node printf
  19 @section @code{printf()}
  20
  21 Don't underestimate the value of @func{printf}.  The way
  22 @func{printf} is implemented in Pintos, you can call it from
  23 practically anywhere in the kernel, whether it's in a kernel thread or
  24 an interrupt handler, almost regardless of what locks are held (but see
  25 @ref{printf Reboots} for a counterexample).
  26
  27 @func{printf} is useful for more than just examining data.
  28 It can also help figure out when and where something goes wrong, even
  29 when the kernel crashes or panics without a useful error message.  The
  30 strategy is to sprinkle calls to @func{printf} with different strings
  31 (e.g.@: @code{"<1>"}, @code{"<2>"}, @dots{}) throughout the pieces of
  32 code you suspect are failing.  If you don't even see @code{<1>} printed,
  33 then something bad happened before that point; if you see @code{<1>}
  34 but not @code{<2>}, then something bad happened between those two
  35 points, and so on.  Based on what you learn, you can then insert more
  36 @func{printf} calls in the new, smaller region of code you suspect.
  37 Eventually you can narrow the problem down to a single statement.
  38 @xref{Triple Faults}, for a related technique.
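
As a minimal sketch of this technique, the fragment below tags three
steps with numbered markers.  It is written against the standard C
library so that it can be tried outside Pintos.  The @code{MARK} macro,
the @code{last_marker} variable, and @func{suspect_code} are
hypothetical names invented for this illustration, and an early
@code{return} stands in for the crash.

```c
#include <stdio.h>

/* Hypothetical helper: number of the last marker that was printed. */
int last_marker = 0;

/* Print "<n>" and remember it.  In the kernel, a plain
   printf ("<1>") call at each point is all that is needed. */
#define MARK(n) do { printf ("<%d>", (n)); last_marker = (n); } while (0)

/* Hypothetical suspect code made of three steps.  FAIL_STEP picks
   the step that "crashes"; a return simulates the crash here. */
void
suspect_code (int fail_step)
{
  MARK (1);
  if (fail_step == 1)
    return;
  MARK (2);
  if (fail_step == 2)
    return;
  MARK (3);
}
```

Calling @code{suspect_code (2)} prints @code{<1><2>} and never
@code{<3>}, which places the failure between the second and third
markers.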
  39
  40 @node ASSERT
  41 @section @code{ASSERT}
  42
  43 Assertions are useful because they can catch problems early, before
  44 they'd otherwise be noticed.  Ideally, each function should begin with a
  45 set of assertions that check its arguments for validity.  (Initializers
  46 for functions' local variables are evaluated before assertions are
  47 checked, so be careful not to assume that an argument is valid in an
  48 initializer.)  You can also sprinkle assertions throughout the body of
  49 functions in places where you suspect things are likely to go wrong.
  50 They are especially useful for checking loop invariants.
  51
  52 Pintos provides the @code{ASSERT} macro, defined in @file{<debug.h>},
  53 for checking assertions.
  54
  55 @defmac ASSERT (expression)
  56 Tests the value of @var{expression}.  If it evaluates to zero (false),
  57 the kernel panics.  The panic message includes the expression that
  58 failed, its file and line number, and a backtrace, which should help you
  59 to find the problem.  @xref{Backtraces}, for more information.
  60 @end defmac
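
The sketch below illustrates the advice about argument checks,
initializers, and loop invariants.  It uses the standard C
@code{assert} macro as a stand-in for @code{ASSERT} so that it compiles
outside Pintos.  @struct{buffer} and @func{buffer_sum} are hypothetical
names made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure, used only for this illustration. */
struct buffer
  {
    const int *data;
    size_t cnt;
  };

/* Returns the sum of the elements in BUF. */
int
buffer_sum (const struct buffer *buf)
{
  /* Risky: "size_t cnt = buf->cnt;" here would dereference BUF
     before the assertion below had a chance to check it. */
  size_t cnt;
  size_t i;
  int sum = 0;

  assert (buf != NULL);                  /* Check arguments first. */
  assert (buf->data != NULL || buf->cnt == 0);

  cnt = buf->cnt;                        /* BUF is now known to be valid. */
  for (i = 0; i < cnt; i++)
    {
      assert (i < buf->cnt);             /* Loop invariant. */
      sum += buf->data[i];
    }
  return sum;
}
```

In kernel code you would write @code{ASSERT} in place of @code{assert}
and include @file{<debug.h>} instead of @file{<assert.h>}.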
  61
  62 @node Function and Parameter Attributes
  63 @section Function and Parameter Attributes
  64
  65 These macros defined in @file{<debug.h>} tell the compiler special
  66 attributes of a function or function parameter.  Their expansions are
  67 GCC-specific.
  68
  69 @defmac UNUSED
  70 Appended to a function parameter to tell the compiler that the
  71 parameter might not be used within the function.  It suppresses the
  72 warning that would otherwise appear.
  73 @end defmac
  74
  75 @defmac NO_RETURN
  76 Appended to a function prototype to tell the compiler that the
  77 function never returns.  It allows the compiler to fine-tune its
  78 warnings and its code generation.
  79 @end defmac
  80
  81 @defmac NO_INLINE
  82 Appended to a function prototype to tell the compiler to never emit
  83 the function in-line.  Occasionally useful to improve the quality of
  84 backtraces (see below).
  85 @end defmac
  86
  87 @defmac PRINTF_FORMAT (@var{format}, @var{first})
  88 Appended to a function prototype to tell the compiler that the function
  89 takes a @func{printf}-like format string as the argument numbered
  90 @var{format} (starting from 1) and that the corresponding value
  91 arguments start at the argument numbered @var{first}.  This lets the
  92 compiler tell you if you pass the wrong argument types.
  93 @end defmac
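
The following sketch shows how the four macros might be used together.
The @code{#define} lines are assumed GCC expansions written to match the
descriptions above, so that the example compiles without
@file{<debug.h>}.  The functions @func{fatal}, @func{add_values},
@func{twice}, and @func{format_msg} are hypothetical and exist only for
illustration.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed GCC-specific expansions, modeled on the descriptions above. */
#define UNUSED __attribute__ ((unused))
#define NO_RETURN __attribute__ ((noreturn))
#define NO_INLINE __attribute__ ((noinline))
#define PRINTF_FORMAT(FMT, FIRST) \
  __attribute__ ((format (printf, FMT, FIRST)))

/* Hypothetical prototypes carrying the attributes. */
void fatal (const char *msg) NO_RETURN;
int add_values (int a, int b) NO_INLINE;
int format_msg (char *buf, size_t size, const char *fmt, ...)
  PRINTF_FORMAT (3, 4);

/* Prints MSG and terminates the program; never returns. */
void
fatal (const char *msg)
{
  fprintf (stderr, "%s\n", msg);
  abort ();
}

/* Kept out of line so that it appears as its own stack frame. */
int
add_values (int a, int b)
{
  return a + b;
}

/* Callback that ignores AUX; UNUSED suppresses the warning. */
int
twice (int value, void *aux UNUSED)
{
  return add_values (value, value);
}

/* Formats into BUF like snprintf().  The format string is argument 3
   and its values start at argument 4, hence PRINTF_FORMAT (3, 4). */
int
format_msg (char *buf, size_t size, const char *fmt, ...)
{
  va_list args;
  int len;

  va_start (args, fmt);
  len = vsnprintf (buf, size, fmt, args);
  va_end (args);
  return len;
}
```

With these declarations, GCC can warn at compile time about a call such
as @code{format_msg (buf, sizeof buf, "%s", 42)}, where the argument
type does not match the format string.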
  94
  95 @node Backtraces
  96 @section Backtraces
  97
  98 When the kernel panics, it prints a ``backtrace,'' that is, a summary
  99 of how your program got where it is, as a list of addresses inside the
 100 functions that were running at the time of the panic.  You can also
 101 insert a call to @func{debug_backtrace}, prototyped in
 102 @file{<debug.h>}, to print a backtrace at any point in your code.
 103
 104 The addresses in a backtrace are listed as raw hexadecimal numbers,
 105 which are difficult to interpret.  We provide a tool called
 106 @command{backtrace} to translate these into function names and source
 107 file line numbers.
 108 Give it the name of your @file{kernel.o} as the first argument and the
 109 hexadecimal numbers composing the backtrace (including the @samp{0x}
 110 prefixes) as the remaining arguments.  It outputs the function name
 111 and source file line numbers that correspond to each address.
 112
 113 If the translated form of a backtrace is garbled, or doesn't make
 114 sense (e.g.@: function A is listed above function B, but B doesn't
 115 call A), then it's a good sign that you're corrupting a kernel
 116 thread's stack, because the backtrace is extracted from the stack.
 117 Alternatively, it could be that the @file{kernel.o} you passed to
 118 @command{backtrace} does not correspond to the kernel that produced
 119 the backtrace.
 120
 121 Sometimes backtraces can be confusing without implying corruption.
 122 Compiler optimizations can cause surprising behavior.  When a function
 123 has called another function as its final action (a @dfn{tail call}), the
 124 calling function may not appear in a backtrace at all.  Similarly, when
 125 function A calls another function B that never returns, the compiler may
 126 optimize such that an unrelated function C appears in the backtrace
 127 instead of A.  Function C is simply the function that happens to be in
 128 memory just after A.  In the threads project, this is commonly seen in
 129 backtraces for test failures; see @ref{The pass function fails, ,
 130 @func{pass} Fails}, for more information.
 131
 132 @menu
 133 * Backtrace Example::
 134 @end menu
 135
 136 @node Backtrace Example
 137 @subsection Example
 138
 139 Here's an example.  Suppose that Pintos printed out the following call
 140 stack, which is taken from an actual Pintos submission for the file
 141 system project:
 142
 143 @example
 144 Call stack: 0xc0106eff 0xc01102fb 0xc010dc22 0xc010cf67 0xc0102319
 145 0xc010325a 0x804812c 0x8048a96 0x8048ac8.
 146 @end example
 147
 148 You would then invoke the @command{backtrace} utility as shown below,
 149 cutting and pasting the backtrace information into the command line.
 150 This assumes that @file{kernel.o} is in the current directory.  You
 151 would of course enter all of the following on a single shell command
 152 line, even though that would overflow our margins here:
 153
 154 @example
 155 backtrace kernel.o 0xc0106eff 0xc01102fb 0xc010dc22 0xc010cf67
 156 0xc0102319 0xc010325a 0x804812c 0x8048a96 0x8048ac8
 157 @end example
 158
 159 The backtrace output would then look something like this:
 160
 161 @example
 162 0xc0106eff: debug_panic (../../lib/debug.c:86)
 163 0xc01102fb: file_seek (../../filesys/file.c:405)
 164 0xc010dc22: seek (../../userprog/syscall.c:744)
 165 0xc010cf67: syscall_handler (../../userprog/syscall.c:444)
 166 0xc0102319: intr_handler (../../threads/interrupt.c:334)
 167 0xc010325a: ?? (threads/intr-stubs.S:1554)
 168 0x804812c: ?? (??:0)
 169 0x8048a96: ?? (??:0)
 170 0x8048ac8: ?? (??:0)
 171 @end example
 172
 173 (You will probably not see exactly the same addresses if you run the
 174 command above on your own kernel binary, because the source code you
 175 compiled and the compiler you used are probably different.)
 176
 177 The first line in the backtrace refers to @func{debug_panic}, the
 178 function that implements kernel panics.  Because backtraces commonly
 179 result from kernel panics, @func{debug_panic} will often be the first
 180 function shown in a backtrace.
 181
 182 The second line shows @func{file_seek} as the function that panicked,
 183 in this case as the result of an assertion failure.  In the source code
 184 tree used for this example, line 405 of @file{filesys/file.c} is the
 185 assertion
 186
 187 @example
 188 ASSERT (file_ofs >= 0);
 189 @end example
 190
 191 @noindent
 192 (This line was also cited in the assertion failure message.)
 193 Thus, @func{file_seek} panicked because it was passed a negative file
 194 offset argument.
 195
 196 The third line indicates that @func{seek} called @func{file_seek},
 197 presumably without validating the offset argument.  In this submission,
 198 @func{seek} implements the @code{seek} system call.
 199
 200 The fourth line shows that @func{syscall_handler}, the system call
 201 handler, invoked @func{seek}.
 202
 203 The fifth and sixth lines are the interrupt handler entry path.
 204
 205 The remaining lines are for addresses below @code{PHYS_BASE}.  This
 206 means that they refer to addresses in the user program, not in the
 207 kernel.  If you know what user program was running when the kernel
 208 panicked, you can re-run @command{backtrace} on the user program, like
 209 so (typing the command on a single line, of course):
 210
 211 @example
 212 backtrace grow-too-big 0xc0106eff 0xc01102fb 0xc010dc22 0xc010cf67
 213 0xc0102319 0xc010325a 0x804812c 0x8048a96 0x8048ac8
 214 @end example
 215
 216 The results look like this:
 217
 218 @example
 219 0xc0106eff: ?? (??:0)
 220 0xc01102fb: ?? (??:0)
 221 0xc010dc22: ?? (??:0)
 222 0xc010cf67: ?? (??:0)
 223 0xc0102319: ?? (??:0)
 224 0xc010325a: ?? (??:0)
 225 0x804812c: test_main (../../tests/filesys/extended/grow-too-big.c:20)
 226 0x8048a96: main (../../tests/main.c:10)
 227 0x8048ac8: _start (../../lib/user/entry.c:9)
 228 @end example
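
The split between kernel and user addresses in a backtrace can be
sketched in C as a comparison against @code{PHYS_BASE}.  The value
@t{0xc0000000} used here is an assumption (the usual Pintos default),
and @func{is_user_addr} and @func{count_user_frames} are hypothetical
helpers written for this example, not part of Pintos.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed default value of PHYS_BASE in Pintos. */
#define PHYS_BASE 0xc0000000u

/* Returns true if ADDR lies below PHYS_BASE, i.e. in user space. */
bool
is_user_addr (uint32_t addr)
{
  return addr < PHYS_BASE;
}

/* Counts the user-space addresses among the CNT entries of ADDRS. */
size_t
count_user_frames (const uint32_t *addrs, size_t cnt)
{
  size_t i, n = 0;

  for (i = 0; i < cnt; i++)
    if (is_user_addr (addrs[i]))
      n++;
  return n;
}
```

Applied to the nine addresses of the example call stack, this counts
three user-space frames, the ones that only resolve when
@command{backtrace} is given the user program.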
 229
 230 Here's an extra tip for anyone who read this far: @command{backtrace}
 231 is smart enough to strip the @code{Call stack:} header and @samp{.}
 232 trailer from the command line if you include them.  This can save you
 233 a little bit of trouble in cutting and pasting.  Thus, the following
 234 command prints the same output as the first one we used:
 235
 236 @example
 237 backtrace kernel.o Call stack: 0xc0106eff 0xc01102fb 0xc010dc22
 238 0xc010cf67 0xc0102319 0xc010325a 0x804812c 0x8048a96 0x8048ac8.
 239 @end example
 240
 241 @node GDB
 242 @section GDB
 243
 244 You can run Pintos under the supervision of the GDB debugger.
 245 First, start Pintos with the @option{--gdb} option, e.g.@:
 246 @command{pintos --gdb -- run mytest}.  Second, open a second terminal on
 247 the same machine and
 248 use @command{pintos-gdb} to invoke GDB on
 249 @file{kernel.o}:@footnote{@command{pintos-gdb} is a wrapper around
 250 @command{gdb} (80@var{x}86) or @command{i386-elf-gdb} (SPARC) that loads
 251 the Pintos macros at startup.}
 252 @example
 253 pintos-gdb kernel.o
 254 @end example
 255 @noindent and issue the following GDB command:
 256 @example
 257 target remote localhost:1234
 258 @end example
 259
 260 Now GDB is connected to the simulator over a local
 261 network connection.  You can now issue any normal GDB
 262 commands.  If you issue the @samp{c} command, the simulated BIOS will take
 263 control, load Pintos, and then Pintos will run in the usual way.  You
 264 can pause the process at any point with @key{Ctrl+C}.
 265
 266 @menu
 267 * Using GDB::
 268 * Example GDB Session::
 269 * Debugging User Programs::
 270 * GDB FAQ::
 271 @end menu
 272
 273 @node Using GDB
 274 @subsection Using GDB
 275
 276 You can read the GDB manual by typing @code{info gdb} at a
 277 terminal command prompt.  Here are a few commonly useful GDB commands:
 278
 279 @deffn {GDB Command} c
 280 Continues execution until @key{Ctrl+C} or the next breakpoint.
 281 @end deffn
 282
 283 @deffn {GDB Command} break function
 284 @deffnx {GDB Command} break file:line
 285 @deffnx {GDB Command} break *address
 286 Sets a breakpoint at @var{function}, at @var{line} within @var{file}, or
 287 @var{address}.
 288 (Use a @samp{0x} prefix to specify an address in hex.)
 289
 290 Use @code{break main} to make GDB stop when Pintos starts running.
 291 @end deffn
 292
 293 @deffn {GDB Command} p expression
 294 Evaluates the given @var{expression} and prints its value.
 295 If the expression contains a function call, that function will actually
 296 be executed.
 297 @end deffn
 298
 299 @deffn {GDB Command} l *address
 300 Lists a few lines of code around @var{address}.
 301 (Use a @samp{0x} prefix to specify an address in hex.)
 302 @end deffn
 303
 304 @deffn {GDB Command} bt
 305 Prints a stack backtrace similar to that output by the
 306 @command{backtrace} program described above.
 307 @end deffn
 308
 309 @deffn {GDB Command} p/a address
 310 Prints the name of the function or variable that occupies @var{address}.
 311 (Use a @samp{0x} prefix to specify an address in hex.)
 312 @end deffn
 313
 314 @deffn {GDB Command} disassemble function
 315 Disassembles @var{function}.
 316 @end deffn
 317
 318 We also provide a set of macros specialized for debugging Pintos,
 319 written by Godmar Back @email{gback@@cs.vt.edu}.  You can type
 320 @code{help user-defined} for basic help with the macros.  Here is an
 321 overview of their functionality, based on Godmar's documentation:
 322
 323 @deffn {GDB Macro} debugpintos
 324 Attach debugger to a waiting pintos process on the same machine.
 325 Shorthand for @code{target remote localhost:1234}.
 326 @end deffn
 327
 328 @deffn {GDB Macro} dumplist list type element
 329 Prints the elements of @var{list}, which should be a @code{struct} list
 330 that contains elements of the given @var{type} (without the word
 331 @code{struct}) in which @var{element} is the @struct{list_elem} member
 332 that links the elements.
 333
 334 Example: @code{dumplist all_list thread all_elem} prints all elements of
 335 @struct{thread} that are linked in @code{struct list all_list} using the
 336 @code{struct list_elem all_elem} which is part of @struct{thread}.
 337 @end deffn
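
@code{dumplist} needs both a @var{type} and an @var{element} because a
list links the @struct{list_elem} members embedded in the structures,
not the structures themselves.  The sketch below shows the conversion
from an element pointer back to its enclosing structure.  The types are
simplified stand-ins for the Pintos ones, and @code{ELEM_TO_STRUCT} and
@func{tid_of_elem} are hypothetical names used only here.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the Pintos list types. */
struct list_elem
  {
    struct list_elem *prev;
    struct list_elem *next;
  };

struct thread
  {
    int tid;
    struct list_elem all_elem;   /* Links the thread into all_list. */
  };

/* Converts a pointer to the list element MEMBER back into a pointer
   to the enclosing structure of type STRUCT. */
#define ELEM_TO_STRUCT(ELEM, STRUCT, MEMBER) \
  ((STRUCT *) ((uint8_t *) (ELEM) - offsetof (STRUCT, MEMBER)))

/* Returns the tid of the thread that contains E as its all_elem. */
int
tid_of_elem (struct list_elem *e)
{
  return ELEM_TO_STRUCT (e, struct thread, all_elem)->tid;
}
```

The macro arguments @code{struct thread} and @code{all_elem} play the
same roles as the @var{type} and @var{element} arguments of
@code{dumplist}.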
 338
 339 @deffn {GDB Macro} btthread thread
 340 Shows the backtrace of @var{thread}, which is a pointer to the
 341 @struct{thread} of the thread whose backtrace it should show.  For the
 342 current thread, this is identical to the @code{bt} (backtrace) command.
 343 It also works for any thread suspended in @func{schedule},
 344 provided you know where its kernel stack page is located.
 345 @end deffn
 346
 347 @deffn {GDB Macro} btthreadlist list element
 348 Shows the backtraces of all threads in @var{list}, the @struct{list} in
 349 which the threads are kept.  Specify @var{element} as the
 350 @struct{list_elem} field used inside @struct{thread} to link the threads
 351 together.
 352
 353 Example: @code{btthreadlist all_list all_elem} shows the backtraces of
 354 all threads contained in @code{struct list all_list}, linked together by
 355 @code{all_elem}.  This command is useful to determine where your threads
 356 are stuck when a deadlock occurs.  Please see the example scenario below.
 357 @end deffn
 358
 359 @deffn {GDB Macro} btpagefault
 360 Print a backtrace of the current thread after a page fault exception.
 361 Normally, when a page fault exception occurs, GDB will stop
 362 with a message that might say:
 363
 364 @example
 365 Program received signal 0, Signal 0.
 366 0xc0102320 in intr0e_stub ()
 367 @end example
 368
 369 In that case, the @code{bt} command might not give a useful
 370 backtrace.  Use @code{btpagefault} instead.
 371
 372 You may also use @code{btpagefault} for page faults that occur in a user
 373 process.  In this case, you may also wish to load the user program's
 374 symbol table (@pxref{Debugging User Programs}).
 375 @end deffn
 376
 377 @deffn {GDB Macro} hook-stop
 378 GDB invokes this macro every time the simulation stops, which Bochs will
 379 do for every processor exception, among other reasons.  If the
 380 simulation stops due to a page fault, @code{hook-stop} will print a
 381 message that says so and explains further whether the page fault occurred
 382 in the kernel or in user code.
 383
 384 If the exception occurred from user code, @code{hook-stop} will say:
 385 @example
 386 pintos-debug: a page fault exception occurred in user mode
 387 pintos-debug: hit 'c' to continue, or 's' to step to intr_handler
 388 @end example
 389
 390 In Project 2, a page fault in a user process leads to the termination of
 391 the process.  You should expect those page faults to occur in the
 392 robustness tests where we test that your kernel properly terminates
 393 processes that try to access invalid addresses.  To debug those, set a
 394 break point in @func{page_fault} in @file{exception.c}, which you will
 395 need to modify accordingly.
 396
 397 In Project 3, a page fault in a user process no longer automatically
 398 leads to the termination of a process.  Instead, it may require reading in
 399 data for the page the process was trying to access, either
 400 because it was swapped out or because this is the first time it's
 401 accessed.  In either case, you will reach @func{page_fault} and need to
 402 take the appropriate action there.
 403
 404 If the page fault did not occur in user mode while executing a user
 405 process, then it occurred in kernel mode while executing kernel code.
 406 In this case, @code{hook-stop} will print this message:
 407 @example
 408 pintos-debug: a page fault occurred in kernel mode
 409 @end example
 410 followed by the output of the @code{btpagefault} command.
 411
 412 Before Project 3, a page fault exception in kernel code is always a bug
 413 in your kernel, because your kernel should never crash.  Starting with
 414 Project 3, the situation will change if you use the @func{get_user} and
 415 @func{put_user} strategy to verify user memory accesses
 416 (@pxref{Accessing User Memory}).
 417 @end deffn
 418
 419 @node Example GDB Session
 420 @subsection Example GDB Session
 421
 422 This section narrates a sample GDB session, provided by Godmar Back.
 423 This example illustrates how one might debug a Project 1 solution in
 424 which occasionally a thread that calls @func{timer_sleep} is not woken
 425 up.  With this bug, tests such as @code{mlfqs-load-1} get stuck.
 426
 427 This session was captured with a slightly older version of Bochs and the
 428 GDB macros for Pintos, so it looks slightly different than it would now.
 429 Program output is shown in normal type, user input in @strong{strong}
 430 type.
 431
 432 First, I start Pintos:
 433
 434 @smallexample
 435 $ @strong{pintos -v --gdb -- -q -mlfqs run mlfqs-load-1}
 436 Writing command line to /tmp/gDAlqTB5Uf.dsk...
 437 bochs -q
 438 ========================================================================
 439                        Bochs x86 Emulator 2.2.5
 440              Build from CVS snapshot on December 30, 2005
 441 ========================================================================
 442 00000000000i[     ] reading configuration from bochsrc.txt
 443 00000000000i[     ] Enabled gdbstub
 444 00000000000i[     ] installing nogui module as the Bochs GUI
 445 00000000000i[     ] using log file bochsout.txt
 446 Waiting for gdb connection on localhost:1234
 447 @end smallexample
 448
 449 @noindent Then, I open a second window on the same machine and start GDB:
 450
 451 @smallexample
 452 $ @strong{pintos-gdb kernel.o}
 453 GNU gdb Red Hat Linux (6.3.0.0-1.84rh)
 454 Copyright 2004 Free Software Foundation, Inc.
 455 GDB is free software, covered by the GNU General Public License, and you are
 456 welcome to change it and/or distribute copies of it under certain conditions.
 457 Type "show copying" to see the conditions.
 458 There is absolutely no warranty for GDB.  Type "show warranty" for details.
 459 This GDB was configured as "i386-redhat-linux-gnu"...
 460 Using host libthread_db library "/lib/libthread_db.so.1".
 461 @end smallexample
 462
 463 @noindent Then, I tell GDB to attach to the waiting Pintos emulator:
 464
 465 @smallexample
 466 (gdb) @strong{debugpintos}
 467 Remote debugging using localhost:1234
 468 0x0000fff0 in ?? ()
 469 Reply contains invalid hex digit 78
 470 @end smallexample
 471
 472 @noindent Now I tell Pintos to run by executing @code{c} (short for
 473 @code{continue}) twice:
 474
 475 @smallexample
 476 (gdb) @strong{c}
 477 Continuing.
 478 Reply contains invalid hex digit 78
 479 (gdb) @strong{c}
 480 Continuing.
 481 @end smallexample
 482
 483 @noindent Now Pintos will continue and output:
 484
 485 @smallexample
 486 Pintos booting with 4,096 kB RAM...
 487 Kernel command line: -q -mlfqs run mlfqs-load-1
 488 374 pages available in kernel pool.
 489 373 pages available in user pool.
 490 Calibrating timer...  102,400 loops/s.
 491 Boot complete.
 492 Executing 'mlfqs-load-1':
 493 (mlfqs-load-1) begin
 494 (mlfqs-load-1) spinning for up to 45 seconds, please wait...
 495 (mlfqs-load-1) load average rose to 0.5 after 42 seconds
 496 (mlfqs-load-1) sleeping for another 10 seconds, please wait...
 497 @end smallexample
 498
 499 @noindent
 500 @dots{}until it gets stuck because of the bug I had introduced.  I hit
 501 @key{Ctrl+C} in the debugger window:
 502
 503 @smallexample
 504 Program received signal 0, Signal 0.
 505 0xc010168c in next_thread_to_run () at ../../threads/thread.c:649
 506 649       while (i <= PRI_MAX && list_empty (&ready_list[i]))
 507 (gdb)
 508 @end smallexample
 509
 510 @noindent
 511 The thread that was running when I interrupted Pintos was the idle
 512 thread.  If I run @code{backtrace}, it shows this backtrace:
 513
 514 @smallexample
 515 (gdb) @strong{bt}
 516 #0  0xc010168c in next_thread_to_run () at ../../threads/thread.c:649
 517 #1  0xc0101778 in schedule () at ../../threads/thread.c:714
 518 #2  0xc0100f8f in thread_block () at ../../threads/thread.c:324
 519 #3  0xc0101419 in idle (aux=0x0) at ../../threads/thread.c:551
 520 #4  0xc010145a in kernel_thread (function=0xc01013ff , aux=0x0)
 521     at ../../threads/thread.c:575
 522 #5  0x00000000 in ?? ()
 523 @end smallexample
 524
 525 @noindent
 526 Not terribly useful.  What I'd really like to know is what's up with the
 527 other thread (or threads).  Since I keep all threads in a linked list
 528 called @code{all_list}, linked together by a @struct{list_elem} member
 529 named @code{all_elem}, I can use the @code{btthreadlist} macro from the
 530 macro library I wrote.  @code{btthreadlist} iterates through the list of
 531 threads and prints the backtrace for each thread:
 532
 533 @smallexample
 534 (gdb) @strong{btthreadlist all_list all_elem}
 535 pintos-debug: dumping backtrace of thread 'main' @@0xc002f000
 536 #0  0xc0101820 in schedule () at ../../threads/thread.c:722
 537 #1  0xc0100f8f in thread_block () at ../../threads/thread.c:324
 538 #2  0xc0104755 in timer_sleep (ticks=1000) at ../../devices/timer.c:141
 539 #3  0xc010bf7c in test_mlfqs_load_1 () at ../../tests/threads/mlfqs-load-1.c:49
 540 #4  0xc010aabb in run_test (name=0xc0007d8c "mlfqs-load-1")
 541     at ../../tests/threads/tests.c:50
 542 #5  0xc0100647 in run_task (argv=0xc0110d28) at ../../threads/init.c:281
 543 #6  0xc0100721 in run_actions (argv=0xc0110d28) at ../../threads/init.c:331
 544 #7  0xc01000c7 in main () at ../../threads/init.c:140
 545
 546 pintos-debug: dumping backtrace of thread 'idle' @@0xc0116000
 547 #0  0xc010168c in next_thread_to_run () at ../../threads/thread.c:649
 548 #1  0xc0101778 in schedule () at ../../threads/thread.c:714
 549 #2  0xc0100f8f in thread_block () at ../../threads/thread.c:324
 550 #3  0xc0101419 in idle (aux=0x0) at ../../threads/thread.c:551
 551 #4  0xc010145a in kernel_thread (function=0xc01013ff , aux=0x0)
 552     at ../../threads/thread.c:575
 553 #5  0x00000000 in ?? ()
 554 @end smallexample
 555
 556 @noindent
 557 In this case, there are only two threads, the idle thread and the main
 558 thread.  The kernel stack pages (to which the @struct{thread} points)
 559 are at @t{0xc0116000} and @t{0xc002f000}, respectively.  The main thread
 560 is stuck in @func{timer_sleep}, called from @code{test_mlfqs_load_1}.
 561
 562 Knowing where threads are stuck can be tremendously useful, for instance
 563 when diagnosing deadlocks or unexplained hangs.
 564
 565 @node Debugging User Programs
 566 @subsection Debugging User Programs
 567
 568 You can also use GDB to debug a user program running under
 569 Pintos.  Start by issuing this GDB command to load the
 570 program's symbol table:
 571 @example
 572 add-symbol-file @var{program}
 573 @end example
 574 @noindent
 575 where @var{program} is the name of the program's executable (in the host
 576 file system, not in the Pintos file system).  After this, you should be
 577 able to debug the user program the same way you would the kernel, by
 578 placing breakpoints, inspecting data, etc.  Your actions apply to every
 579 user program running in Pintos, not just to the one you want to debug,
 580 so be careful in interpreting the results.  Also, a name that appears in
 581 both the kernel and the user program will actually refer to the kernel
 582 name.  (The latter problem can be avoided by giving the user executable
 583 name on the GDB command line, instead of @file{kernel.o}, and then using
 584 @code{add-symbol-file} to load @file{kernel.o}.)
 585
 586 @node GDB FAQ
 587 @subsection FAQ
 588
 589 @table @asis
 590 @item GDB can't connect to Bochs.
 591
 592 If the @command{target remote} command fails, then make sure that both
 593 GDB and @command{pintos} are running on the same machine by
 594 running @command{hostname} in each terminal.  If the names printed
 595 differ, then you need to open a new terminal for GDB on the
 596 machine running @command{pintos}.
 597
 598 @item GDB doesn't recognize any of the macros.
 599
 600 If you start GDB with @command{pintos-gdb}, it should load the Pintos
 601 macros automatically.  If you start GDB some other way, then you must
 602 issue the command @code{source @var{pintosdir}/src/misc/gdb-macros},
 603 where @var{pintosdir} is the root of your Pintos directory, before you
 604 can use them.
 605
 606 @item Can I debug Pintos with DDD?
 607
 608 Yes, you can.  DDD invokes GDB as a subprocess, so you'll need to tell
 609 it to invoke @command{pintos-gdb} instead:
 610 @example
 611 ddd --gdb --debugger pintos-gdb
 612 @end example
 613
 614 @item Can I use GDB inside Emacs?
 615
 616 Yes, you can.  Emacs has special support for running GDB as a
 617 subprocess.  Type @kbd{M-x gdb} and enter your @command{pintos-gdb}
 618 command at the prompt.  The Emacs manual has information on how to use
 619 its debugging features in a section titled ``Debuggers.''
 620
 621 @item GDB is doing something weird.
 622
 623 If you notice strange behavior while using GDB, there
 624 are three possibilities: a bug in your
 625 modified Pintos, a bug in Bochs's
 626 interface to GDB or in GDB itself, or
 627 a bug in the original Pintos code.  The first and second
 628 are quite likely, and you should seriously consider both.  We hope
 629 that the third is less likely, but it is also possible.
 630 @end table
 631
 632 @node Triple Faults
 633 @section Triple Faults
 634
 635 When a CPU exception handler, such as a page fault handler, cannot be
 636 invoked because it is missing or defective, the CPU will try to invoke
 637 the ``double fault'' handler.  If the double fault handler is itself
 638 missing or defective, that's called a ``triple fault.''  A triple fault
 639 causes an immediate CPU reset.
 640
 641 Thus, if you get yourself into a situation where the machine reboots in
 642 a loop, that's probably a ``triple fault.''  In a triple fault
 643 situation, you might not be able to use @func{printf} for debugging,
 644 because the reboots might be happening even before everything needed for
 645 @func{printf} is initialized.
 646
 647 There are at least two ways to debug triple faults.  First, you can run
 648 Pintos in Bochs under GDB (@pxref{GDB}).  If Bochs has been built
 649 properly for Pintos, a triple fault under GDB will cause it to print the
 650 message ``Triple fault: stopping for gdb'' on the console and break into
 651 the debugger.  (If Bochs is not running under GDB, a triple fault will
 652 still cause it to reboot.)  You can then inspect where Pintos stopped,
 653 which is where the triple fault occurred.
 654
 655 Another option is what I call ``debugging by infinite loop.''
 656 Pick a place in the Pintos code, insert the infinite loop
 657 @code{for (;;);} there, and recompile and run.  There are two likely
 658 possibilities:
 659
 660 @itemize @bullet
 661 @item
 662 The machine hangs without rebooting.  If this happens, you know that
 663 the infinite loop is running.  That means that whatever caused the
 664 reboot must be @emph{after} the place you inserted the infinite loop.
 665 Now move the infinite loop later in the code sequence.
 666
 667 @item
 668 The machine reboots in a loop.  If this happens, you know that the
 669 machine didn't make it to the infinite loop.  Thus, whatever caused the
 670 reboot must be @emph{before} the place you inserted the infinite loop.
 671 Now move the infinite loop earlier in the code sequence.
 672 @end itemize
 673
 674 If you move the infinite loop around in a ``binary search'' fashion, you
 675 can use this technique to pin down the exact spot where everything goes
 676 wrong.  It should only take a few minutes at most.
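
The search itself can be modeled in a few lines of C.  This is only a
simulation under stated assumptions: the code path is treated as steps
numbered from 0, the fault is assumed to happen in exactly one step, and
@func{machine_hangs} stands in for one manual recompile-and-run cycle.
All names below are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model: the code path is steps 0..n-1 and the triple
   fault happens in step CRASH_STEP, unknown to the searcher. */
int crash_step;
int trials;

/* Models one recompile-and-run with the infinite loop placed just
   before step POS.  Returns true if the machine hangs (the loop was
   reached), false if it reboots (the fault came first). */
bool
machine_hangs (int pos)
{
  trials++;
  return crash_step >= pos;
}

/* Binary search for the faulting step among steps 0..n-1. */
int
find_crash_step (int n)
{
  int lo = 0;       /* The fault is known to be in step LO or later. */
  int hi = n;       /* The fault is known to be before step HI. */

  while (hi - lo > 1)
    {
      int mid = lo + (hi - lo) / 2;
      if (machine_hangs (mid))
        lo = mid;   /* Hang: fault is at or after MID; move loop later. */
      else
        hi = mid;   /* Reboot: fault is before MID; move loop earlier. */
    }
  return lo;
}
```

Each run halves the remaining range, so 100 candidate places need at
most seven runs.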
 677
 678 @node Modifying Bochs
 679 @section Modifying Bochs
 680
 681 An advanced debugging technique is to modify and recompile the
 682 simulator.  This proves useful when the simulated hardware has more
 683 information than it makes available to the OS.  For example, page
 684 faults have a long list of potential causes, but the hardware does not
 685 report to the OS exactly which one is the particular cause.
 686 Furthermore, a bug in the kernel's handling of page faults can easily
 687 lead to recursive faults, but a ``triple fault'' will cause the CPU to
 688 reset itself, which is hardly conducive to debugging.
 689
 690 In a case like this, you might appreciate being able to make Bochs
 691 print out more debug information, such as the exact type of fault that
 692 occurred.  It's not very hard.  You start by retrieving the source
 693 code for Bochs 2.2.6 from @uref{http://bochs.sourceforge.net} and
 694 extracting it into a directory.  Then read
 695 @file{pintos/src/misc/bochs-2.2.6.README} and apply the patches needed.
 696 Then run @file{./configure}, supplying the options you want (some
 697 suggestions are in the patch file).  Finally, run @command{make}.
 698 This will compile Bochs and eventually produce a new binary
 699 @file{bochs}.  To use your @file{bochs} binary with @command{pintos},
 700 put it in your @env{PATH}, and make sure that it is earlier than
 701 @file{/usr/class/cs140/`uname -m`/bochs}.
 702
 703 Of course, to get any good out of this you'll have to actually modify
 704 Bochs.  Instructions for doing this are firmly out of the scope of
 705 this document.  However, if you want to debug page faults as suggested
 706 above, a good place to start adding @func{printf}s is
 707 @func{BX_CPU_C::dtranslate_linear} in @file{cpu/paging.cc}.
 708
 709 @node Debugging Tips
 710 @section Tips
 711
 712 The page allocator in @file{threads/palloc.c} and the block allocator in
 713 @file{threads/malloc.c} clear all the bytes in memory to
 714 @t{0xcc} at time of free.  Thus, if you see an attempt to
 715 dereference a pointer like @t{0xcccccccc}, or some other reference to
 716 @t{0xcc}, there's a good chance you're trying to reuse a page that's
 717 already been freed.  Also, byte @t{0xcc} is the CPU opcode for ``invoke
 718 interrupt 3,'' so if you see an error like @code{Interrupt 0x03 (#BP
 719 Breakpoint Exception)}, then Pintos tried to execute code in a freed page or
 720 block.
 721
 722 An assertion failure on the expression @code{sec_no < d->capacity}
 723 indicates that Pintos tried to access a file through an inode that has
 724 been closed and freed.  Freeing an inode clears its starting sector
 725 number to @t{0xcccccccc}, which is not a valid sector number for disks
 726 smaller than about 1.6 TB.
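
The two observations above can be checked with a short C sketch.  It
models the fill-on-free behavior with @func{memset} and assumes 512-byte
sectors.  @func{poison_on_free}, @func{looks_freed}, and
@func{poisoned_sector_offset} are hypothetical helpers written for this
example, not Pintos functions.

```c
#include <stdint.h>
#include <string.h>

/* Assumed sector size in bytes. */
#define SECTOR_SIZE 512

/* Models what the allocators do on free: fill SIZE bytes with 0xcc. */
void
poison_on_free (void *block, size_t size)
{
  memset (block, 0xcc, size);
}

/* Returns nonzero if VALUE looks like it was read from freed memory. */
int
looks_freed (uint32_t value)
{
  return value == 0xccccccccu;
}

/* Byte offset at which sector number 0xcccccccc would start. */
uint64_t
poisoned_sector_offset (void)
{
  return (uint64_t) 0xccccccccu * SECTOR_SIZE;
}
```

Under the 512-byte assumption, the offset comes to 1,759,218,604,032
bytes, which is 1.6 times 2^40 and matches the ``about 1.6 TB'' figure.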