   1 @node Debugging Tools
   2 @appendix Debugging Tools
   3
   4 Many tools lie at your disposal for debugging Pintos.  This appendix
   5 introduces you to a few of them.
   6
   7 @menu
   8 * printf::
   9 * ASSERT::
  10 * Function and Parameter Attributes::
  11 * Backtraces::
  12 * GDB::
  13 * Triple Faults::
  14 * Modifying Bochs::
  15 * Debugging Tips::
  16 @end menu
  17
  18 @node printf
  19 @section @code{printf()}
  20
  21 Don't underestimate the value of @func{printf}.  The way
  22 @func{printf} is implemented in Pintos, you can call it from
  23 practically anywhere in the kernel, whether it's in a kernel thread or
  24 an interrupt handler, almost regardless of what locks are held.
  25
  26 @func{printf} is useful for more than just examining data.
  27 It can also help figure out when and where something goes wrong, even
  28 when the kernel crashes or panics without a useful error message.  The
  29 strategy is to sprinkle calls to @func{printf} with different strings
  30 (e.g.@: @code{"<1>"}, @code{"<2>"}, @dots{}) throughout the pieces of
  31 code you suspect are failing.  If you don't even see @code{<1>} printed,
  32 then something bad happened before that point; if you see @code{<1>}
  33 but not @code{<2>}, then something bad happened between those two
  34 points, and so on.  Based on what you learn, you can then insert more
  35 @func{printf} calls in the new, smaller region of code you suspect.
  36 Eventually you can narrow the problem down to a single statement.
  37 @xref{Triple Faults}, for a related technique.
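
As a minimal sketch of this narrowing strategy, the fragment below
brackets each suspect step with a numbered marker.  The
@code{CHECKPOINT} macro, @code{suspect_function}, and the two step
functions are hypothetical names made up for illustration; they are not
part of Pintos.

```c
#include <stdio.h>

/* Hypothetical helper, not part of Pintos: prints a numbered marker
   together with the source location that emitted it. */
#define CHECKPOINT(N) printf ("<%d> %s:%d\n", (N), __FILE__, __LINE__)

/* Stand-ins for the pieces of code you suspect are failing. */
static int step_a (int x) { return x + 1; }
static int step_b (int x) { return x * 2; }

int
suspect_function (int x)
{
  CHECKPOINT (1);
  x = step_a (x);       /* A crash here shows <1> but not <2>. */
  CHECKPOINT (2);
  x = step_b (x);       /* A crash here shows <2> but not <3>. */
  CHECKPOINT (3);
  return x;
}
```

The last marker printed before the failure tells you which step to
subdivide next.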
  38
  39 @node ASSERT
  40 @section @code{ASSERT}
  41
  42 Assertions are useful because they can catch problems early, before
  43 they'd otherwise be noticed.  Ideally, each function should begin with a
  44 set of assertions that check its arguments for validity.  (Initializers
  45 for functions' local variables are evaluated before assertions are
  46 checked, so be careful not to assume that an argument is valid in an
  47 initializer.)  You can also sprinkle assertions throughout the body of
  48 functions in places where you suspect things are likely to go wrong.
  49 They are especially useful for checking loop invariants.
  50
  51 Pintos provides the @code{ASSERT} macro, defined in @file{<debug.h>},
  52 for checking assertions.
  53
  54 @defmac ASSERT (expression)
  55 Tests the value of @var{expression}.  If it evaluates to zero (false),
  56 the kernel panics.  The panic message includes the expression that
  57 failed, its file and line number, and a backtrace, which should help you
  58 to find the problem.  @xref{Backtraces}, for more information.
  59 @end defmac
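
The sketch below shows one way a function might open with argument
assertions.  It assumes a stand-in @code{ASSERT} built on the standard
@code{assert} so that it compiles outside the kernel, and
@code{sum_array} is a hypothetical function invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in so the sketch builds outside Pintos.  In the kernel, ASSERT
   comes from <debug.h> and panics instead of aborting. */
#define ASSERT(CONDITION) assert (CONDITION)

/* Hypothetical helper: sums the first CNT elements of ARRAY. */
int
sum_array (const int *array, size_t cnt)
{
  /* Avoid an initializer such as "int first = array[0];" here: it
     would dereference ARRAY before the assertion below checks it. */
  int sum = 0;
  size_t i;

  ASSERT (array != NULL);

  for (i = 0; i < cnt; i++)
    sum += array[i];

  /* The loop must have visited exactly CNT elements. */
  ASSERT (i == cnt);
  return sum;
}
```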
  60
  61 @node Function and Parameter Attributes
  62 @section Function and Parameter Attributes
  63
  64 These macros, defined in @file{<debug.h>}, tell the compiler about special
  65 attributes of a function or function parameter.  Their expansions are
  66 GCC-specific.
  67
  68 @defmac UNUSED
  69 Appended to a function parameter to tell the compiler that the
  70 parameter might not be used within the function.  It suppresses the
  71 warning that would otherwise appear.
  72 @end defmac
  73
  74 @defmac NO_RETURN
  75 Appended to a function prototype to tell the compiler that the
  76 function never returns.  It allows the compiler to fine-tune its
  77 warnings and its code generation.
  78 @end defmac
  79
  80 @defmac NO_INLINE
  81 Appended to a function prototype to tell the compiler to never emit
  82 the function in-line.  Occasionally useful to improve the quality of
  83 backtraces (see below).
  84 @end defmac
  85
  86 @defmac PRINTF_FORMAT (@var{format}, @var{first})
  87 Appended to a function prototype to tell the compiler that the function
  88 takes a @func{printf}-like format string as the argument numbered
  89 @var{format} (starting from 1) and that the corresponding value
  90 arguments start at the argument numbered @var{first}.  This lets the
  91 compiler tell you if you pass the wrong argument types.
  92 @end defmac
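
The following sketch shows how such macros might be applied.  The
expansions given here are plausible GCC attribute spellings written for
this example; check @file{<debug.h>} in your own tree for the real
definitions.  The functions @code{format_msg}, @code{always_one},
@code{checked_add}, and @code{give_up} are hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed expansions, GCC-specific; the real ones live in <debug.h>. */
#define UNUSED __attribute__ ((unused))
#define NO_RETURN __attribute__ ((noreturn))
#define NO_INLINE __attribute__ ((noinline))
#define PRINTF_FORMAT(FMT, FIRST) \
  __attribute__ ((format (printf, FMT, FIRST)))

/* The format string is argument 3; its values start at argument 4. */
int format_msg (char *buf, size_t size, const char *format, ...)
  PRINTF_FORMAT (3, 4);

/* Kept out of line so that it appears as its own backtrace frame. */
int checked_add (int a, int b) NO_INLINE;

/* Never returns, so the compiler need not warn about code after it. */
void give_up (const char *message) NO_RETURN;

int
format_msg (char *buf, size_t size, const char *format, ...)
{
  va_list args;
  int length;

  va_start (args, format);
  length = vsnprintf (buf, size, format, args);
  va_end (args);
  return length;
}

int
checked_add (int a, int b)
{
  return a + b;
}

void
give_up (const char *message)
{
  fputs (message, stderr);
  abort ();
}

/* AUX is intentionally ignored; UNUSED suppresses the warning. */
int
always_one (void *aux UNUSED)
{
  return 1;
}
```

With the format attribute in place, a call such as
@code{format_msg (buf, sizeof buf, "%d", "text")} should draw a compiler
warning about the mismatched argument type.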
  93
  94 @node Backtraces
  95 @section Backtraces
  96
  97 When the kernel panics, it prints a ``backtrace,'' that is, a summary
  98 of how your program got where it is, as a list of addresses inside the
  99 functions that were running at the time of the panic.  You can also
 100 insert a call to @func{debug_backtrace}, prototyped in
 101 @file{<debug.h>}, to print a backtrace at any point in your code.
 102
 103 The addresses in a backtrace are listed as raw hexadecimal numbers,
 104 which are difficult to interpret.  We provide a tool called
 105 @command{backtrace} to translate these into function names and source
 106 file line numbers.
 107 Give it the name of your @file{kernel.o} as the first argument and the
 108 hexadecimal numbers composing the backtrace (including the @samp{0x}
 109 prefixes) as the remaining arguments.  It outputs the function name
 110 and source file line numbers that correspond to each address.
 111
 112 If the translated form of a backtrace is garbled, or doesn't make
 113 sense (e.g.@: function A is listed above function B, but B doesn't
 114 call A), then it's a good sign that you're corrupting a kernel
 115 thread's stack, because the backtrace is extracted from the stack.
 116 Alternatively, it could be that the @file{kernel.o} you passed to
 117 @command{backtrace} is not the same kernel that produced
 118 the backtrace.
 119
 120 Sometimes backtraces can be confusing without any corruption.
 121 Compiler optimizations can cause surprising behavior.  When a function
 122 has called another function as its final action (a @dfn{tail call}), the
 123 calling function may not appear in a backtrace at all.  Similarly, when
 124 function A calls another function B that never returns, the compiler may
 125 optimize such that an unrelated function C appears in the backtrace
 126 instead of A.  Function C is simply the function that happens to be in
 127 memory just after A.  In the threads project, this is commonly seen in
 128 backtraces for test failures; see @ref{The pass function fails, ,
 129 @func{pass} Fails}, for more information.
 130
 131 @menu
 132 * Backtrace Example::
 133 @end menu
 134
 135 @node Backtrace Example
 136 @subsection Example
 137
 138 Here's an example.  Suppose that Pintos printed out the following call
 139 stack, which is taken from an actual Pintos submission for the file
 140 system project:
 141
 142 @example
 143 Call stack: 0xc0106eff 0xc01102fb 0xc010dc22 0xc010cf67 0xc0102319
 144 0xc010325a 0x804812c 0x8048a96 0x8048ac8.
 145 @end example
 146
 147 You would then invoke the @command{backtrace} utility as shown below,
 148 cutting and pasting the backtrace information into the command line.
 149 This assumes that @file{kernel.o} is in the current directory.  You
 150 would of course enter all of the following on a single shell command
 151 line, even though that would overflow our margins here:
 152
 153 @example
 154 backtrace kernel.o 0xc0106eff 0xc01102fb 0xc010dc22 0xc010cf67
 155 0xc0102319 0xc010325a 0x804812c 0x8048a96 0x8048ac8
 156 @end example
 157
 158 The backtrace output would then look something like this:
 159
 160 @example
 161 0xc0106eff: debug_panic (lib/debug.c:86)
 162 0xc01102fb: file_seek (filesys/file.c:405)
 163 0xc010dc22: seek (userprog/syscall.c:744)
 164 0xc010cf67: syscall_handler (userprog/syscall.c:444)
 165 0xc0102319: intr_handler (threads/interrupt.c:334)
 166 0xc010325a: intr_entry (threads/intr-stubs.S:38)
 167 0x0804812c: (unknown)
 168 0x08048a96: (unknown)
 169 0x08048ac8: (unknown)
 170 @end example
 171
 172 (You will probably not see exactly the same addresses if you run the
 173 command above on your own kernel binary, because the source code you
 174 compiled and the compiler you used are probably different.)
 175
 176 The first line in the backtrace refers to @func{debug_panic}, the
 177 function that implements kernel panics.  Because backtraces commonly
 178 result from kernel panics, @func{debug_panic} will often be the first
 179 function shown in a backtrace.
 180
 181 The second line shows @func{file_seek} as the function that panicked,
 182 in this case as the result of an assertion failure.  In the source code
 183 tree used for this example, line 405 of @file{filesys/file.c} is the
 184 assertion
 185
 186 @example
 187 ASSERT (file_ofs >= 0);
 188 @end example
 189
 190 @noindent
 191 (This line was also cited in the assertion failure message.)
 192 Thus, @func{file_seek} panicked because it was passed a negative file offset
 193 argument.
 194
 195 The third line indicates that @func{seek} called @func{file_seek},
 196 presumably without validating the offset argument.  In this submission,
 197 @func{seek} implements the @code{seek} system call.
 198
 199 The fourth line shows that @func{syscall_handler}, the system call
 200 handler, invoked @func{seek}.
 201
 202 The fifth and sixth lines are the interrupt handler entry path.
 203
 204 The remaining lines are for addresses below @code{PHYS_BASE}.  This
 205 means that they refer to addresses in the user program, not in the
 206 kernel.  If you know what user program was running when the kernel
 207 panicked, you can re-run @command{backtrace} on the user program, like
 208 so (typing the command on a single line, of course):
 209
 210 @example
 211 backtrace tests/filesys/extended/grow-too-big 0xc0106eff 0xc01102fb
 212 0xc010dc22 0xc010cf67 0xc0102319 0xc010325a 0x804812c 0x8048a96
 213 0x8048ac8
 214 @end example
 215
 216 The results look like this:
 217
 218 @example
 219 0xc0106eff: (unknown)
 220 0xc01102fb: (unknown)
 221 0xc010dc22: (unknown)
 222 0xc010cf67: (unknown)
 223 0xc0102319: (unknown)
 224 0xc010325a: (unknown)
 225 0x0804812c: test_main (...xtended/grow-too-big.c:20)
 226 0x08048a96: main (tests/main.c:10)
 227 0x08048ac8: _start (lib/user/entry.c:9)
 228 @end example
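
The split between kernel and user addresses in the output above can be
sketched in C.  This assumes that @code{PHYS_BASE} has its default
value of @t{0xc0000000}; the helper names @code{is_kernel_address} and
@code{count_kernel_frames} are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed default start of kernel virtual memory in Pintos. */
#define PHYS_BASE 0xc0000000u

/* Hypothetical helper: true if ADDR lies in kernel virtual memory. */
bool
is_kernel_address (uint32_t addr)
{
  return addr >= PHYS_BASE;
}

/* Hypothetical helper: counts the kernel frames among the CNT
   return addresses in ADDRS. */
size_t
count_kernel_frames (const uint32_t *addrs, size_t cnt)
{
  size_t kernel_frames = 0;
  size_t i;

  for (i = 0; i < cnt; i++)
    if (is_kernel_address (addrs[i]))
      kernel_frames++;
  return kernel_frames;
}
```

Applied to the nine addresses of the example call stack, the first six
count as kernel frames and the last three as user frames, which matches
where @command{backtrace} switches between @samp{(unknown)} and resolved
names.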
 229
 230 You can even specify both the kernel and the user program names on
 231 the command line, like so:
 232
 233 @example
 234 backtrace kernel.o tests/filesys/extended/grow-too-big 0xc0106eff
 235 0xc01102fb 0xc010dc22 0xc010cf67 0xc0102319 0xc010325a 0x804812c
 236 0x8048a96 0x8048ac8
 237 @end example
 238
 239 The result is a combined backtrace:
 240
 241 @example
 242 In kernel.o:
 243 0xc0106eff: debug_panic (lib/debug.c:86)
 244 0xc01102fb: file_seek (filesys/file.c:405)
 245 0xc010dc22: seek (userprog/syscall.c:744)
 246 0xc010cf67: syscall_handler (userprog/syscall.c:444)
 247 0xc0102319: intr_handler (threads/interrupt.c:334)
 248 0xc010325a: intr_entry (threads/intr-stubs.S:38)
 249 In tests/filesys/extended/grow-too-big:
 250 0x0804812c: test_main (...xtended/grow-too-big.c:20)
 251 0x08048a96: main (tests/main.c:10)
 252 0x08048ac8: _start (lib/user/entry.c:9)
 253 @end example
 254
 255 Here's an extra tip for anyone who read this far: @command{backtrace}
 256 is smart enough to strip the @code{Call stack:} header and @samp{.}
 257 trailer from the command line if you include them.  This can save you
 258 a little bit of trouble in cutting and pasting.  Thus, the following
 259 command prints the same output as the first one we used:
 260
 261 @example
 262 backtrace kernel.o Call stack: 0xc0106eff 0xc01102fb 0xc010dc22
 263 0xc010cf67 0xc0102319 0xc010325a 0x804812c 0x8048a96 0x8048ac8.
 264 @end example
 265
 266 @node GDB
 267 @section GDB
 268
 269 You can run Pintos under the supervision of the GDB debugger.
 270 First, start Pintos with the @option{--gdb} option, e.g.@:
 271 @command{pintos --gdb -- run mytest}.  Second, open a second terminal on
 272 the same machine and
 273 use @command{pintos-gdb} to invoke GDB on
 274 @file{kernel.o}:@footnote{@command{pintos-gdb} is a wrapper around
 275 @command{gdb} (80@var{x}86) or @command{i386-elf-gdb} (SPARC) that loads
 276 the Pintos macros at startup.}
 277 @example
 278 pintos-gdb kernel.o
 279 @end example
 280 @noindent and issue the following GDB command:
 281 @example
 282 target remote localhost:1234
 283 @end example
 284
 285 Now GDB is connected to the simulator over a local
 286 network connection.  You can now issue any normal GDB
 287 commands.  If you issue the @samp{c} command, the simulated BIOS will take
 288 control, load Pintos, and then Pintos will run in the usual way.  You
 289 can pause the process at any point with @key{Ctrl+C}.
 290
 291 @menu
 292 * Using GDB::
 293 * Example GDB Session::
 294 * Debugging User Programs::
 295 * GDB FAQ::
 296 @end menu
 297
 298 @node Using GDB
 299 @subsection Using GDB
 300
 301 You can read the GDB manual by typing @code{info gdb} at a
 302 terminal command prompt.  Here are a few commonly useful GDB commands:
 303
 304 @deffn {GDB Command} c
 305 Continues execution until @key{Ctrl+C} or the next breakpoint.
 306 @end deffn
 307
 308 @deffn {GDB Command} break function
 309 @deffnx {GDB Command} break file:line
 310 @deffnx {GDB Command} break *address
 311 Sets a breakpoint at @var{function}, at @var{line} within @var{file}, or at
 312 @var{address}.
 313 (Use a @samp{0x} prefix to specify an address in hex.)
 314
 315 Use @code{break main} to make GDB stop when Pintos starts running.
 316 @end deffn
 317
 318 @deffn {GDB Command} p expression
 319 Evaluates the given @var{expression} and prints its value.
 320 If the expression contains a function call, that function will actually
 321 be executed.
 322 @end deffn
 323
 324 @deffn {GDB Command} l *address
 325 Lists a few lines of code around @var{address}.
 326 (Use a @samp{0x} prefix to specify an address in hex.)
 327 @end deffn
 328
 329 @deffn {GDB Command} bt
 330 Prints a stack backtrace similar to that output by the
 331 @command{backtrace} program described above.
 332 @end deffn
 333
 334 @deffn {GDB Command} p/a address
 335 Prints the name of the function or variable that occupies @var{address}.
 336 (Use a @samp{0x} prefix to specify an address in hex.)
 337 @end deffn
 338
 339 @deffn {GDB Command} disassemble function
 340 Disassembles @var{function}.
 341 @end deffn
 342
 343 We also provide a set of macros specialized for debugging Pintos,
 344 written by Godmar Back @email{gback@@cs.vt.edu}.  You can type
 345 @code{help user-defined} for basic help with the macros.  Here is an
 346 overview of their functionality, based on Godmar's documentation:
 347
 348 @deffn {GDB Macro} debugpintos
 349 Attaches the debugger to a waiting Pintos process on the same machine.
 350 Shorthand for @code{target remote localhost:1234}.
 351 @end deffn
 352
 353 @deffn {GDB Macro} dumplist list type element
 354 Prints the elements of @var{list}, which should be a @code{struct list}
 355 that contains elements of the given @var{type} (without the word
 356 @code{struct}) in which @var{element} is the @struct{list_elem} member
 357 that links the elements.
 358
 359 Example: @code{dumplist all_list thread all_elem} prints all elements of
 360 @struct{thread} that are linked in @code{struct list all_list} using the
 361 @code{struct list_elem all_elem} which is part of @struct{thread}.
 362 (This assumes that you have added @code{all_list} and @code{all_elem}
 363 yourself.)
 364 @end deffn
 365
 366 @deffn {GDB Macro} btthread thread
 367 Shows the backtrace of @var{thread}, which is a pointer to the
 368 @struct{thread} of the thread whose backtrace it should show.  For the
 369 current thread, this is identical to the @code{bt} (backtrace) command.
 370 It also works for any thread suspended in @func{schedule},
 371 provided you know where its kernel stack page is located.
 372 @end deffn
 373
 374 @deffn {GDB Macro} btthreadlist list element
 375 Shows the backtraces of all threads in @var{list}, the @struct{list} in
 376 which the threads are kept.  Specify @var{element} as the
 377 @struct{list_elem} field used inside @struct{thread} to link the threads
 378 together.
 379
 380 Example: @code{btthreadlist all_list all_elem} shows the backtraces of
 381 all threads contained in @code{struct list all_list}, linked together by
 382 @code{all_elem}.  This command is useful to determine where your threads
 383 are stuck when a deadlock occurs.  Please see the example scenario below.
 384 (This assumes that you have added @code{all_list} and @code{all_elem}
 385 yourself.)
 386 @end deffn
 387
 388 @deffn {GDB Macro} btpagefault
 389 Print a backtrace of the current thread after a page fault exception.
 390 Normally, when a page fault exception occurs, GDB will stop
 391 with a message that might say:
 392
 393 @example
 394 Program received signal 0, Signal 0.
 395 0xc0102320 in intr0e_stub ()
 396 @end example
 397
 398 In that case, the @code{bt} command might not give a useful
 399 backtrace.  Use @code{btpagefault} instead.
 400
 401 You may also use @code{btpagefault} for page faults that occur in a user
 402 process.  In this case, you may also wish to load the user program's
 403 symbol table (@pxref{Debugging User Programs}).
 404 @end deffn
 405
 406 @deffn {GDB Macro} hook-stop
 407 GDB invokes this macro every time the simulation stops, which Bochs will
 408 do for every processor exception, among other reasons.  If the
 409 simulation stops due to a page fault, @code{hook-stop} will print a
 410 message that says whether the page fault occurred
 411 in the kernel or in user code.
 412
 413 If the exception occurred in user code, @code{hook-stop} will say:
 414 @example
 415 pintos-debug: a page fault exception occurred in user mode
 416 pintos-debug: hit 'c' to continue, or 's' to step to intr_handler
 417 @end example
 418
 419 In Project 2, a page fault in a user process leads to the termination of
 420 the process.  You should expect those page faults to occur in the
 421 robustness tests where we test that your kernel properly terminates
 422 processes that try to access invalid addresses.  To debug those, set a
 423 breakpoint in @func{page_fault} in @file{exception.c}, which you will
 424 need to modify accordingly.
 425
 426 In Project 3, a page fault in a user process no longer automatically
 427 leads to the termination of a process.  Instead, it may require reading in
 428 data for the page the process was trying to access, either
 429 because it was swapped out or because this is the first time it's
 430 accessed.  In either case, you will reach @func{page_fault} and need to
 431 take the appropriate action there.
 432
 433 If the page fault did not occur in user mode while executing a user
 434 process, then it occurred in kernel mode while executing kernel code.
 435 In this case, @code{hook-stop} will print this message:
 436 @example
 437 pintos-debug: a page fault occurred in kernel mode
 438 @end example
 439 followed by the output of the @code{btpagefault} command.
 440
 441 Before Project 3, a page fault exception in kernel code is always a bug
 442 in your kernel, because your kernel should never crash.  Starting with
 443 Project 3, the situation will change if you use the @func{get_user} and
 444 @func{put_user} strategy to verify user memory accesses
 445 (@pxref{Accessing User Memory}).
 446
 447 If you don't want GDB to stop for page faults, then issue the command
 448 @code{handle SIGSEGV nostop}.  GDB will still print a message for
 449 every page fault, but it will not come back to a command prompt.
 450 @end deffn
 451
 452 @node Example GDB Session
 453 @subsection Example GDB Session
 454
 455 This section narrates a sample GDB session, provided by Godmar Back.
 456 This example illustrates how one might debug a Project 1 solution in
 457 which occasionally a thread that calls @func{timer_sleep} is not woken
 458 up.  With this bug, tests such as @code{mlfqs-load-1} get stuck.
 459
 460 This session was captured with a slightly older version of Bochs and the
 461 GDB macros for Pintos, so it looks slightly different than it would now.
 462 Program output is shown in normal type, user input in @strong{strong}
 463 type.
 464
 465 First, I start Pintos:
 466
 467 @smallexample
 468 $ @strong{pintos -v --gdb -- -q -mlfqs run mlfqs-load-1}
 469 Writing command line to /tmp/gDAlqTB5Uf.dsk...
 470 bochs -q
 471 ========================================================================
 472                        Bochs x86 Emulator 2.2.5
 473              Build from CVS snapshot on December 30, 2005
 474 ========================================================================
 475 00000000000i[     ] reading configuration from bochsrc.txt
 476 00000000000i[     ] Enabled gdbstub
 477 00000000000i[     ] installing nogui module as the Bochs GUI
 478 00000000000i[     ] using log file bochsout.txt
 479 Waiting for gdb connection on localhost:1234
 480 @end smallexample
 481
 482 @noindent Then, I open a second window on the same machine and start GDB:
 483
 484 @smallexample
 485 $ @strong{pintos-gdb kernel.o}
 486 GNU gdb Red Hat Linux (6.3.0.0-1.84rh)
 487 Copyright 2004 Free Software Foundation, Inc.
 488 GDB is free software, covered by the GNU General Public License, and you are
 489 welcome to change it and/or distribute copies of it under certain conditions.
 490 Type "show copying" to see the conditions.
 491 There is absolutely no warranty for GDB.  Type "show warranty" for details.
 492 This GDB was configured as "i386-redhat-linux-gnu"...
 493 Using host libthread_db library "/lib/libthread_db.so.1".
 494 @end smallexample
 495
 496 @noindent Then, I tell GDB to attach to the waiting Pintos emulator:
 497
 498 @smallexample
 499 (gdb) @strong{debugpintos}
 500 Remote debugging using localhost:1234
 501 0x0000fff0 in ?? ()
 502 Reply contains invalid hex digit 78
 503 @end smallexample
 504
 505 @noindent Now I tell Pintos to run by executing @code{c} (short for
 506 @code{continue}) twice:
 507
 508 @smallexample
 509 (gdb) @strong{c}
 510 Continuing.
 511 Reply contains invalid hex digit 78
 512 (gdb) @strong{c}
 513 Continuing.
 514 @end smallexample
 515
 516 @noindent Now Pintos will continue and output:
 517
 518 @smallexample
 519 Pintos booting with 4,096 kB RAM...
 520 Kernel command line: -q -mlfqs run mlfqs-load-1
 521 374 pages available in kernel pool.
 522 373 pages available in user pool.
 523 Calibrating timer...  102,400 loops/s.
 524 Boot complete.
 525 Executing 'mlfqs-load-1':
 526 (mlfqs-load-1) begin
 527 (mlfqs-load-1) spinning for up to 45 seconds, please wait...
 528 (mlfqs-load-1) load average rose to 0.5 after 42 seconds
 529 (mlfqs-load-1) sleeping for another 10 seconds, please wait...
 530 @end smallexample
 531
 532 @noindent
 533 @dots{}until it gets stuck because of the bug I had introduced.  I hit
 534 @key{Ctrl+C} in the debugger window:
 535
 536 @smallexample
 537 Program received signal 0, Signal 0.
 538 0xc010168c in next_thread_to_run () at ../../threads/thread.c:649
 539 649       while (i <= PRI_MAX && list_empty (&ready_list[i]))
 540 (gdb)
 541 @end smallexample
 542
 543 @noindent
 544 The thread that was running when I interrupted Pintos was the idle
 545 thread.  If I run @code{backtrace}, it shows this backtrace:
 546
 547 @smallexample
 548 (gdb) @strong{bt}
 549 #0  0xc010168c in next_thread_to_run () at ../../threads/thread.c:649
 550 #1  0xc0101778 in schedule () at ../../threads/thread.c:714
 551 #2  0xc0100f8f in thread_block () at ../../threads/thread.c:324
 552 #3  0xc0101419 in idle (aux=0x0) at ../../threads/thread.c:551
 553 #4  0xc010145a in kernel_thread (function=0xc01013ff <idle>, aux=0x0)
 554     at ../../threads/thread.c:575
 555 #5  0x00000000 in ?? ()
 556 @end smallexample
 557
 558 @noindent
 559 Not terribly useful.  What I'd really like to know is what's up with the
 560 other thread (or threads).  Since I keep all threads in a linked list
 561 called @code{all_list}, linked together by a @struct{list_elem} member
 562 named @code{all_elem}, I can use the @code{btthreadlist} macro from the
 563 macro library I wrote.  @code{btthreadlist} iterates through the list of
 564 threads and prints the backtrace for each thread:
 565
 566 @smallexample
 567 (gdb) @strong{btthreadlist all_list all_elem}
 568 pintos-debug: dumping backtrace of thread 'main' @@0xc002f000
 569 #0  0xc0101820 in schedule () at ../../threads/thread.c:722
 570 #1  0xc0100f8f in thread_block () at ../../threads/thread.c:324
 571 #2  0xc0104755 in timer_sleep (ticks=1000) at ../../devices/timer.c:141
 572 #3  0xc010bf7c in test_mlfqs_load_1 () at ../../tests/threads/mlfqs-load-1.c:49
 573 #4  0xc010aabb in run_test (name=0xc0007d8c "mlfqs-load-1")
 574     at ../../tests/threads/tests.c:50
 575 #5  0xc0100647 in run_task (argv=0xc0110d28) at ../../threads/init.c:281
 576 #6  0xc0100721 in run_actions (argv=0xc0110d28) at ../../threads/init.c:331
 577 #7  0xc01000c7 in main () at ../../threads/init.c:140
 578
 579 pintos-debug: dumping backtrace of thread 'idle' @@0xc0116000
 580 #0  0xc010168c in next_thread_to_run () at ../../threads/thread.c:649
 581 #1  0xc0101778 in schedule () at ../../threads/thread.c:714
 582 #2  0xc0100f8f in thread_block () at ../../threads/thread.c:324
 583 #3  0xc0101419 in idle (aux=0x0) at ../../threads/thread.c:551
 584 #4  0xc010145a in kernel_thread (function=0xc01013ff <idle>, aux=0x0)
 585     at ../../threads/thread.c:575
 586 #5  0x00000000 in ?? ()
 587 @end smallexample
 588
 589 @noindent
 590 In this case, there are only two threads, the idle thread and the main
 591 thread.  The kernel stack pages (to which the @struct{thread} points)
 592 are at @t{0xc0116000} and @t{0xc002f000}, respectively.  The main thread
 593 is stuck in @func{timer_sleep}, called from @code{test_mlfqs_load_1}.
 594
 595 Knowing where threads are stuck can be tremendously useful, for instance
 596 when diagnosing deadlocks or unexplained hangs.
 597
 598 @node Debugging User Programs
 599 @subsection Debugging User Programs
 600
 601 You can also use GDB to debug a user program running under
 602 Pintos.  Start by issuing this GDB command to load the
 603 program's symbol table:
 604 @example
 605 add-symbol-file @var{program}
 606 @end example
 607 @noindent
 608 where @var{program} is the name of the program's executable (in the host
 609 file system, not in the Pintos file system).  After this, you should be
 610 able to debug the user program the same way you would the kernel, by
 611 placing breakpoints, inspecting data, etc.  Your actions apply to every
 612 user program running in Pintos, not just to the one you want to debug,
 613 so be careful in interpreting the results.  Also, a name that appears in
 614 both the kernel and the user program will actually refer to the kernel
 615 name.  (The latter problem can be avoided by giving the user executable
 616 name on the GDB command line, instead of @file{kernel.o}, and then using
 617 @code{add-symbol-file} to load @file{kernel.o}.)
 618
 619 @node GDB FAQ
 620 @subsection FAQ
 621
 622 @table @asis
 623 @item GDB can't connect to Bochs.
 624
 625 If the @command{target remote} command fails, then make sure that both
 626 GDB and @command{pintos} are running on the same machine by
 627 running @command{hostname} in each terminal.  If the names printed
 628 differ, then you need to open a new terminal for GDB on the
 629 machine running @command{pintos}.
 630
 631 @item GDB doesn't recognize any of the macros.
 632
 633 If you start GDB with @command{pintos-gdb}, it should load the Pintos
 634 macros automatically.  If you start GDB some other way, then you must
 635 issue the command @code{source @var{pintosdir}/src/misc/gdb-macros},
 636 where @var{pintosdir} is the root of your Pintos directory, before you
 637 can use them.
 638
 639 @item Can I debug Pintos with DDD?
 640
 641 Yes, you can.  DDD invokes GDB as a subprocess, so you'll need to tell
 642 it to invoke @command{pintos-gdb} instead:
 643 @example
 644 ddd --gdb --debugger pintos-gdb
 645 @end example
 646
 647 @item Can I use GDB inside Emacs?
 648
 649 Yes, you can.  Emacs has special support for running GDB as a
 650 subprocess.  Type @kbd{M-x gdb} and enter your @command{pintos-gdb}
 651 command at the prompt.  The Emacs manual has information on how to use
 652 its debugging features in a section titled ``Debuggers.''
 653
 654 @item GDB is doing something weird.
 655
 656 If you notice strange behavior while using GDB, there
 657 are three possibilities: a bug in your
 658 modified Pintos, a bug in Bochs's
 659 interface to GDB or in GDB itself, or
 660 a bug in the original Pintos code.  The first and second
 661 are quite likely, and you should seriously consider both.  We hope
 662 that the third is less likely, but it is also possible.
 663 @end table
 664
 665 @node Triple Faults
 666 @section Triple Faults
 667
 668 When a CPU exception handler, such as a page fault handler, cannot be
 669 invoked because it is missing or defective, the CPU will try to invoke
 670 the ``double fault'' handler.  If the double fault handler is itself
 671 missing or defective, that's called a ``triple fault.''  A triple fault
 672 causes an immediate CPU reset.
 673
 674 Thus, if you get yourself into a situation where the machine reboots in
 675 a loop, that's probably a ``triple fault.''  In a triple fault
 676 situation, you might not be able to use @func{printf} for debugging,
 677 because the reboots might be happening even before everything needed for
 678 @func{printf} is initialized.
 679
 680 There are at least two ways to debug triple faults.  First, you can run
 681 Pintos in Bochs under GDB (@pxref{GDB}).  If Bochs has been built
 682 properly for Pintos, a triple fault under GDB will cause it to print the
 683 message ``Triple fault: stopping for gdb'' on the console and break into
 684 the debugger.  (If Bochs is not running under GDB, a triple fault will
 685 still cause it to reboot.)  You can then inspect where Pintos stopped,
 686 which is where the triple fault occurred.
 687
 688 Another option is what I call ``debugging by infinite loop.''
 689 Pick a place in the Pintos code, insert the infinite loop
 690 @code{for (;;);} there, and recompile and run.  There are two likely
 691 possibilities:
 692
 693 @itemize @bullet
 694 @item
 695 The machine hangs without rebooting.  If this happens, you know that
 696 the infinite loop is running.  That means that whatever caused the
 697 reboot must be @emph{after} the place you inserted the infinite loop.
 698 Now move the infinite loop later in the code sequence.
 699
 700 @item
 701 The machine reboots in a loop.  If this happens, you know that the
 702 machine didn't make it to the infinite loop.  Thus, whatever caused the
 703 reboot must be @emph{before} the place you inserted the infinite loop.
 704 Now move the infinite loop earlier in the code sequence.
 705 @end itemize
 706
 707 If you move around the infinite loop in a ``binary search'' fashion, you
 708 can use this technique to pin down the exact spot where everything goes
 709 wrong.  It should only take a few minutes at most.
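
One way to make the loop easy to move is to number the candidate
locations and choose one at compile time.  The sketch below assumes a
hypothetical @code{HANG_AT} knob and @code{HANG_POINT} macro, neither of
which exists in Pintos; @code{boot_sequence} and @code{init_step} stand
in for whatever early code you are bisecting.

```c
/* Hypothetical knob, not part of Pintos: the number of the checkpoint
   at which to hang, or 0 to disable all of them. */
#ifndef HANG_AT
#define HANG_AT 0
#endif

/* Spins forever when N is the selected checkpoint. */
#define HANG_POINT(N)                           \
  do                                            \
    {                                           \
      if (HANG_AT == (N))                       \
        for (;;)                                \
          continue;                             \
    }                                           \
  while (0)

/* Stand-ins for early initialization steps. */
static int steps_done;
static void init_step (void) { steps_done++; }

int
boot_sequence (void)
{
  steps_done = 0;
  init_step ();
  HANG_POINT (1);       /* Hang here: the reboot comes from later code. */
  init_step ();
  HANG_POINT (2);
  init_step ();
  HANG_POINT (3);
  return steps_done;
}
```

Rebuilding with a different @code{HANG_AT} value then moves the loop
without further edits.  If the machine hangs, the fault lies after that
checkpoint; if it still reboots, the fault lies before it.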
 710
 711 @node Modifying Bochs
 712 @section Modifying Bochs
 713
 714 An advanced debugging technique is to modify and recompile the
 715 simulator.  This proves useful when the simulated hardware has more
 716 information than it makes available to the OS.  For example, page
 717 faults have a long list of potential causes, but the hardware does not
 718 report to the OS exactly which one is the particular cause.
 719 Furthermore, a bug in the kernel's handling of page faults can easily
 720 lead to recursive faults, but a ``triple fault'' will cause the CPU to
 721 reset itself, which is hardly conducive to debugging.
 722
 723 In a case like this, you might appreciate being able to make Bochs
 724 print out more debug information, such as the exact type of fault that
 725 occurred.  It's not very hard.  You start by retrieving the source
 726 code for Bochs 2.2.6 from @uref{http://bochs.sourceforge.net} and
 727 saving the file @file{bochs-2.2.6.tar.gz} into a directory.
 728 The script @file{pintos/src/misc/bochs-2.2.6-build.sh}
 729 applies a number of patches contained in @file{pintos/src/misc}
 730 to the Bochs tree, then builds Bochs and installs it in a directory
 731 of your choice.
 732 Run this script without arguments to learn usage instructions.
 733 To use your @file{bochs} binary with @command{pintos},
 734 put it in your @env{PATH}, and make sure that it is earlier than
 735 @file{@value{localpintosbindir}/bochs}.
 736
 737 Of course, to get any good out of this you'll have to actually modify
 738 Bochs.  Instructions for doing this are firmly out of the scope of
 739 this document.  However, if you want to debug page faults as suggested
 740 above, a good place to start adding @func{printf}s is
 741 @func{BX_CPU_C::dtranslate_linear} in @file{cpu/paging.cc}.
 742
 743 @node Debugging Tips
 744 @section Tips
 745
 746 The page allocator in @file{threads/palloc.c} and the block allocator in
 747 @file{threads/malloc.c} set all the bytes in freed memory to
 748 @t{0xcc} at time of free.  Thus, if you see an attempt to
 749 dereference a pointer like @t{0xcccccccc}, or some other reference to
 750 @t{0xcc}, there's a good chance you're trying to reuse a page that's
 751 already been freed.  Also, byte @t{0xcc} is the CPU opcode for ``invoke
 752 interrupt 3,'' so if you see an error like @code{Interrupt 0x03 (#BP
 753 Breakpoint Exception)}, then Pintos tried to execute code in a freed page or
 754 block.
 755
 756 An assertion failure on the expression @code{sec_no < d->capacity}
 757 indicates that Pintos tried to access a file through an inode that has
 758 been closed and freed.  Freeing an inode clears its starting sector
 759 number to @t{0xcccccccc}, which is not a valid sector number for disks
 760 smaller than about 1.6 TB.
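
The fill pattern and the sector arithmetic above can be checked with a
short sketch.  It assumes 512-byte sectors, and the names
@code{poison_on_free}, @code{looks_freed}, and
@code{sector_byte_offset} are hypothetical; they illustrate the idea
and are not the allocators' actual code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FREED_BYTE 0xcc   /* Fill byte written into freed memory. */

/* Hypothetical stand-in for what an allocator does on free. */
void
poison_on_free (void *block, size_t size)
{
  memset (block, FREED_BYTE, size);
}

/* True if every byte of VALUE is the fill byte, as in 0xcccccccc. */
bool
looks_freed (uint32_t value)
{
  return value == 0xccccccccu;
}

/* Byte offset of sector SEC_NO, assuming 512-byte sectors. */
uint64_t
sector_byte_offset (uint32_t sec_no)
{
  return (uint64_t) sec_no * 512;
}
```

Sector @t{0xcccccccc} starts 1,759,218,604,032 bytes into the disk,
which is about 1.6 TB when a terabyte is counted as 2^40 bytes.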