@node Debugging Tools
@appendix Debugging Tools

Many tools lie at your disposal for debugging Pintos.  This appendix
introduces you to a few of them.

@menu
* printf::
* ASSERT::
* Function and Parameter Attributes::
* Backtraces::
* GDB::
* Triple Faults::
* Modifying Bochs::
* Debugging Tips::
@end menu

@node printf
@section @code{printf()}

Don't underestimate the value of @func{printf}.  The way
@func{printf} is implemented in Pintos, you can call it from
practically anywhere in the kernel, whether it's in a kernel thread or
an interrupt handler, almost regardless of what locks are held.

@func{printf} is useful for more than just examining data.
It can also help figure out when and where something goes wrong, even
when the kernel crashes or panics without a useful error message.  The
strategy is to sprinkle calls to @func{printf} with different strings
(e.g.@: @code{"<1>"}, @code{"<2>"}, @dots{}) throughout the pieces of
code you suspect are failing.  If you don't even see @code{<1>} printed,
then something bad happened before that point; if you see @code{<1>}
but not @code{<2>}, then something bad happened between those two
points; and so on.  Based on what you learn, you can then insert more
@func{printf} calls in the new, smaller region of code you suspect.
Eventually you can narrow the problem down to a single statement.
@xref{Triple Faults}, for a related technique.
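
As a minimal sketch of this bisection, the fragment below brackets
three suspect steps with numbered markers.  The names
@func{load_and_run}, @func{step_a}, @func{step_b}, and @func{step_c}
are hypothetical stand-ins invented for this example, not Pintos
functions.  If the console shows @code{<1>} and @code{<2>} but never
@code{<3>}, the failure is inside @func{step_b}.

@example
#include <stdio.h>

/* Hypothetical steps under suspicion; stand-ins for real code. */
static int step_a (void) @{ return 1; @}
static int step_b (void) @{ return 2; @}
static int step_c (void) @{ return 3; @}

/* Runs the three steps with a numbered marker before each one.
   The last marker printed shows which step never finished. */
int
load_and_run (void)
@{
  int sum = 0;

  printf ("<1>\n");
  sum += step_a ();
  printf ("<2>\n");
  sum += step_b ();
  printf ("<3>\n");
  sum += step_c ();
  printf ("<4>\n");
  return sum;
@}
@end example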

@node ASSERT
@section @code{ASSERT}

Assertions are useful because they can catch problems early, before
they'd otherwise be noticed.  Ideally, each function should begin with a
set of assertions that check its arguments for validity.  (Initializers
for functions' local variables are evaluated before assertions are
checked, so be careful not to assume that an argument is valid in an
initializer.)  You can also sprinkle assertions throughout the body of
functions in places where you suspect things are likely to go wrong.
They are especially useful for checking loop invariants.

Pintos provides the @code{ASSERT} macro, defined in @file{<debug.h>},
for checking assertions.

@defmac ASSERT (expression)
Tests the value of @var{expression}.  If it evaluates to zero (false),
the kernel panics.  The panic message includes the expression that
failed, its file and line number, and a backtrace, which should help you
to find the problem.  @xref{Backtraces}, for more information.
@end defmac
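
The sketch below shows one way a function might open with assertions
while avoiding the initializer pitfall mentioned above.
@struct{buffer} and @func{buffer_free_space} are hypothetical names
invented for this example.  So that the fragment builds outside the
Pintos tree, @code{ASSERT} is defined here as a stand-in on top of the
standard @code{assert}; inside Pintos you would include
@file{<debug.h>} instead.

@example
#include <assert.h>
#include <stddef.h>

/* Stand-in so the sketch builds outside the Pintos tree.  Inside
   Pintos, include <debug.h> and use its ASSERT instead. */
#define ASSERT(CONDITION) assert (CONDITION)

/* Hypothetical structure, invented for this example. */
struct buffer
  @{
    size_t capacity;    /* Total bytes available. */
    size_t used;        /* Bytes currently in use. */
  @};

/* Returns the number of free bytes in B. */
size_t
buffer_free_space (const struct buffer *b)
@{
  /* Writing "size_t used = b->used;" here would dereference B
     before the assertions below had a chance to check it. */
  size_t used;

  ASSERT (b != NULL);
  ASSERT (b->used <= b->capacity);

  used = b->used;
  return b->capacity - used;
@}
@end example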

@node Function and Parameter Attributes
@section Function and Parameter Attributes

These macros, defined in @file{<debug.h>}, tell the compiler about
special attributes of a function or function parameter.  Their
expansions are GCC-specific.

@defmac UNUSED
Appended to a function parameter to tell the compiler that the
parameter might not be used within the function.  It suppresses the
warning that would otherwise appear.
@end defmac

@defmac NO_RETURN
Appended to a function prototype to tell the compiler that the
function never returns.  It allows the compiler to fine-tune its
warnings and its code generation.
@end defmac

@defmac NO_INLINE
Appended to a function prototype to tell the compiler to never emit
the function in-line.  Occasionally useful to improve the quality of
backtraces (@pxref{Backtraces}).
@end defmac

@defmac PRINTF_FORMAT (@var{format}, @var{first})
Appended to a function prototype to tell the compiler that the function
takes a @func{printf}-like format string as the argument numbered
@var{format} (starting from 1) and that the corresponding value
arguments start at the argument numbered @var{first}.  This lets the
compiler tell you if you pass the wrong argument types.
@end defmac
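
The following sketch shows where each macro might sit in a declaration.
The functions @func{log_msg}, @func{give_up}, @func{traced_helper}, and
@func{callback} are hypothetical.  The macro definitions are stand-ins
written so the fragment builds outside the Pintos tree, and the GCC
attributes they expand to are an assumption; the definitions in your
own @file{<debug.h>} are authoritative.  With format warnings enabled,
GCC should complain about a call such as @code{log_msg (1, "%s", 5)}.

@example
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-ins for the <debug.h> macros (assumed expansions). */
#define UNUSED __attribute__ ((unused))
#define NO_RETURN __attribute__ ((noreturn))
#define NO_INLINE __attribute__ ((noinline))
#define PRINTF_FORMAT(FMT, FIRST) \
  __attribute__ ((format (printf, FMT, FIRST)))

/* Hypothetical prototypes, invented for this example. */
int log_msg (int level, const char *format, ...) PRINTF_FORMAT (2, 3);
void give_up (const char *why) NO_RETURN;
int traced_helper (int x) NO_INLINE;

/* FORMAT is argument 2; its values start at argument 3. */
int
log_msg (int level, const char *format, ...)
@{
  va_list args;
  int n;

  printf ("[%d] ", level);
  va_start (args, format);
  n = vprintf (format, args);
  va_end (args);
  return n;
@}

/* Never returns, so callers need no code after the call. */
void
give_up (const char *why)
@{
  log_msg (0, "giving up: %s\n", why);
  exit (1);
@}

/* Kept out of line so that it keeps its own stack frame. */
int
traced_helper (int x)
@{
  return x + 1;
@}

/* AUX is required by the callback signature but ignored here. */
int
callback (int value, void *aux UNUSED)
@{
  return traced_helper (value);
@}
@end example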

@node Backtraces
@section Backtraces

When the kernel panics, it prints a ``backtrace,'' that is, a summary
of how your program got where it is, as a list of addresses inside the
functions that were running at the time of the panic.  You can also
insert a call to @func{debug_backtrace}, prototyped in
@file{<debug.h>}, to print a backtrace at any point in your code.
@func{debug_backtrace_all}, also declared in @file{<debug.h>},
prints backtraces of all threads.

The addresses in a backtrace are listed as raw hexadecimal numbers,
which are difficult to interpret.  We provide a tool called
@command{backtrace} to translate these into function names and source
file line numbers.
Give it the name of your @file{kernel.o} as the first argument and the
hexadecimal numbers composing the backtrace (including the @samp{0x}
prefixes) as the remaining arguments.  It outputs the function name
and source file line numbers that correspond to each address.

If the translated form of a backtrace is garbled, or doesn't make
sense (e.g.@: function A is listed above function B, but B doesn't
call A), then it's a good sign that you're corrupting a kernel
thread's stack, because the backtrace is extracted from the stack.
Alternatively, it could be that the @file{kernel.o} you passed to
@command{backtrace} is not the same kernel that produced
the backtrace.

Sometimes backtraces can be confusing without any corruption.
Compiler optimizations can cause surprising behavior.  When a function
has called another function as its final action (a @dfn{tail call}), the
calling function may not appear in a backtrace at all.  Similarly, when
function A calls another function B that never returns, the compiler may
optimize such that an unrelated function C appears in the backtrace
instead of A.  Function C is simply the function that happens to be in
memory just after A.  In the threads project, this is commonly seen in
backtraces for test failures; see @ref{The pass function fails, ,
@func{pass} Fails}, for more information.
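
For illustration, here is what a tail call looks like in source form.
@func{start} and @func{finish} are hypothetical names invented for this
example, and whether the compiler really turns the call into a jump
depends on the optimization settings in use.

@example
/* Hypothetical functions, invented for this example. */
int finish (int x);

/* The call to finish() is the last thing start() does, so an
   optimizing compiler may replace the call with a jump.  When it
   does, start() no longer has a frame on the stack and is missing
   from any backtrace taken inside finish(). */
int
start (int x)
@{
  return finish (x + 1);
@}

int
finish (int x)
@{
  return x * 2;
@}
@end example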

@menu
* Backtrace Example::
@end menu

@node Backtrace Example
@subsection Example

Here's an example.  Suppose that Pintos printed out the following call
stack, which is taken from an actual Pintos submission for the file
system project:

@example
Call stack: 0xc0106eff 0xc01102fb 0xc010dc22 0xc010cf67 0xc0102319
0xc010325a 0x804812c 0x8048a96 0x8048ac8.
@end example

You would then invoke the @command{backtrace} utility as shown below,
cutting and pasting the backtrace information into the command line.
This assumes that @file{kernel.o} is in the current directory.  You
would of course enter all of the following on a single shell command
line, even though that would overflow our margins here:

@example
backtrace kernel.o 0xc0106eff 0xc01102fb 0xc010dc22 0xc010cf67
0xc0102319 0xc010325a 0x804812c 0x8048a96 0x8048ac8
@end example

The backtrace output would then look something like this:

@example
0xc0106eff: debug_panic (lib/debug.c:86)
0xc01102fb: file_seek (filesys/file.c:405)
0xc010dc22: seek (userprog/syscall.c:744)
0xc010cf67: syscall_handler (userprog/syscall.c:444)
0xc0102319: intr_handler (threads/interrupt.c:334)
0xc010325a: intr_entry (threads/intr-stubs.S:38)
0x0804812c: (unknown)
0x08048a96: (unknown)
0x08048ac8: (unknown)
@end example

(You will probably not see exactly the same addresses if you run the
command above on your own kernel binary, because the source code you
compiled and the compiler you used are probably different.)

The first line in the backtrace refers to @func{debug_panic}, the
function that implements kernel panics.  Because backtraces commonly
result from kernel panics, @func{debug_panic} will often be the first
function shown in a backtrace.

The second line shows @func{file_seek} as the function that panicked,
in this case as the result of an assertion failure.  In the source code
tree used for this example, line 405 of @file{filesys/file.c} is the
assertion

@example
ASSERT (file_ofs >= 0);
@end example

@noindent
(This line was also cited in the assertion failure message.)
Thus, @func{file_seek} panicked because it was passed a negative file
offset argument.

The third line indicates that @func{seek} called @func{file_seek},
presumably without validating the offset argument.  In this submission,
@func{seek} implements the @code{seek} system call.

The fourth line shows that @func{syscall_handler}, the system call
handler, invoked @func{seek}.

The fifth and sixth lines are the interrupt handler entry path.

The remaining lines are for addresses below @code{PHYS_BASE}.  This
means that they refer to addresses in the user program, not in the
kernel.  If you know what user program was running when the kernel
panicked, you can re-run @command{backtrace} on the user program, like
so (typing the command on a single line, of course):

@example
backtrace tests/filesys/extended/grow-too-big 0xc0106eff 0xc01102fb
0xc010dc22 0xc010cf67 0xc0102319 0xc010325a 0x804812c 0x8048a96
0x8048ac8
@end example

The results look like this:

@example
0xc0106eff: (unknown)
0xc01102fb: (unknown)
0xc010dc22: (unknown)
0xc010cf67: (unknown)
0xc0102319: (unknown)
0xc010325a: (unknown)
0x0804812c: test_main (...xtended/grow-too-big.c:20)
0x08048a96: main (tests/main.c:10)
0x08048ac8: _start (lib/user/entry.c:9)
@end example

You can even specify both the kernel and the user program names on
the command line, like so:

@example
backtrace kernel.o tests/filesys/extended/grow-too-big 0xc0106eff
0xc01102fb 0xc010dc22 0xc010cf67 0xc0102319 0xc010325a 0x804812c
0x8048a96 0x8048ac8
@end example

The result is a combined backtrace:

@example
In kernel.o:
0xc0106eff: debug_panic (lib/debug.c:86)
0xc01102fb: file_seek (filesys/file.c:405)
0xc010dc22: seek (userprog/syscall.c:744)
0xc010cf67: syscall_handler (userprog/syscall.c:444)
0xc0102319: intr_handler (threads/interrupt.c:334)
0xc010325a: intr_entry (threads/intr-stubs.S:38)
In tests/filesys/extended/grow-too-big:
0x0804812c: test_main (...xtended/grow-too-big.c:20)
0x08048a96: main (tests/main.c:10)
0x08048ac8: _start (lib/user/entry.c:9)
@end example

Here's an extra tip for anyone who read this far: @command{backtrace}
is smart enough to strip the @code{Call stack:} header and @samp{.}
trailer from the command line if you include them.  This can save you
a little bit of trouble in cutting and pasting.  Thus, the following
command prints the same output as the first one we used:

@example
backtrace kernel.o Call stack: 0xc0106eff 0xc01102fb 0xc010dc22
0xc010cf67 0xc0102319 0xc010325a 0x804812c 0x8048a96 0x8048ac8.
@end example

@node GDB
@section GDB

You can run Pintos under the supervision of the GDB debugger.
First, start Pintos with the @option{--gdb} option, e.g.@:
@command{pintos --gdb -- run mytest}.  Second, open a second terminal on
the same machine and use @command{pintos-gdb} to invoke GDB on
@file{kernel.o}:@footnote{@command{pintos-gdb} is a wrapper around
@command{gdb} (80@var{x}86) or @command{i386-elf-gdb} (SPARC) that loads
the Pintos macros at startup.}
@example
pintos-gdb kernel.o
@end example
@noindent and issue the following GDB command:
@example
target remote localhost:1234
@end example

Now GDB is connected to the simulator over a local
network connection.  You can now issue any normal GDB
commands.  If you issue the @samp{c} command, the simulated BIOS will take
control, load Pintos, and then Pintos will run in the usual way.  You
can pause the process at any point with @key{Ctrl+C}.

@menu
* Using GDB::
* Example GDB Session::
* Debugging User Programs::
* GDB FAQ::
@end menu

@node Using GDB
@subsection Using GDB

You can read the GDB manual by typing @code{info gdb} at a
terminal command prompt.  Here are a few commonly useful GDB commands:

@deffn {GDB Command} c
Continues execution until @key{Ctrl+C} or the next breakpoint.
@end deffn

@deffn {GDB Command} break function
@deffnx {GDB Command} break file:line
@deffnx {GDB Command} break *address
Sets a breakpoint at @var{function}, at @var{line} within @var{file}, or
at @var{address}.
(Use a @samp{0x} prefix to specify an address in hex.)

Use @code{break main} to make GDB stop when Pintos starts running.
@end deffn

@deffn {GDB Command} p expression
Evaluates the given @var{expression} and prints its value.
If the expression contains a function call, that function will actually
be executed.
@end deffn

@deffn {GDB Command} l *address
Lists a few lines of code around @var{address}.
(Use a @samp{0x} prefix to specify an address in hex.)
@end deffn

@deffn {GDB Command} bt
Prints a stack backtrace similar to that output by the
@command{backtrace} program described above.
@end deffn

@deffn {GDB Command} p/a address
Prints the name of the function or variable that occupies @var{address}.
(Use a @samp{0x} prefix to specify an address in hex.)
@end deffn

@deffn {GDB Command} disassemble function
Disassembles @var{function}.
@end deffn

We also provide a set of macros specialized for debugging Pintos,
written by Godmar Back @email{gback@@cs.vt.edu}.  You can type
@code{help user-defined} for basic help with the macros.  Here is an
overview of their functionality, based on Godmar's documentation:

@deffn {GDB Macro} debugpintos
Attaches the debugger to a waiting Pintos process on the same machine.
Shorthand for @code{target remote localhost:1234}.
@end deffn

@deffn {GDB Macro} dumplist list type element
Prints the elements of @var{list}, which should be a @code{struct list}
that contains elements of the given @var{type} (without the word
@code{struct}) in which @var{element} is the @struct{list_elem} member
that links the elements.

Example: @code{dumplist all_list thread all_elem} prints all elements of
@struct{thread} that are linked in @code{struct list all_list} using the
@code{struct list_elem all_elem} which is part of @struct{thread}.
(This assumes that you have added @code{all_list} and @code{all_elem}
yourself.)
@end deffn

@deffn {GDB Macro} btthread thread
Shows the backtrace of @var{thread}, which is a pointer to the
@struct{thread} of the thread whose backtrace it should show.  For the
current thread, this is identical to the @code{bt} (backtrace) command.
It also works for any thread suspended in @func{schedule},
provided you know where its kernel stack page is located.
@end deffn

@deffn {GDB Macro} btthreadlist list element
Shows the backtraces of all threads in @var{list}, the @struct{list} in
which the threads are kept.  Specify @var{element} as the
@struct{list_elem} field used inside @struct{thread} to link the threads
together.

Example: @code{btthreadlist all_list all_elem} shows the backtraces of
all threads contained in @code{struct list all_list}, linked together by
@code{all_elem}.  This command is useful to determine where your threads
are stuck when a deadlock occurs.  Please see the example scenario below.
(This assumes that you have added @code{all_list} and @code{all_elem}
yourself.)
@end deffn

@deffn {GDB Macro} btpagefault
Prints a backtrace of the current thread after a page fault exception.
Normally, when a page fault exception occurs, GDB will stop
with a message that might say:

@example
Program received signal 0, Signal 0.
0xc0102320 in intr0e_stub ()
@end example

In that case, the @code{bt} command might not give a useful
backtrace.  Use @code{btpagefault} instead.

You may also use @code{btpagefault} for page faults that occur in a user
process.  In this case, you may also wish to load the user program's
symbol table (@pxref{Debugging User Programs}).
@end deffn

@deffn {GDB Macro} hook-stop
GDB invokes this macro every time the simulation stops, which Bochs will
do for every processor exception, among other reasons.  If the
simulation stops due to a page fault, @code{hook-stop} will print a
message that says so and explains whether the page fault occurred
in the kernel or in user code.

If the exception occurred from user code, @code{hook-stop} will say:
@example
pintos-debug: a page fault exception occurred in user mode
pintos-debug: hit 'c' to continue, or 's' to step to intr_handler
@end example

In Project 2, a page fault in a user process leads to the termination of
the process.  You should expect those page faults to occur in the
robustness tests where we test that your kernel properly terminates
processes that try to access invalid addresses.  To debug those, set a
breakpoint in @func{page_fault} in @file{exception.c}, which you will
need to modify accordingly.

In Project 3, a page fault in a user process no longer automatically
leads to the termination of a process.  Instead, it may require reading in
data for the page the process was trying to access, either
because it was swapped out or because this is the first time it's
accessed.  In either case, you will reach @func{page_fault} and need to
take the appropriate action there.

If the page fault did not occur in user mode while executing a user
process, then it occurred in kernel mode while executing kernel code.
In this case, @code{hook-stop} will print this message:
@example
pintos-debug: a page fault occurred in kernel mode
@end example
followed by the output of the @code{btpagefault} command.

Before Project 3, a page fault exception in kernel code is always a bug
in your kernel, because your kernel should never crash.  Starting with
Project 3, the situation will change if you use the @func{get_user} and
@func{put_user} strategy to verify user memory accesses
(@pxref{Accessing User Memory}).

If you don't want GDB to stop for page faults, then issue the command
@code{handle SIGSEGV nostop}.  GDB will still print a message for
every page fault, but it will not come back to a command prompt.
@end deffn

@node Example GDB Session
@subsection Example GDB Session

This section narrates a sample GDB session, provided by Godmar Back.
This example illustrates how one might debug a Project 1 solution in
which occasionally a thread that calls @func{timer_sleep} is not woken
up.  With this bug, tests such as @code{mlfqs-load-1} get stuck.

This session was captured with a slightly older version of Bochs and the
GDB macros for Pintos, so it looks slightly different than it would now.
Program output is shown in normal type, user input in @strong{strong}
type.

First, I start Pintos:

@smallexample
$ @strong{pintos -v --gdb -- -q -mlfqs run mlfqs-load-1}
Writing command line to /tmp/gDAlqTB5Uf.dsk...
bochs -q
========================================================================
                       Bochs x86 Emulator 2.2.5
             Build from CVS snapshot on December 30, 2005
========================================================================
00000000000i[     ] reading configuration from bochsrc.txt
00000000000i[     ] Enabled gdbstub
00000000000i[     ] installing nogui module as the Bochs GUI
00000000000i[     ] using log file bochsout.txt
Waiting for gdb connection on localhost:1234
@end smallexample

@noindent Then, I open a second window on the same machine and start GDB:

@smallexample
$ @strong{pintos-gdb kernel.o}
GNU gdb Red Hat Linux (6.3.0.0-1.84rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...
Using host libthread_db library "/lib/libthread_db.so.1".
@end smallexample

@noindent Then, I tell GDB to attach to the waiting Pintos emulator:

@smallexample
(gdb) @strong{debugpintos}
Remote debugging using localhost:1234
0x0000fff0 in ?? ()
Reply contains invalid hex digit 78
@end smallexample

@noindent Now I tell Pintos to run by executing @code{c} (short for
@code{continue}) twice:

@smallexample
(gdb) @strong{c}
Continuing.
Reply contains invalid hex digit 78
(gdb) @strong{c}
Continuing.
@end smallexample

@noindent Now Pintos will continue and output:

@smallexample
Pintos booting with 4,096 kB RAM...
Kernel command line: -q -mlfqs run mlfqs-load-1
374 pages available in kernel pool.
373 pages available in user pool.
Calibrating timer...  102,400 loops/s.
Boot complete.
Executing 'mlfqs-load-1':
(mlfqs-load-1) begin
(mlfqs-load-1) spinning for up to 45 seconds, please wait...
(mlfqs-load-1) load average rose to 0.5 after 42 seconds
(mlfqs-load-1) sleeping for another 10 seconds, please wait...
@end smallexample

@noindent
@dots{}until it gets stuck because of the bug I had introduced.  I hit
@key{Ctrl+C} in the debugger window:

@smallexample
Program received signal 0, Signal 0.
0xc010168c in next_thread_to_run () at ../../threads/thread.c:649
649       while (i <= PRI_MAX && list_empty (&ready_list[i]))
(gdb)
@end smallexample

@noindent
The thread that was running when I interrupted Pintos was the idle
thread.  If I run @code{backtrace}, it shows this backtrace:

@smallexample
(gdb) @strong{bt}
#0  0xc010168c in next_thread_to_run () at ../../threads/thread.c:649
#1  0xc0101778 in schedule () at ../../threads/thread.c:714
#2  0xc0100f8f in thread_block () at ../../threads/thread.c:324
#3  0xc0101419 in idle (aux=0x0) at ../../threads/thread.c:551
#4  0xc010145a in kernel_thread (function=0xc01013ff <idle>, aux=0x0)
    at ../../threads/thread.c:575
#5  0x00000000 in ?? ()
@end smallexample

@noindent
Not terribly useful.  What I'd really like to know is what's up with the
other thread (or threads).  Since I keep all threads in a linked list
called @code{all_list}, linked together by a @struct{list_elem} member
named @code{all_elem}, I can use the @code{btthreadlist} macro from the
macro library I wrote.  @code{btthreadlist} iterates through the list of
threads and prints the backtrace for each thread:

@smallexample
(gdb) @strong{btthreadlist all_list all_elem}
pintos-debug: dumping backtrace of thread 'main' @@0xc002f000
#0  0xc0101820 in schedule () at ../../threads/thread.c:722
#1  0xc0100f8f in thread_block () at ../../threads/thread.c:324
#2  0xc0104755 in timer_sleep (ticks=1000) at ../../devices/timer.c:141
#3  0xc010bf7c in test_mlfqs_load_1 () at ../../tests/threads/mlfqs-load-1.c:49
#4  0xc010aabb in run_test (name=0xc0007d8c "mlfqs-load-1")
    at ../../tests/threads/tests.c:50
#5  0xc0100647 in run_task (argv=0xc0110d28) at ../../threads/init.c:281
#6  0xc0100721 in run_actions (argv=0xc0110d28) at ../../threads/init.c:331
#7  0xc01000c7 in main () at ../../threads/init.c:140

pintos-debug: dumping backtrace of thread 'idle' @@0xc0116000
#0  0xc010168c in next_thread_to_run () at ../../threads/thread.c:649
#1  0xc0101778 in schedule () at ../../threads/thread.c:714
#2  0xc0100f8f in thread_block () at ../../threads/thread.c:324
#3  0xc0101419 in idle (aux=0x0) at ../../threads/thread.c:551
#4  0xc010145a in kernel_thread (function=0xc01013ff <idle>, aux=0x0)
    at ../../threads/thread.c:575
#5  0x00000000 in ?? ()
@end smallexample

@noindent
In this case, there are only two threads, the idle thread and the main
thread.  The kernel stack pages (to which the @struct{thread} points)
are at @t{0xc0116000} and @t{0xc002f000}, respectively.  The main thread
is stuck in @func{timer_sleep}, called from @code{test_mlfqs_load_1}.

Knowing where threads are stuck can be tremendously useful, for instance
when diagnosing deadlocks or unexplained hangs.

@node Debugging User Programs
@subsection Debugging User Programs

You can also use GDB to debug a user program running under
Pintos.  Start by issuing this GDB command to load the
program's symbol table:
@example
add-symbol-file @var{program}
@end example
@noindent
where @var{program} is the name of the program's executable (in the host
file system, not in the Pintos file system).  After this, you should be
able to debug the user program the same way you would the kernel, by
placing breakpoints, inspecting data, etc.  Your actions apply to every
user program running in Pintos, not just to the one you want to debug,
so be careful in interpreting the results.  Also, a name that appears in
both the kernel and the user program will actually refer to the kernel
name.  (The latter problem can be avoided by giving the user executable
name on the GDB command line, instead of @file{kernel.o}, and then using
@code{add-symbol-file} to load @file{kernel.o}.)

@node GDB FAQ
@subsection FAQ

@table @asis
@item GDB can't connect to Bochs.

If the @command{target remote} command fails, then make sure that both
GDB and @command{pintos} are running on the same machine by
running @command{hostname} in each terminal.  If the names printed
differ, then you need to open a new terminal for GDB on the
machine running @command{pintos}.

@item GDB doesn't recognize any of the macros.

If you start GDB with @command{pintos-gdb}, it should load the Pintos
macros automatically.  If you start GDB some other way, then you must
issue the command @code{source @var{pintosdir}/src/misc/gdb-macros},
where @var{pintosdir} is the root of your Pintos directory, before you
can use them.

@item Can I debug Pintos with DDD?

Yes, you can.  DDD invokes GDB as a subprocess, so you'll need to tell
it to invoke @command{pintos-gdb} instead:
@example
ddd --gdb --debugger pintos-gdb
@end example

@item Can I use GDB inside Emacs?

Yes, you can.  Emacs has special support for running GDB as a
subprocess.  Type @kbd{M-x gdb} and enter your @command{pintos-gdb}
command at the prompt.  The Emacs manual has information on how to use
its debugging features in a section titled ``Debuggers.''

@item GDB is doing something weird.

If you notice strange behavior while using GDB, there
are three possibilities: a bug in your
modified Pintos, a bug in Bochs's
interface to GDB or in GDB itself, or
a bug in the original Pintos code.  The first and second
are quite likely, and you should seriously consider both.  We hope
that the third is less likely, but it is also possible.
@end table

@node Triple Faults
@section Triple Faults

When a CPU exception handler, such as a page fault handler, cannot be
invoked because it is missing or defective, the CPU will try to invoke
the ``double fault'' handler.  If the double fault handler is itself
missing or defective, that's called a ``triple fault.''  A triple fault
causes an immediate CPU reset.

Thus, if you get yourself into a situation where the machine reboots in
a loop, that's probably a ``triple fault.''  In a triple fault
situation, you might not be able to use @func{printf} for debugging,
because the reboots might be happening even before everything needed for
@func{printf} is initialized.

There are at least two ways to debug triple faults.  First, you can run
Pintos in Bochs under GDB (@pxref{GDB}).  If Bochs has been built
properly for Pintos, a triple fault under GDB will cause it to print the
message ``Triple fault: stopping for gdb'' on the console and break into
the debugger.  (If Bochs is not running under GDB, a triple fault will
still cause it to reboot.)  You can then inspect where Pintos stopped,
which is where the triple fault occurred.

Another option is what I call ``debugging by infinite loop.''
Pick a place in the Pintos code, insert the infinite loop
@code{for (;;);} there, and recompile and run.  There are two likely
possibilities:

@itemize @bullet
@item
The machine hangs without rebooting.  If this happens, you know that
the infinite loop is running.  That means that whatever caused the
reboot must be @emph{after} the place you inserted the infinite loop.
Now move the infinite loop later in the code sequence.

@item
The machine reboots in a loop.  If this happens, you know that the
machine didn't make it to the infinite loop.  Thus, whatever caused the
reboot must be @emph{before} the place you inserted the infinite loop.
Now move the infinite loop earlier in the code sequence.
@end itemize

If you move around the infinite loop in a ``binary search'' fashion, you
can use this technique to pin down the exact spot where everything goes
wrong.  It should only take a few minutes at most.
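
One possible way to organize such a search is sketched below.  The
steps @func{init_a}, @func{init_b}, and @func{init_c}, the function
@func{boot}, and the @code{HANG_AFTER} and @code{PROBE} macros are all
hypothetical names invented for this example; nothing like them ships
with Pintos.  Compiling with, say, @option{-DHANG_AFTER=2} places the
infinite loop right after the second step.  A hang then means the
reboot is caused by something after @func{init_b}, while a reboot loop
means the cause lies in @func{init_b} or earlier.

@example
/* Hypothetical boot steps, invented for this example. */
static int steps_done;
static void init_a (void) @{ steps_done++; @}
static void init_b (void) @{ steps_done++; @}
static void init_c (void) @{ steps_done++; @}

/* Define HANG_AFTER as 1, 2, or 3 at compile time to spin forever
   after that step.  Left undefined, the probe compiles to nothing. */
#ifdef HANG_AFTER
#define PROBE(STEP) do @{ if ((STEP) == HANG_AFTER) for (;;); @} while (0)
#else
#define PROBE(STEP) ((void) 0)
#endif

/* Runs the boot steps and returns how many have completed. */
int
boot (void)
@{
  init_a ();
  PROBE (1);
  init_b ();
  PROBE (2);
  init_c ();
  PROBE (3);
  return steps_done;
@}
@end example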

@node Modifying Bochs
@section Modifying Bochs

An advanced debugging technique is to modify and recompile the
simulator.  This proves useful when the simulated hardware has more
information than it makes available to the OS.  For example, page
faults have a long list of potential causes, but the hardware does not
report to the OS exactly which one is the particular cause.
Furthermore, a bug in the kernel's handling of page faults can easily
lead to recursive faults, but a ``triple fault'' will cause the CPU to
reset itself, which is hardly conducive to debugging.

In a case like this, you might appreciate being able to make Bochs
print out more debug information, such as the exact type of fault that
occurred.  It's not very hard.  You start by retrieving the source
code for Bochs 2.2.6 from @uref{http://bochs.sourceforge.net} and
saving the file @file{bochs-2.2.6.tar.gz} into a directory.
The script @file{pintos/src/misc/bochs-2.2.6-build.sh}
applies a number of patches contained in @file{pintos/src/misc}
to the Bochs tree, then builds Bochs and installs it in a directory
of your choice.
Run this script without arguments to learn usage instructions.
To use your @file{bochs} binary with @command{pintos},
put it in your @env{PATH}, and make sure that it is earlier than
@file{@value{localpintosbindir}/bochs}.

Of course, to get any good out of this you'll have to actually modify
Bochs.  Instructions for doing this are firmly out of the scope of
this document.  However, if you want to debug page faults as suggested
above, a good place to start adding @func{printf}s is
@func{BX_CPU_C::dtranslate_linear} in @file{cpu/paging.cc}.

@node Debugging Tips
@section Tips

The page allocator in @file{threads/palloc.c} and the block allocator in
@file{threads/malloc.c} set all the bytes in freed memory to
@t{0xcc} at the time of the free.  Thus, if you see an attempt to
dereference a pointer like @t{0xcccccccc}, or some other reference to
@t{0xcc}, there's a good chance you're trying to reuse a page that's
already been freed.  Also, byte @t{0xcc} is the CPU opcode for ``invoke
interrupt 3,'' so if you see an error like @code{Interrupt 0x03 (#BP
Breakpoint Exception)}, then Pintos tried to execute code in a freed page or
block.
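
To see why a stale pointer shows up as @t{0xcccccccc}, consider the
sketch below.  It mimics the fill with a plain @func{memset} on a
hypothetical @struct{node}; the names @func{poison} and
@func{looks_freed} are invented for this example and are not part of
the Pintos allocators.  Every byte of the structure becomes
@t{0xcc}, so a 32-bit field read back from it is @t{0xcccccccc}.

@example
#include <stdint.h>
#include <string.h>

/* Hypothetical structure, invented for this example. */
struct node
  @{
    uint32_t next;      /* 32-bit pointer value, as on 80x86. */
    uint8_t tag;
  @};

/* Mimics the fill applied to freed memory: every byte of N
   becomes 0xcc. */
void
poison (struct node *n)
@{
  memset (n, 0xcc, sizeof *n);
@}

/* Returns nonzero if VALUE looks like it was read from freed
   memory. */
int
looks_freed (uint32_t value)
@{
  return value == 0xccccccccu;
@}
@end example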

An assertion failure on the expression @code{sec_no < d->capacity}
indicates that Pintos tried to access a file through an inode that has
been closed and freed.  Freeing an inode clears its starting sector
number to @t{0xcccccccc}, which is not a valid sector number for disks
smaller than about 1.6 TB.
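
The 1.6 TB figure can be checked with a little arithmetic, assuming
512-byte sectors (the sector size is an assumption here, since this
appendix does not state it).  @t{0xcccccccc} is 3,435,973,836 in
decimal, and that many 512-byte sectors come to 1,759,218,604,032
bytes, just under 1.6 times 2^40.  The helper @func{disk_bytes} below
is a hypothetical name used only to spell out the calculation.

@example
#include <stdint.h>

/* Assumed sector size in bytes; not stated in this appendix. */
#define SECTOR_SIZE 512

/* Returns the capacity in bytes of a disk with SECTORS sectors. */
uint64_t
disk_bytes (uint32_t sectors)
@{
  return (uint64_t) sectors * SECTOR_SIZE;
@}
@end example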