All software sucks, be it open-source of proprietary. The only
question is what can be done with particular instance of suckage,
and that's where having the source matters. -- Al Viro (2004)

», __file__)

SECTION(«Introduction»)

<p> It's safe to bet that every non-trivial program contains bugs.
Bugs in the operating system kernel are often fatal in that they
lead to a system crash which requires a reboot. However, thanks to
the concept of virtual memory, bugs in applications usually affect
neither the operating system nor independent processes which happen
to run at the same time. This makes user space programs easier to
debug than kernel code because the debugging tools will usually run
as a separate process. </p>

<p> We look at <code>valgrind</code> and <code>gdb</code>, two popular tools
which help to locate bugs in application software. <code>valgrind</code>
is easy to use but is also limited because it does not alter the
target process. On the other hand, <code>gdb</code> is much more powerful
but also infamous for being hard to learn. The exercises aim to get
the reader started with both tools. </p>

<p> A couple of exercises on <code>gcc</code>, the GNU C compiler, ask the
reader to incorporate debugging information into an executable and
to see the effect of various diagnostic messages which show up when
valid but dubious code is encountered. These diagnostics fall into
one of two categories. First, there are
warnings which are based on static code analysis. These so-called
<em>compile-time</em> warnings are printed by the compiler when the
executable is being created from its source code. The second approach,
called <em>code instrumentation</em>, instructs the compiler to
add sanity checks to the executable, along with additional code that
prints a warning when one of these checks fails. In contrast to the
compile-time warnings, the messages of the second category show up at
<em>run-time</em>, and are generated by the compiled program rather
than by the compiler. </p>

<ul>
	<li> The <a href="#deref.c"><code>deref.c</code></a> program is
	flawed because <code>argv[1]</code> will be <code>NULL</code> if
	the program is invoked with no arguments. What is going to happen
	in this case? Compile the program (<code>cc deref.c -o deref</code>)
	and run it (<code>./deref</code>) to confirm. </li>

	<li> Run the <code>deref</code> program under strace (<code>strace
	./deref</code>) and discuss the meaning of <code>si_addr</code> at the
	end of the output. </li>

	<li> Assume the buggy <code>deref</code> program is executed from the
	wrapper script <a href="#deref.sh"><code>deref.sh</code></a>. Explain
	why <code>strace ./deref.sh</code> does not show very useful
	information. Run <code>strace -f ./deref.sh</code> and compare the output
	to the output of the first <code>strace</code> command where you ran
	the binary without the wrapper script. Discuss the implications of
	your findings with respect to debugging. </li>

	<li> "Prove" that the <a href="#strerror.c"><code>strerror.c</code></a>
	program works by compiling and running it, passing <code>1</code>
	as the only argument. Discuss to what extent it is possible to prove
	the correctness of a program by running it. </li>

	<li> Unfortunately, <code>strerror.c</code> has more flaws than
	lines. Point out as many as you can. </li>

	<li> Note that although <code>strerror.c</code> is full of bugs,
	the code compiles cleanly. Discuss the reasons and the implications
	of this fact. </li>

	<li> Run <code>pinfo gcc</code> and read <code>Invoking GCC->Warning
	Options</code> to get an idea about the possible options to activate
	diagnostic messages. </li>

	<li> Recompile <code>strerror.c</code> with <code>-Wall</code> and
	explain all warnings shown. </li>

	<li> Run <code>valgrind ./strerror 1</code> and explain the message
	about the "conditional jump". </li>

	<li> Run <code>valgrind ./strerror</code> with no arguments and explain
	the part about the "invalid read". Why is it clear that this part
	refers to a <code>NULL</code> pointer dereference? </li>

	<li> Why does it make a big difference whether you run <code>./strerror 2</code>
	or <code>./strerror 1</code>? Run <code>valgrind ./strerror
	1</code> and <code>valgrind ./strerror 2</code> to confirm your
	answer. </li>

	<li> The <a href="#print_arg1.c"><code>print_arg1.c</code></a> program
	crashes due to a <code>NULL</code> pointer dereference when it is called
	with no arguments. Compile the program and run it to confirm. </li>

	<li> Run <code>gdb print_arg1</code>. At the <code>(gdb)</code>
	prompt, execute the following commands and explain the output:
		<ul>
			<li> <code>run</code> </li>
			<li> <code>bt</code> </li>
		</ul>
	</li>

	<li> Run <code>ls -l print_arg1; size print_arg1</code> and
	discuss the meaning of the <code>text</code>, <code>data</code>
	and <code>bss</code> columns in the output of the <code>size</code>
	command. Compile the program again, this time with <code>-g</code>
	to add debugging information to the executable. Run <code>ls -l
	print_arg1; size print_arg1</code> again and discuss the difference,
	in particular the impact on performance due to the presence of the
	debugging information. </li>

	<li> Rerun the above <code>gdb</code> commands and note how the
	debugging information that has been stored in the executable makes
	the <code>bt</code> output much more useful. Discuss how debugging
	could be activated for third-party software for which the source code
	is available. </li>

	<li> Compile the program another two times with <code>-O0</code>
	and with <code>-O3</code> to optimize at different levels (where
	<code>-O0</code> means "don't optimize at all"). Rerun the above
	<code>gdb</code> commands on both executables and note the difference
	regarding <code>print_it()</code>. </li>

	<li> Compile <a href="#ubsan.c"><code>ubsan.c</code></a> and run the
	program as follows: <code>./ubsan 123456 123456</code>. Explain why
	the result is not the square of <code>123456</code>. Recompile
	it with <code>-fsanitize=undefined</code>. Then run the program again
	with the same arguments. </li>
</ul>

SUBSECTION(«deref.c»)

<pre>
#include <stdio.h>
#include <string.h>
int main(int argc, char **argv)
{
	printf("arg has %zu chars\n", strlen(argv[1]));
}
</pre>

SUBSECTION(«deref.sh»)

<pre>
#!/bin/sh
./deref
</pre>

SUBSECTION(«strerror.c»)

<pre>
#include "stdio.h"
#include "stdlib.h"
#include "string.h"
#include "assert.h"

/* print "system error: ", and the error string of a system call number */
int main(int argc, char **argv)
{
	unsigned errno, i;
	char *result = malloc(25); /* 5 * 5 */
	/* fail early on errors or if no option is given */
	if (errno && argc == 0)
		exit(0);
	errno = atoi(argv[1]);
	sprintf(result, strerror(errno));
	printf("system error %d: %s\n", errno, result, argc);
}
</pre>

SUBSECTION(«print_arg1.c»)

<pre>
#include <stdio.h>
#include <stdlib.h>

static int print_it(char *arg)
{
	return printf("arg is %d\n", atoi(arg));
}

int main(int argc, char **argv)
{
	return print_it(argv[1]);
}
</pre>

SUBSECTION(«ubsan.c»)

<pre>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
	int factor1 = atoi(argv[1]), factor2 = atoi(argv[2]),
		product = factor1 * factor2;
	return printf("%d * %d = %d\n", factor1, factor2, product);
}
</pre>