When programmers socialize, they entertain each other by telling ``war stories,'' amusing anecdotes about spectacular or intriguing failures of computers and programs. These stories are part of the collected wisdom of programming, little parables that capture lessons about the good and bad engineering that goes into software. One of the reasons we wrote The Practice of Programming was to re-tell some of these stories and to pass on the experience they illustrate.
Here's a war story that didn't make it into the book. Not too long ago, we were using a commercial ``terminal emulator,'' a program that simulates, in a window, an old-fashioned terminal in which to run a command interpreter or telnet session. By default, the emulator didn't keep text that scrolled off the top of the screen, but a pull-down menu of options allowed one to specify a number of lines of ``buffer'' to store text that would otherwise be lost. At one point, we were using the program to look at a large file, so we went to the menu and set the number of lines of buffer to 999, the largest value that would fit in the three-digit field. When we clicked OK, the window vanished, leaving only its title bar! Clearly, the program had a bug and, although it accepted the large value, it didn't handle it correctly.
Because the button to activate the pull-down menu was missing, there was no way to restore the buffer size and, we hoped, recover the window. So we deleted the window using the delete button on the title bar.
But we still wanted to use the program, so we started it again. Unfortunately, the program had recorded the large buffer value in permanent storage, so the new instance of the program was also represented by just a title bar. What had started as an attempt to read a file had now turned into a puzzling debugging problem. How could we get the program back?
We spent a long time searching unsuccessfully for the location of the file that recorded the size of the buffer, hoping to reset it. Then we tried to find some other way to access the program's data, such as by a separate interface to its parameters, but that too was fruitless.
Then we realized that the problem was likely to be in the display code for the text of the window itself and that, even though it was invisible, the pull-down menu button might still work if we could only find it. So we carefully and methodically started clicking the mouse along the bottom edge of the title bar until--aha!--the options menu popped up. We were then able to restore the buffer size, whereupon the program returned to the screen and we were able to continue working.
After this adventure, we read the documentation for the program, where we learned that the valid buffer size was at most 399 lines. We did a little more experimentation, now that we knew how to get the program back, and indeed 399 lines worked fine, while 400 caused the window to disappear.
As with any good programming war story, there are lessons in our little adventure.
This silly bug would have taken far less time to fix than it cost us to uncover; who knows how much time has been wasted by other users? It would have taken even less effort to prevent it in the first place with a simple check for valid input. Writing software correctly is the best way to avoid problems, and one of the most important parts of being correct is defending against invalid input.
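A minimal sketch of such a check in C, assuming the dialog hands the program the contents of the field as a string. The names parse_buffer_lines and MAX_BUFFER_LINES are hypothetical; the limit of 399 is the one given in the program's documentation. Rather than passing a bad value along, the function reports it to its caller:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical limit: the largest buffer size the emulator supports
   (399 lines, per its documentation). */
#define MAX_BUFFER_LINES 399

/* Parse a user-supplied buffer size.  On success, store it in *lines and
   return 0; on any invalid input, leave *lines unchanged and return -1. */
int parse_buffer_lines(const char *text, int *lines)
{
    char *end;
    long n;

    errno = 0;
    n = strtol(text, &end, 10);
    if (end == text || *end != '\0')    /* empty field, or trailing junk */
        return -1;
    if (errno == ERANGE)                /* too large even for a long */
        return -1;
    if (n < 0 || n > MAX_BUFFER_LINES)  /* outside the supported range */
        return -1;
    *lines = (int) n;
    return 0;
}
```

A caller that gets -1 back would keep the old setting and tell the user, instead of storing 999 and carrying on.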
The program wasn't tested thoroughly either. An effective way to test code is to exercise it at its natural boundaries. The program's author must have known that 399 and 400 were critical values, since 399 represented the biggest possible buffer and 400 the smallest illegal value. If the program is going to fail, it will most likely fail at one of those values or at small values like 0 and 1. It's easy to test such cases, often while the code is being written.
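Boundary tests like these can be written alongside the code. The sketch below assumes a hypothetical setter, set_buffer_lines, that returns 0 when it accepts a value and -1 when it rejects one; the table of cases covers -1, 0, 1, 398, 399, and 400, the values on either side of each edge:

```c
#include <assert.h>
#include <stdio.h>

#define MAX_BUFFER_LINES 399   /* hypothetical limit, as documented */

static int buffer_lines = 0;   /* hypothetical stored setting */

/* Hypothetical setter: accept n only if 0 <= n <= MAX_BUFFER_LINES. */
int set_buffer_lines(int n)
{
    if (n < 0 || n > MAX_BUFFER_LINES)
        return -1;
    buffer_lines = n;
    return 0;
}

/* Exercise the setter at and around its natural boundaries. */
void test_buffer_boundaries(void)
{
    static const struct { int n; int want; } cases[] = {
        { -1,                   -1 },  /* just below the smallest legal value */
        {  0,                    0 },  /* smallest legal value: no buffer */
        {  1,                    0 },
        { MAX_BUFFER_LINES - 1,  0 },
        { MAX_BUFFER_LINES,      0 },  /* biggest legal buffer */
        { MAX_BUFFER_LINES + 1, -1 },  /* smallest illegal value */
    };
    size_t i;

    for (i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        int got = set_buffer_lines(cases[i].n);
        if (got != cases[i].want)
            printf("set_buffer_lines(%d) = %d, want %d\n",
                   cases[i].n, got, cases[i].want);
        assert(got == cases[i].want);
    }
}
```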
But when software has bugs, there are ways to track them down and fix them. Debugging is a process of narrowing down the possibilities: finding the critical input values that cause the problem and eliminating things that can't be related to it. Each debugging run is an experiment that tests a hypothesis. This example was hard because we didn't have source code, and the error symptom was that the display shrank to nothing!
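We found the 399/400 boundary by reading the documentation, but without it the critical value can still be located systematically by bisection. The sketch below assumes failure is monotonic (every size at or above some threshold fails and every size below it works). Both window_survives, a hypothetical stand-in for one manual trial hard-wired to reproduce the bug we saw, and first_failure are illustrative names. Starting from a known-good 0 and a known-bad 999, it needs about ten trials rather than hundreds:

```c
#include <stdbool.h>

/* Hypothetical stand-in for one experiment: set the buffer size, click OK,
   and see whether the window survives.  Here it mimics the bug we hit. */
static bool window_survives(int lines)
{
    return lines < 400;
}

/* Given good (known to work) and bad (known to fail) with good < bad,
   return the smallest failing value.  Assumes failures are monotonic. */
int first_failure(int good, int bad, bool (*works)(int))
{
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;  /* avoids overflow of good + bad */
        if (works(mid))
            good = mid;   /* mid works: the threshold is above it */
        else
            bad = mid;    /* mid fails: the threshold is at or below it */
    }
    return bad;
}
```

Calling first_failure(0, 999, window_survives) narrows the range to 399 and 400 and returns 400, the smallest illegal value.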
A program should do sensible things, so that for typical applications it can be used without documentation; this is especially true for programs with graphical interfaces, where no one ever reads the instructions. The program could have been written without limits on the buffer size (easiest to use, and no documentation needed), or it could have included a ``maximum 399'' label near the input box (documentation visible when it's needed), or it could have truncated a too-large value to the maximum (silently enforcing a limit), or it could have checked the input and popped up a message box when the input was invalid (irritating but valid). Any of these choices is clearly better than proceeding with bad input.
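Of the alternatives listed above, truncating to the maximum is the simplest to sketch. The function below (clamp_buffer_lines is a hypothetical name, using the same assumed limit of 399) pulls any out-of-range value to the nearest legal one:

```c
#define MAX_BUFFER_LINES 399   /* hypothetical limit, as documented */

/* Silently enforce the limit: values outside [0, MAX_BUFFER_LINES] are
   replaced by the nearest legal value instead of being passed on. */
int clamp_buffer_lines(long n)
{
    if (n < 0)
        return 0;
    if (n > MAX_BUFFER_LINES)
        return MAX_BUFFER_LINES;
    return (int) n;
}
```

Clamping keeps the program running with a sane value, but a user who typed 999 gets 399 without being told; checking the input and reporting the error trades that silence for an interruption.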
We all make mistakes; war stories provide a way for us to learn from them. Many of the principles set down in The Practice of Programming were distilled from run-ins with programs like the one we described here. By following these rules, you can use the experience of others to help you write better software.
Sun Aug 29 07:30:50 EDT 1999
Copyright © 1999
Lucent Technologies. All rights reserved.