In his OOPSLA keynote, Martin Rinard, associate professor at MIT, talked about how we should make systems more resistant to errors, rather than putting a lot of effort into trying to create error-free systems.
Professor Rinard went through the great schools of thought: rationalism, empiricism, and cluelessness (the last perhaps not as renowned as the others, but known to be practiced by golden retrievers and various blonde people in the entertainment industry). Finding that none of the three was sufficient by itself when working with computer systems, he concluded that an approach based on selective cluelessness would be the way to go.
The point of selective cluelessness is that our cognitive ability is a limited resource: our brains can only understand so much, so we have to be selective in what we focus on. Hence, we have to choose some things to have no clue about. In this respect, programming languages should focus on reducing our need to know everything that’s going on, so that we can stay focused on the problem at hand.
One of the inherent problems we face today when programming computers could be formulated like this:
- Programs are unforgiving and must therefore be correct.
- To make a program correct, we must completely understand the problem.
- Programming is difficult, therefore simplicity and elegance are keys to success.
- Unfortunately, simplicity and elegance are hard to come by.
- To be simple and elegant, you need to know what’s going on.
As systems get larger, you have to focus on subsystems, and then you lose the ability to know what’s going on in other parts of the system.
Hence, you will find it harder to find simplicity and elegance, and the systems will get even more complicated.
Another point Professor Rinard made was that brute force is often the better approach. If you use brute force, there is a good chance things will work. If you try to get smart, it will all come down. In practice, simplicity is a nonstarter, and elegance is largely irrelevant. Applications such as Windows, Linux, and Microsoft Office are hardly simple and elegant, yet they are very successful.
To make better systems, Professor Rinard argued that software should be made to be acceptable, not necessarily correct. The cost and difficulty of developing software are roughly proportional to the amount of correctness required. Hence, systems should be built so that errors are more acceptable, rather than trying to make them error-free. The term he used for this was failure-oblivious computing.
The people at MIT conducted a study in which they applied failure-obliviousness to existing programs and tested the result. They used applications such as Pine, Apache, and Sendmail, and focused on some well-known problem areas in applications:
- Reading/writing outside of arrays (a typical C/C++ problem)
- Memory does not get deallocated, resulting in memory leaks (also a typical C/C++ problem)
- Infinite loops
The researchers identified the problem areas in the applications and changed them so that they would simply ignore any attempt to read or write outside an array (write operations were discarded, and reads returned a random value). They also changed the programs to overwrite old memory slots instead of allocating new ones: they allocated k chunks of memory for the program, and when the program needed space for object k+1, they wrote it into slot 1 instead, overwriting whatever was there before. Furthermore, the systems were not allowed to run infinite loops; they were only allowed a finite number of iterations.
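To make the three changes described above more concrete, here is a minimal sketch in Python. It is not the MIT researchers' implementation (their work targeted C programs); the names `ObliviousArray`, `CyclicPool`, and `bounded_loop` are hypothetical and made up for illustration. It also assumes that an invalid read returns a fixed made-up value rather than a random one, purely to keep the example deterministic.

```python
class ObliviousArray:
    """Fixed-size array that ignores out-of-bounds accesses instead of failing.

    Hypothetical illustration, not the researchers' code.
    """

    def __init__(self, size, manufactured=0):
        self._items = [manufactured] * size
        # Assumption: invalid reads get this fixed value (the talk mentioned
        # a random value; a constant keeps the sketch deterministic).
        self._manufactured = manufactured

    def read(self, index):
        if 0 <= index < len(self._items):
            return self._items[index]
        return self._manufactured  # invalid read: make up a value and carry on

    def write(self, index, value):
        if 0 <= index < len(self._items):
            self._items[index] = value
        # invalid write: silently discarded


class CyclicPool:
    """Holds at most k objects; allocation k+1 reuses the first slot, and so on."""

    def __init__(self, k):
        self._slots = [None] * k
        self._next = 0

    def allocate(self, obj):
        slot = self._next
        self._slots[slot] = obj  # overwrites whatever was stored here before
        self._next = (self._next + 1) % len(self._slots)
        return slot

    def get(self, slot):
        return self._slots[slot]


def bounded_loop(condition, body, max_iterations):
    """Run body while condition() holds, but at most max_iterations times."""
    count = 0
    while condition() and count < max_iterations:
        body()
        count += 1
    return count
```

With this sketch, writing to index 10 of a three-element `ObliviousArray` does nothing, reading it back yields the made-up value, and a loop whose condition never becomes false still stops after `max_iterations` passes.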
The result of the study was that the failure-oblivious systems were in fact more stable than the original versions, and still operated as expected. The lesson to be learned was that the software would never crash; it would always continue and produce something, and that something was often good enough. In some cases, he argued, programs could just swallow exceptions and continue rather than halt. Thus, correctness was traded off for stability.
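The idea of swallowing exceptions and continuing can be sketched roughly like this. The function `process_all` is a hypothetical name, and counting the skipped items is my own addition so that the swallowed errors remain visible; nothing here is taken from the talk itself.

```python
def process_all(items, handler):
    """Apply handler to each item, skipping items that raise instead of halting.

    Hypothetical illustration of "swallow the exception and continue".
    """
    results = []
    skipped = 0
    for item in items:
        try:
            results.append(handler(item))
        except Exception:
            skipped += 1  # swallow the error and keep going with the next item
    return results, skipped
```

For example, `process_all(["1", "x", "3"], int)` converts the two valid strings and skips the bad one, so one malformed input does not bring down the whole run.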