Day 30: Understand Where Resources are Wasted

Writing high-performance code is as much an art as a science. Much of the process involves thinking critically about what you are doing and how to change existing approaches. Unlike physical construction, for example, where rules of thumb have been refined over centuries, the software engineering discipline is still very young.

What is even more important, the needs and goals of different projects are so diverse that any rule created in a particular context is difficult to generalize to other areas. This is even more true when it comes to measuring and improving performance.

Despite these general issues, it is still possible to achieve high performance in most programming environments. It is, however, a matter of experimentation and careful testing of the available development options.

Finding the exact areas where resources are wasted is one of the most valuable skills available to engineers and software designers. By carefully examining a software system, it is possible to determine which regions are responsible for most of the time spent in processing as well as in input-output tasks.

Use a Profiler

The profiler is the first important tool for performance optimization. Profilers are available for most languages, and they will show you which methods and functions are used the most, as well as how much time was spent in them.
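For instance, on a Linux/GCC toolchain the workflow might look like the following sketch (the program name app and the source file main.cpp are placeholders, and gprof and perf are only two of many possible tools):

    # Build with instrumentation for gprof
    g++ -O2 -pg main.cpp -o app

    # Run the program normally; this writes gmon.out in the current directory
    ./app

    # Produce a report of the functions where most time was spent
    gprof ./app gmon.out > profile.txt

    # Alternatively, sample a regular (uninstrumented) build with Linux perf
    perf record ./app
    perf report

Either way, the result is a ranked list of functions and call counts that points directly at the hot spots.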

This kind of information is precious because it shows where to spend time and effort. Without a good understanding of where resources are wasted, there is little advantage in doing performance optimization.

In fact, as has been famously stated by Knuth, early optimization may have bad consequences for the development of a system, since a priori we don’t know whether these efforts will lead to any performance advantage. Premature optimization may, instead, result in algorithms and/or data structures that are suboptimal for a problem. Such “optimizations” can also remove flexibility from a design even when this is not necessary or desirable.

“Careful examination of a software system allows us to determine which regions are responsible for most of the time spent in processing.”

A typical example in C++ is the long-running debate about the use of virtual functions. A lot of C++ programmers try to limit the number of virtual functions because they know that there is an inherent performance penalty in calling them.

However, while it is true that virtual functions carry a slight performance cost, it turns out that this doesn’t matter in the vast majority of use cases. In C++, as in every other high-performance language, the majority of the computational load is concentrated in the inner loops.

If we are concerned about the performance penalty of virtual functions, what needs to be done is to determine where the inner loops are, and avoid virtual calls in those areas alone. For everything else in the program (which is typically 99% of the code), it makes no measurable difference whether virtual functions are used or not.
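As a hedged sketch (the Filter and GainFilter classes below are hypothetical), the following code shows the usual resolution of this debate: the virtual call happens once per buffer at the outer level, while the inner loop works on plain data with no per-element dispatch.

    #include <cstddef>
    #include <vector>

    // Hypothetical filter interface: the virtual call happens once per buffer,
    // not once per sample, so its cost is negligible.
    class Filter {
    public:
        virtual ~Filter() = default;
        virtual void apply(std::vector<float>& samples) const = 0;
    };

    class GainFilter : public Filter {
    public:
        explicit GainFilter(float gain) : gain_(gain) {}

        // The inner loop is a plain, non-virtual loop over the data.
        void apply(std::vector<float>& samples) const override {
            for (std::size_t i = 0; i < samples.size(); ++i) {
                samples[i] *= gain_;
            }
        }

    private:
        float gain_;
    };

    int main() {
        std::vector<float> buffer(1024, 1.0f);
        GainFilter gain(0.5f);

        // One virtual call per buffer; the hot work stays in the non-virtual loop.
        const Filter& f = gain;
        f.apply(buffer);
    }

This keeps the flexibility of an abstract interface where it is cheap, and avoids dynamic dispatch only where a profiler says it matters.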

Test Under High Load

Another important part of the process is to test the system under high load, preferably the kind of load that would be above any conceivable level of normal operation.

This is necessary because some algorithms behave differently under high load conditions. For example, if you use algorithms that operate on collections (such as those from the STL or from the Java standard library), the time necessary to perform some operations may be quadratic (or even worse) in the size of the input.

Such a change in behavior means that high load situations will see dramatic increases in time for otherwise common operations. This is the kind of performance issue that can and should be considered during the design phase, but that should also be thoroughly tested to avoid inconvenient results.
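As an illustration (the function names dedupSlow and dedupFast are hypothetical), the two versions below return the same result, but the first performs a linear search for every element, so its total cost grows quadratically with the input, while the hash-based version stays roughly linear and behaves far better under heavy load.

    #include <algorithm>
    #include <string>
    #include <unordered_set>
    #include <vector>

    // Quadratic: every insertion scans the whole vector, so n items cost O(n^2).
    // Harmless in small tests, dramatic once the input grows under real load.
    std::vector<std::string> dedupSlow(const std::vector<std::string>& input) {
        std::vector<std::string> seen;
        for (const std::string& item : input) {
            if (std::find(seen.begin(), seen.end(), item) == seen.end()) {
                seen.push_back(item);
            }
        }
        return seen;
    }

    // Roughly linear: hash lookups keep the per-item cost close to constant.
    std::vector<std::string> dedupFast(const std::vector<std::string>& input) {
        std::vector<std::string> result;
        std::unordered_set<std::string> seen;
        for (const std::string& item : input) {
            if (seen.insert(item).second) {
                result.push_back(item);
            }
        }
        return result;
    }

Both versions look fine with a handful of elements; only a load test with realistic input sizes reveals the difference.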

Don’t Guess

The whole idea of measuring performance is to avoid guessing games. Every time you need to guess what is going on in a system, the result is less time to do other important things, such as coding new features, removing existing bugs, or improving the documentation.

Instead of trying to prematurely optimize a system, and spending several hours on an “improvement” that may never gain you anything, it is better to spend a few minutes testing and profiling. As a result, you will know for sure where your efforts are best spent.

Another side effect of this kind of controlled measurement is that you will have a better idea of how other parts of the system operate. So, in the future, you will know that a particular change in a class may have a small or large impact based on previous information. While this doesn’t replace the need for future profiling, it provides better guidance than the simple guessing game that is so common in this industry.

Day 29: Fix the Cause, Not the Symptom

A lot of programmers’ time is spent fixing bugs. As code evolves, it is inevitable that inconsistencies will appear and need to be removed.

Moreover, it has been observed by many in the software development industry that maintenance and bug fixing is the most expensive part of software projects. Therefore, it is not a surprise that fixing bugs is an important part of successful programming.

Clearly, a good part of reducing the cost of bugs is preventing them from being introduced in the first place. If we can avoid the appearance of errors through proper development techniques, we eliminate a lot of work that would otherwise have to be done or repeated later.

A lot of the techniques we discussed in previous topics, such as unit testing, appropriate software design, and coding standards, are targeted at this. However, despite our best efforts, there are still situations in which fixing a bug is necessary.

Tips to Fix Programming Errors

When bugs are detected, a lot of energy can be spent in activities that don’t lead to an immediate resolution of the problem. If it is not possible to avoid the occurrence of bugs in the first place, we should at least make sure that the bug eradication phase is properly handled.

Here are a few tips that can be used to fix software defects:

Avoid using inefficient debugging techniques: There are cases when a printf is just what you need to see what is going on in your software. However, some bugs are difficult to trace, and printing data at random spots may do nothing to help identify the issue.

Good judgement is necessary to identify such cases. Do you need to see the relationship between several variables at a particular moment? In that case, it is probably better to use a debugger with a breakpoint set on a particular line. On the other hand, if it is important to observe the sequence of values produced by the program over time, then it is probably wiser to print a particular variable. Again, it is a case of using the proper tool for each job.
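For the breakpoint case, a minimal sketch of such a session might look like this (GDB on a Linux build; the file name, line number, and variable names are hypothetical):

    # Build with debug symbols and without optimization
    g++ -g -O0 main.cpp -o app

    # Start the program under GDB, stop at the suspicious line, and
    # inspect several related variables at that exact moment
    gdb ./app
    (gdb) break main.cpp:42
    (gdb) run
    (gdb) print order_total
    (gdb) print item_count
    (gdb) continue

The debugger shows the relationship between the variables at one precise moment, something a scattering of printf calls cannot easily reproduce.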

Devise a debugging plan: The biggest problem when debugging code is starting to make changes without understanding what is really going on. To identify the exact place where the problem is happening, you need to come up with a plan. The investigation that follows can then be used to find out exactly where and why the error occurred.

Find a way to reproduce the error: In debugging, a lot of time is spent tracking an error down. However, the most important thing is to determine exactly how to reproduce the issue. If there is no certainty about how the bug can be reproduced, there is no way to guarantee that any particular change in the code has fixed it.

Before doing anything else, try to determine the exact condition that triggered the bug in the first place. Also, make sure you have a proper way to achieve the same results in a controlled way.

Fix the root cause, not the symptoms: A hallmark of inexperienced developers is the belief that you can fix a bug by removing its symptoms. This is a dangerous way of thinking, because it can make a difficult situation even worse.

When there is a bug, it is good that it becomes visible, so that developers have a chance to find its cause and close it. By fixing only the symptoms of a bug, on the other hand, the root cause remains undetected. Eventually, it can cause an even worse problem, and when this happens it will be even more difficult to detect than the first time, because now the symptoms cannot be easily observed.

Create regression tests: A regression test is a tool used to reproduce the exact conditions that generated a bug and guarantee that it will not be present in the future. The reasoning behind regression tests is that if an error happened in the past and was fixed, we should make sure that it will not reappear in the future.

A good method to implement regression tests is to create unit tests that exercise the intended behavior. The tests will make sure that the software always provides the expected answer in the future.
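A minimal sketch, assuming the Google Test framework and a hypothetical parseDate function that once rejected leap-day dates, could look like this:

    #include <gtest/gtest.h>

    #include "date_parser.h"  // hypothetical header declaring Date and parseDate()

    // Regression test for a bug where "2024-02-29" was rejected as invalid.
    // It reproduces the exact input that triggered the defect and pins down the
    // expected behavior, so the bug cannot silently reappear.
    TEST(DateParserRegression, AcceptsLeapDay) {
        Date d = parseDate("2024-02-29");
        EXPECT_EQ(d.year, 2024);
        EXPECT_EQ(d.month, 2);
        EXPECT_EQ(d.day, 29);
    }

Once such a test is part of the suite, any future change that reintroduces the old behavior is caught immediately.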

Conclusion

As developers, we should do our best to avoid bugs in the first place. However, it is inevitable that errors will appear in our code over time. When this happens, we should strive to deal with them responsibly and with a proper plan. Moreover, avoiding dealing with bugs is self-defeating, because it will most probably cause other errors down the line, which will be even harder to fix.

Day 28: Frequently Repackage Code Into Smaller Libraries

A problem that is common to many programming projects is the deficient separation of code into distinct packaging units, such as libraries, modules, or assemblies. Languages such as Java, Python, and C++ give users great flexibility to create such modules.

Despite this, many developers, especially beginners, misuse this flexibility by packing as much code as possible into individual source files and libraries. The result of these practices is source code that is difficult to manage and maintain.

Packaging and Interfaces

In a sense, packaging functionality into modules and libraries is an extension of the problem of devising good classes and interfaces. However, it operates at a higher level: not at the level of individual programming elements, such as functions and types, but at the level of source code packaging, linking, and distribution.

A typical example occurs when a new team starts to develop a large application. Lots of functionality is created, and for simplicity all of it is hosted in a single executable. This is fine for the first iterations of a product, and it really streamlines the process of building and distributing the software.

What happens over time, however, is that interfaces inside the product easily become intertwined, even when developers are careful about it. What is worse, a change in a module that processes network connections, for example, will prompt a rebuild of the whole project.

Depending on the language, this becomes a problem sooner or later. In C++, for instance, build processes may sometimes take hours. Python, on the other hand, doesn’t suffer from build-time issues, but even then it is a hassle to test and make sure that the whole project is OK just because one of the modules was changed.

Interface Separation

The ideal goal is to create libraries and other higher-level modules (depending on the language) such that a routine change in one module doesn’t force the repackaging, testing, or even rebuilding of other parts of the software.

An ideal situation, for example, is what happens when an operating system has a regular update (for example, a security fix). When that happens, application software will not need to be recompiled or even retested. And that is true even though every application depends heavily on the APIs provided by the operating system.

Clearly, this level of separation is much harder to achieve for normal application code, simply because an OS kernel has been painstakingly designed to reduce coupling. But this doesn’t mean that we shouldn’t strive to achieve some level of separation.

The most basic way of improving code organization in this sense is to create independent libraries to handle common application concerns as soon as possible. For example, a library to handle networking operations can be added to the project, and all interactions with the network routed through that module. As a side note, a lot of open source projects were started exactly this way. For example, the now ubiquitous Gtk+ widget set was initially just another library developed by the creators of Gimp.
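Returning to the networking example, a small hypothetical sketch of such a library boundary could look like this: the rest of the application includes only the public header, and all socket details stay inside the library’s own source files.

    // net/connection.h -- public interface of the hypothetical networking library
    #pragma once

    #include <string>
    #include <vector>

    namespace net {

    class Connection {
    public:
        Connection(const std::string& host, int port);
        ~Connection();

        bool send(const std::vector<char>& data);
        std::vector<char> receive();

    private:
        int socket_fd_;  // implementation detail hidden behind the library boundary
    };

    }  // namespace net

Application code talks to net::Connection and links against the library; when the implementation behind this interface changes, the rest of the program does not need to be touched.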

Advantages for Compiled Languages

Another great advantage of repackaging code this way in a compiled, statically typed language such as C++ is the reduction in compile and link times.

Large code bases in C++ tend to accumulate huge dependencies because of the file inclusion system, which is based on a preprocessor. When not handled properly, such a system creates build-time dependencies that are hard to break. By separating modules into libraries as soon as possible, one can reduce these dependencies in the long run.
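One common technique, sketched below with hypothetical names, is to forward-declare types in public headers and confine the heavy includes to the implementation file, so that files including report.h never see the networking library’s headers:

    // report.h -- forward declarations only; including this header does not
    // pull in the networking library.
    #pragma once

    #include <memory>
    #include <string>

    namespace net { class Connection; }  // forward declaration instead of #include

    class ReportSender {
    public:
        ReportSender();
        ~ReportSender();  // defined in report.cpp, where net::Connection is complete

        void send(const std::string& report);

    private:
        std::unique_ptr<net::Connection> conn_;  // pointer to an incomplete type
    };

    // report.cpp -- the only file that needs the full networking headers.
    #include "report.h"
    #include "net/connection.h"

    ReportSender::ReportSender()
        : conn_(std::make_unique<net::Connection>("reports.example.org", 443)) {}

    ReportSender::~ReportSender() = default;

    void ReportSender::send(const std::string& report) {
        conn_->send(std::vector<char>(report.begin(), report.end()));
    }

With this layout, a change to the networking implementation does not force a recompile of every file that merely sends a report.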

In summary, code packaging is an important skill that you should strengthen by making thoughtful decisions during the design phase of a project. Creating libraries makes it easier to handle dependencies in application code. In languages with long compile times, such as C++, this practice can save you a lot of time and reduce dependency issues that may arise during future development.