Reducing Build Times in C++ Projects

One of the most frequent criticisms of the C++ language is its long compilation and link times, both of which contribute to the perceived duration of complete builds.

Sadly, few people understand that build times are determined not just by the complexity of the language or the individual tools, but also by the organization of the code. While C++ provides the means to reduce compilation time when necessary, many projects adopt practices that make it hard to achieve any speedup in compilation and linking.

Avoiding Header Dependencies

One of the main problems affecting the speed of C++ compilation is the unnecessary inclusion of header files. C++ libraries are certainly a big culprit here, because they employ the widespread tactic of including even the kitchen sink through a single header file. This is done in the name of simplicity, and it definitely makes life easier when working on small projects, but it has a negative effect on build times for large-scale products.

For example, consider the infamous “windows.h” header in the Windows environment. With a single line, this include file makes the compiler process tens of thousands of lines of declarations. Many other libraries follow this misguided idea of making life “easier” with an all-encompassing header file.

In a well-designed library, header inclusion should happen only when needed. For example, if you are using only classes A and B, your list of include files should be limited to <a.h> and <b.h>.

Another important tip: avoid including headers inside other header files unless strictly necessary. For example, if you only use pointers and references to a type, there is no need to include its header in the class declaration. A forward declaration of the class or struct is enough.

Transitive header inclusion is insidious because it affects everyone who builds against that header file. It generates unnecessary work for the compiler, and even for the linker in the next build phase, because of the way template instantiation works.

Avoid Templates When Possible

Another common problem in modern C++ code is the overuse of templates. Templates are a great technology, and they have been used by C++ designers to provide features that are not natively available in the language itself. For example, C++ templates allowed library designers to create useful containers such as vectors and maps, in a way that would be difficult or impossible using other techniques.

Templates, however, have a number of problems, which are nowadays widely recognized as affecting code modularity in programs of all sizes. First, templates provide their own compile-time language, which may slow down compilation significantly. Second, templates generate terrible error messages due to the way they were designed. Third, templates make separate compilation hard, since the compiler needs direct access to the definition of a template in order to generate code. Fourth, templates can unnecessarily increase the size of executables, because a separate instantiation is generated for each type. Fifth, generalized use of templates increases link time, because the linker has to remove the copies of identical instantiations generated during compilation.

As we have seen, despite the advantages of using template libraries, there are also a number of downsides that need to be carefully balanced. My personal view is that it is great to use the STL and other elements of the standard C++ library that rely on templates; after all, they are there for an important reason. Templates are also great for low-level code that needs to run as quickly as possible.

On the other hand, most application code should be organized around classes and other object-oriented concepts that do not depend on the use of templates. While templates have the potential to create very fast code, they can be difficult to reuse and extend. Object-oriented code may be a little slower, but it is designed with extensibility in mind. Therefore, application-level constructs are much easier to handle when expressed with classes instead of templates. Additionally, there is an extensive body of knowledge about software design using object-oriented concepts that is much more mature than the current techniques used for template-based software.

In summary, consider C++ templates as a technology that has advantages and pitfalls in the same way as other concepts in the language, such as pointers. Don’t use them everywhere.

Day 27: Design Resilient Interfaces

Interface design is at the heart of software engineering. While it takes a good programmer to implement an algorithm, it takes a top programmer to design clean interfaces that will make software development easier, not harder, as time goes by.

First, let us consider our definition of a resilient interface. In a few words, an interface is resilient if it can stand the test of time. In any project, as development progresses, a number of concepts will change and need to be reevaluated. This reevaluation will involve the needs and wants of users and managers, as well as the personal preferences of software developers. A resilient interface will help the development of these new features instead of becoming a hindrance in the process.

Resilience in Web Software

For example, a good interface for a web system will provide the basis for future growth of the site as new services and interfaces are added. In such a design, it is expected that data will be accessed in a uniform way. Data manipulated and stored by the service will also be, whenever possible, independent of how it is acquired. Observing these guidelines will, over time, provide simple interfaces for the access and manipulation of data, and will result in a reduction in maintenance efforts.

On the other hand, an inappropriate software interface for a web service will hinder future expansion. A concrete example is the tendency to tie particular data attributes to the way they are accessed. In that case, if the data access changes, this kind of interface becomes an obstacle that future developers need to circumvent.

Core Qualities of Software Design

Good interfaces are about the essential qualities of the software. Rather than trying to capture simple details, the objective becomes to understand the core of a computational problem. But designing interfaces is not only about understanding the present requirements. It also involves the experience of understanding the inevitable changes in requirements over time.

That is why it is so difficult to create resilient interfaces for a completely new service: if you see a particular problem for the first time, it is hard to anticipate all the issues that will need to be addressed in the future. Therefore, it also becomes increasingly hard to anticipate the necessary flex points that can make that expansion possible.

Experience is fundamental in determining the areas of the product that will require revision. The same design experience is also vital to avoid gratuitous generality, which can impose a real performance penalty on the software.

I have encountered many examples of poor design, and they frequently involve solving a simple problem with an algorithm that is more general than necessary. While one can sell such a strategy based on its perceived future “flexibility”, it doesn’t make sense to live with features that will never be used. The cost of applying an over-engineered solution will most certainly become a heavy toll on developers.

For example, I have seen frameworks that provide event-based capabilities that are useful, but hardly necessary, for a database-driven application. The result was that maintenance costs became almost prohibitive, and the possibility of introducing bugs was much higher than necessary.

Adding flex points to a software design should be done with care and a full understanding of the possible directions for a product. Creating complex interfaces just for the sake of it can be a fatal decision for the future of a project.

Conclusion

Designing software interfaces is an activity that has an impact much higher than just implementing code. The implementation of a feature or a complete product may take a few weeks or months. But the design used will dictate the direction of the product for years to come.

A balanced design needs to take into consideration the needs of the product and the organization behind it. It should be possible to add features to the existing implementation, but we should also avoid the onerous maintenance of unused features and options.

Resilient interfaces, therefore, don’t appear as an individual, one-time effort. They are more commonly the result of experience with several designs, combined with a deep understanding of the domain area and of the programming language used in the implementation.

The C Standard

One of the biggest milestones for C was the completion of the standardization process. After the creation of the standard, which involved many people from several countries, there is now a single set of features that is supported across the C compilers on the market.

Before standardization, different vendors could, sometimes out of necessity, implement variations of the C language. For this reason, unexpected variations were present in different compilers during the first years of the development of C. As a result, it was much harder to create C programs that would run on different machines, or even on the same machine but under distinct compilers. The problem became even more common with the proliferation of operating systems based on UNIX in the early 80s.

After the creation of the ANSI C standard, most of the compatibility issues have been solved, at least at a basic level. With enough care, it is nowadays possible to write C programs that will run unchanged on most of the computing platforms currently available.

The C standard is comprehensive, but simple, as is the language itself. It describes only the basic mechanisms available to all C programs, such as the lexical and syntactical conventions, the pre-processor and the signature of the main function, for example.

The standard also defines a rich set of basic libraries that are made available to any program that might decide to use them. The standard C library is composed of functions and data structures to perform operations such as input and output to files and the terminal, memory management, string manipulation, and mathematical operations, among others.

The set of standard libraries is available to programmers by using the #include directive. After a library is included, the program gains access to the desired subset of functionality.

Another important reason to know the C standard is understanding that there are still extensions provided by particular compiler vendors that are not supported in other environments. Such extensions usually exist for the convenience of users, but they can also be dangerous if used in code that might one day be ported to other platforms. The compiler documentation will usually point out which extensions are platform specific, so that programmers are aware of the portability issues when using them. While it might be tempting to use a feature that saves some work now, it is always important to consider the need to compile that code in another environment. Since it is frequently hard to determine the importance of a piece of code as it is being written, the general advice is to avoid vendor-specific extensions whenever possible.

More recently, by the end of the nineties, a new standardization effort called C99 was created to add more modern features that were felt to be lacking in the language. For example, a boolean data type was added to simplify the manipulation of logical values. Additionally, variables can now be declared in any position where statements may appear, instead of only at the beginning of a block, as it was before C99.

Despite the clear advantages of C99, it is still not as widely supported as traditional ANSI C. Part of the reason is that many of the features introduced in C99 were already available in C++, so users and vendors felt no urgency to have them implemented for their C code. It is expected, however, that newer versions of C compilers will gradually add support for all the features described in the C99 standard, so it is a good idea to be aware of the differences.