Day 21: Design The Big Picture First

In a previous post I suggested that optimizing a program should only be done with knowledge of how time is spent. The reason is that we usually don’t know where a program is spending most of its time, so a lot of energy ends up wasted optimizing code that doesn’t need improvement.

Similarly, a lot of design effort is spent on areas of the system that don’t warrant that much attention. Many details are introduced very early in the process, when they may be more of a hindrance than a help. The main reason this happens is that we frequently design systems without thinking about the big picture first.

Writing code is hard. No matter how much experience you have, there is always some aspect of the problem that you don’t understand completely. And these are the problems that will bite you when you least expect it.

Consider the architecture first. What are the main functions of the system? How could these functions be divided into higher-level elements, such as services, applications, and libraries?

Once you have determined how the system is divided functionally and physically, you can start to look at the structural organization of each element, such as its major classes, functions, and libraries.
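As a purely hypothetical illustration for a music application, the big-picture pass might produce a split like the one sketched below, before any detailed class design happens:

```cpp
// Hypothetical top-level split for a music application, decided before
// any detailed class design (names are illustrative only):
//
//   codecs/     library: encoding and decoding formats (MP3, FLAC, ...)
//   playback/   library: audio output; depends on codecs/
//   recording/  library: audio capture; depends on codecs/, not playback/
//   app/        application: UI and wiring; depends on all of the above
//
// Only once this split exists does it make sense to design the major
// classes inside each element, e.g. the public face of codecs/:

class AudioDecoder {
public:
    virtual ~AudioDecoder() = default;
    virtual bool open(const char* path) = 0;      // false on failure
    virtual int readSamples(short* out, int max) = 0;
};
```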

This physical design planning is essential to determining how the system can best develop and evolve. Few things are as frustrating as software with a bad architecture: even though each piece might have been built successfully, the pieces don’t fit together, and the whole system becomes a heavy burden to maintain and evolve.

Design of Components

The way components interact in a software application is also of great importance. Subsystems should have sane dependencies, so that they are easy to maintain and even to interchange if necessary.

For instance, a big problem in the design of components is cyclic dependencies: when two modules of the program need to know about each other, the result is an undesired coupling that is hard to break.

A practical example would be a music application in which the MP3 code depends on the recording code, and vice versa. In such a situation it is hard to create another application that deals only with MP3 files, because the recording code insists on coming along. This is undesirable for the whole application, and it makes each of the individual components much harder to maintain.
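To make the fix concrete, here is a minimal sketch (the class names are hypothetical) of breaking such a cycle with an abstract interface: the recording code depends only on AudioSource, so the MP3 code can be reused on its own:

```cpp
#include <memory>
#include <string>
#include <utility>

// Abstract interface owned by the recording module.
class AudioSource {
public:
    virtual ~AudioSource() = default;
    virtual std::string readSamples() = 0;
};

// MP3 module: knows nothing about recording.
class Mp3Decoder : public AudioSource {
public:
    std::string readSamples() override { return "mp3 samples"; }
};

// Recording module: depends only on the interface, not on Mp3Decoder,
// so the cycle between the two concrete modules never forms.
class Recorder {
public:
    explicit Recorder(std::shared_ptr<AudioSource> source)
        : source_(std::move(source)) {}
    void record() {
        std::string samples = source_->readSamples();
        (void)samples;  // a real recorder would write these to disk
    }
private:
    std::shared_ptr<AudioSource> source_;
};

int main() {
    Recorder recorder(std::make_shared<Mp3Decoder>());
    recorder.record();
}
```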

A few design patterns offer help in this regard. For example, MVC is an object-oriented design pattern created to avoid coupling between the graphical interface and the application model. Modern implementations of MVC can be found everywhere from desktop GUI libraries to web frameworks. Even if you don’t have access to an MVC library, you can apply the pattern in your own code to avoid this insidious type of dependency.
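For illustration, here is a bare-bones sketch of the MVC split, assuming nothing beyond the standard library: the model holds state, the view renders it, and the controller mediates, so neither the model nor the view depends on the other:

```cpp
#include <iostream>
#include <string>
#include <utility>

class Model {                       // application state only
public:
    void setTitle(std::string t) { title_ = std::move(t); }
    const std::string& title() const { return title_; }
private:
    std::string title_;
};

class View {                        // presentation only
public:
    void render(const Model& model) {
        std::cout << "Now playing: " << model.title() << '\n';
    }
};

class Controller {                  // wires user input to model and view
public:
    Controller(Model& m, View& v) : model_(m), view_(v) {}
    void userSelectsTrack(const std::string& title) {
        model_.setTitle(title);
        view_.render(model_);
    }
private:
    Model& model_;
    View& view_;
};

int main() {
    Model model;
    View view;
    Controller controller(model, view);
    controller.userSelectsTrack("Track 01");
}
```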

Conclusion

Lots of developers think that creating code is just a matter of putting together a set of classes or functions to do the job. This might work for simple applications, but most systems require a thoughtful, maintainable design.

The best approach is to look at the big picture and design how the system is supposed to work. Think about the major components and make sure you understand how they interact. Finally, look at the source code and how it should be organized. It is a bit of up-front work that can prevent a lot of problems in the future.


The Problem with C++ Templates

C++ has evolved into a language that encompasses many modern programming techniques. In this way, C++ has managed to stay relevant, despite the many issues it may have compared to safer languages such as Java or C#. In C++, one can choose the set of features to use from a large set of alternatives. If you’re so inclined, you can use a strictly object-oriented style, or a more functional style if that suits you better. Moreover, with the additions introduced in C++11, you can even use closures to make your coding style more functional.

Among the features provided by C++, one that it pioneered is template programming. While templates provide some of the capabilities of Lisp macros, they also enforce type safety. This safety helps a lot when working with a language such as C++, which is so strict about the correct use of types.

Templates and the Type System

Templates have been crucial in providing features previously missing from C++. For example, generic containers are only possible with some kind of type instantiation. Even other languages such as Java and C# had to introduce template-like features (generics) to cope with the problem of generic containers.
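As a small illustration, here is a minimal generic stack; without templates, a container like this would need to be rewritten for every element type, or fall back on void pointers and lose type safety:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// A minimal generic stack: one definition, instantiated per element type.
template <typename T>
class Stack {
public:
    void push(const T& value) { items_.push_back(value); }
    T pop() {
        if (items_.empty()) throw std::out_of_range("empty stack");
        T top = items_.back();
        items_.pop_back();
        return top;
    }
    std::size_t size() const { return items_.size(); }
private:
    std::vector<T> items_;
};

int main() {
    Stack<int> ints;      // the compiler instantiates Stack<int> here
    ints.push(42);
    int x = ints.pop();   // fully type-checked: no casts involved
    (void)x;
}
```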

While templates are very important, they also place a severe burden on the language. Templates operate at compile time, making decisions about how to instantiate a particular type given the options allowed by the available template code. This kind of decision making is not only powerful, but also time consuming.
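A small example of that compile-time decision making: given several candidate templates, the compiler picks the more specialized match during instantiation, with no run-time dispatch involved:

```cpp
#include <iostream>

// Primary template: used for any T the compiler cannot match better.
template <typename T>
void describe(const T&) { std::cout << "general version\n"; }

// Overload for pointers: selected at compile time whenever the
// argument is a pointer, because it is the more specialized match.
template <typename T>
void describe(T*) { std::cout << "pointer version\n"; }

int main() {
    int x = 0;
    describe(x);   // prints "general version"
    describe(&x);  // prints "pointer version"
}
```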

The trouble with using templates more broadly is that they are not very programmer friendly. While templates solve important problems, they are also hard to use and to design. Here are a few issues that I believe matter most.

Module Separation

C++ provides header files as a mechanism for creating modules and explicitly separating interface from implementation. Templates, however, make this separation harder to achieve. When a compiler generates code from a template, it needs access not only to the interface but to the whole definition of the template. The reason is that much of the information the compiler requires is stored in the definition itself: for example, which methods does the template call on a given type? The compiler needs these details to decide whether a template can be instantiated.
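A sketch of the problem, with hypothetical file names: for an ordinary function the body could live in a .cpp file, but a template’s body has to be visible wherever it is used:

```cpp
// clamp.h -- hypothetical header. Unlike a non-template function, this
// body cannot be moved into a clamp.cpp: every translation unit that
// instantiates clamp<T> must see the full definition, so the compiler
// can check which operations T needs to support (here, operator<).
template <typename T>
T clamp(T value, T low, T high) {
    if (value < low)  return low;
    if (high < value) return high;
    return value;
}

// Any main.cpp that includes clamp.h and calls clamp(5, 0, 10)
// triggers the instantiation of clamp<int> right there.
```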

In the past, there were proposals to fix the module separation issue. The export keyword, applied to template definitions, was one of the options adopted by the standards committee, but it was never implemented by the major compilers and has since been deprecated. It seems there will be no solution in the foreseeable future, so the only thing that remains for software engineers is to minimize the use of templates unless they are really required.

Compilation Time

Another issue, a consequence of the above, is the compilation time needed to use templates. Since the whole definition of a template must be visible to the compiler, including templates in a project would impose a heavy burden even if no additional computation were required. But remember that templates form a sub-language within C++ that is known to be Turing-complete. This means badly behaved templates can easily turn into infinite loops, among other issues that need to be debugged at compile time. Using templates therefore requires a lot of care on the part of the programmer, since they can result in compile-time as well as run-time bugs.
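To make the Turing-complete point concrete, here is the classic compile-time factorial; the recursion runs entirely inside the compiler, and deleting the base-case specialization turns it into runaway instantiation that the compiler must abort:

```cpp
// The classic template metaprogram: the compiler evaluates the
// recursion while instantiating the types.
template <unsigned N>
struct Factorial {
    static const unsigned value = N * Factorial<N - 1>::value;
};

// Base case: without this specialization, instantiating Factorial<5>
// would recurse until the compiler hits its instantiation-depth limit.
template <>
struct Factorial<0> {
    static const unsigned value = 1;
};

static_assert(Factorial<5>::value == 120, "computed at compile time");

int main() {}
```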

But it doesn’t stop there. Templates also have a very particular way of bloating executables, which can turn a simple-looking solution into a nightmare of slow link times. Templates tend to generate additional code for each instantiation with a concrete type. For example, in every compilation unit, the first time you use a std::vector of some type the compiler instantiates yet another version of it, even if an identical instantiation already exists in other compilation units of the program. This is not such a huge problem in itself, because the linker will check all the object files and remove duplicate template instantiations. The real problem is that this process takes time; in a big C++ project it can consume anywhere from a few seconds to several minutes.
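One way to tame this particular duplication, sketched here with a hypothetical Widget type, is C++11’s explicit instantiation declarations, which tell every includer to rely on a single instantiation provided by one designated source file:

```cpp
// widgets.h -- hypothetical shared header
#include <vector>

struct Widget { int id; };

// Instantiation declaration: compilation units that include this header
// will not instantiate std::vector<Widget> themselves.
extern template class std::vector<Widget>;
```

```cpp
// widgets.cpp -- the single place where the instantiation is emitted
#include "widgets.h"

template class std::vector<Widget>;  // explicit instantiation definition
```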

The other issue happens when the deduplication described above cannot remove the copies, because they are legitimately different. There might be thousands of legitimate instantiations of a template in a single program: if you have thousands of types (which is common in a C++ project), you may also have thousands of different instantiations of std::vector, and even millions of instantiations of std::map (not common, but possible).

Then there are design decisions that make this even worse. I once worked on a project that had a String<N> template, where N was the maximum size of the string. Its designers thought they were doing everyone a service, until you realize that there is a copy of this template for each of the hundreds of string sizes used in the project. Worse, a template like this induces the creation of related templates, so you end up with ImmutableString<N>, and so on.
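A hypothetical reconstruction of that design shows the issue: because the capacity is a template parameter, every distinct size is a distinct type with its own copy of all the generated code:

```cpp
#include <cstddef>

// Hypothetical reconstruction of the problematic design: the capacity
// is a template parameter, so every distinct N is a separate type.
template <std::size_t N>
class String {
public:
    const char* data() const { return buffer_; }
private:
    char buffer_[N + 1] = {};
};

int main() {
    String<16> a;   // instantiates String<16>
    String<17> b;   // instantiates String<17>: an unrelated type, with
                    // its own copy of every member function
    // String<16> c = b;  // would not even compile: different types
    (void)a; (void)b;
}
```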

Conclusion

I have just scratched the surface of the problems that can arise when using templates in C++. My goal is not to discourage the use of existing templates, such as the STL, but to caution against the indiscriminate use of templates in C++ projects. I personally think you should use templates only to work around limitations of C++, because that is why they were originally created. Trying to use templates as an everyday programming paradigm unleashes more problems, as discussed above, than it can possibly solve.

Further Reading

C++ Templates: The Complete Guide: I like this book because it covers all the information you need to become knowledgeable about templates. The book starts from the basics, but also discusses advanced techniques for template writers.

Web Search is Losing Its Relevance

When web search was in its infancy, search engines worked to simplify the lives of their users and provide high-quality results. Google’s job was then understood to be separating good content from irrelevant content for a particular set of keywords. That relationship worked well for some time, but it has changed for the worse in the last few years. The fact that Google makes so much money from each of us as search users has forced a change in the way it operates.

Gradually, Google is inverting the relationship, saying that webmasters themselves, not the search engine, should be responsible for providing the meta information that describes the content of web sites. For example, Google now provides several tools, broadly labeled Webmaster Central, which can be used to enter all kinds of data about a particular web site, from a site map to a semantic description of its pages.

This may look like an improvement, but it is wrong on several levels if we analyze the situation with care. First, by adopting this strategy Google starts to dictate what is acceptable in its index, while promoting tools like Webmaster Central to create a “semantic web”. The real question is: why should page authors care about this? Isn’t the search engine responsible for finding the information it needs to classify web pages?

Second, and in my opinion worse, this strategy creates both the incentive and the opportunity for spammers to do better than honest users. If it takes all this effort for a web site to be properly listed in a search engine, then the only people willing to do the work will be the very ones who profit most from it. In general, that means the people who create content farms and similar web sites in the first place. After all, with few notable exceptions, they are the only ones making big bucks from Google, not the after-hours hobbyist who maintains a small web site.

Changing Expectations

Google made a lot of sense in the early web. In a non-commercial Internet, content was organized mostly by hobbyists, people who created isolated web sites on particular topics. This was natural, because the web evolved from university web sites, which still follow pretty much the same organizational style.

The evolution of the web and the possibility of monetizing content made content creation a big cash cow. Increasingly, it is not the case that your questions will be answered by an independent site. Most of the world’s content is now hosted by commercial web sites, or by non-commercial sites linked to big non-profit organizations, such as Wikipedia.

Google has transformed the web into a place dominated by big content sites. Places such as StackOverflow thrive because they provide an ever-increasing amount of content that Google indexes and funnels to search users. In other words, Google depends on industrial-scale content production.

However, once this tendency has run for a while, what is the point of using Google at all? If I have a generalist question, one that could be answered in encyclopedia style, nowadays I just go to Wikipedia and avoid wasting my time wading through Google spam. Similarly, for programming questions I could go to StackOverflow directly and probably have the same chance of getting an answer.

In this scenario, I don’t see much need for Google. It remains useful mainly as an engine for finding content inside particular web sites, the ones you already know and trust. What I see as the future is people choosing content sites according to what they want to learn about, and staying on those sites as long as possible, without exposing themselves to Google spam.

What Might Work

In the future we will probably need directories (most likely curated by people) more than search engines. Web sites will be ranked by their quality, and you will be able to search the ones you select, while blacklisting or simply ignoring the ones that are known to be spam.

It sounds odd, because the web was organized exactly like this in the very beginning, with people and companies such as Yahoo creating curated directories of sites based on areas of interest.

The future may be similar, but with a ranking of web sites instead of single web pages, since people need to gauge the general quality of a web site before using it. And it looks like Google is trying to do exactly this anyway, since it already uses heuristics to determine which sites should be penalized and which should stay around.