Literate Programming with Plain C Files

Literate programming is a methodology for software development proposed by Donald Knuth, who used it to build the TeX system for document preparation. TeX is the basis for LaTeX, which is widely used in science and mathematics to write technical documents.

The original version of literate programming was developed in a dialect of the Pascal language and the system was called web (notice that at the time there was no such thing as the World Wide Web). Since then, Knuth has moved on to use C as his main language. The resulting literate programming system was improved and renamed to cweb.

Literate Programming

The main advantage of literate programming is that it allows the smooth integration of programming and documentation. A cweb program is a document that mixes a description of the program with the C code that implements what is being explained. This document is later processed to create a source file as well as a documentation file.

Literate programming provides a paradigm shift in software development because code becomes secondary to the main description of a concept. The explanation of what a program does is what drives the programmer through a literate programming document. In a sense, source code in C (or any other language used by the system) exists only to illustrate what is being discussed in the documentation part.

On the other hand, literate programming provides all the tools necessary to put the code sections together in the right order to create full-featured programs. Instead of programming according to the rigid sequence required by the C syntax, one can add new instructions to the code in the order that best follows the general concepts, without having to worry about exact placement or particular file locations.

Using cweb on Standard Projects

One of the critiques of literate programming is that one needs to apply it from the start to benefit from the possible advantages of this coding style. However, most existing code has been developed in a non-literate way, that is, it is code that follows the strict sequence dictated by language compilers. All established companies have a huge investment in non-literate code that is being used in production, and it is not realistic to believe that such code would be rewritten for the sake of a new programming methodology.

Consequently, anyone converting from normal programming conventions to literate programming would need to rewrite a huge number of C, Java, and other language-specific files. And notice that all this would be necessary just to start programming with literate programming tools such as cweb.

One thing that many people don’t know, however, is that it is not strictly necessary to use separate literate programming files to create literate programs. Although tools such as cweb expect to read a file that is a mix of documentation and code, it is possible to trick the cweb tools into using C source code just as well, after a few modifications have been applied. That is the goal of the script called “CwebNoTangle” that I wrote some time ago.

CWebNoTangle script

The main idea behind the tool is that, if we intend to start using literate programming without rewriting the existing software, the first step is to avoid using ctangle altogether. CTangle is the part of cweb that reads literate programming files and converts them into C source files. Most of its work is concerned with putting together C fragments in the order that the C compiler expects them to appear.

For example, in C the header files for a source file are generally included at the top of the file. However, a literate program doesn’t need to include such header files at a particular point. Instead, it is possible to add header files at any location, when they are first needed.

Notice, however, that when a straight C file is used as the source for the literate tool, it is not necessary to perform an independent reordering step. The source file is already in the order expected by the compiler. This means that the ctangle tool is not strictly needed to create a syntactically correct source file.

That’s why I called the set of scripts CWebNoTangle: they allow the use of cweb to create full documentation for software, without the need to use ctangle for creating C source files.

You can take a look at how the system works on this github project [1]. The code is written as normal C with extended comments, which store documentation that can be easily translated into TeX commands.
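To make the idea concrete, here is a minimal sketch of what a plain source file with extended comments might look like. The comment layout and the mean function are hypothetical illustrations, not the actual format read by the CWebNoTangle scripts; the only point is that the file compiles as-is, with no ctangle step, while the comments carry TeX-style prose.

```cpp
/*
 * 1. Computing an arithmetic mean.
 *
 * NOTE: this comment layout is a hypothetical convention used for
 * illustration only; it is not the format defined by CWebNoTangle.
 * The prose is written as TeX source, so a documentation tool could
 * copy it into the typeset output more or less unchanged.
 */
#include <cstddef>

/*
 * 2. The mean of $n$ values is their sum divided by $n$.
 * An empty input has no mean, so we return 0 in that case instead
 * of dividing by zero.
 */
double mean(const double *values, std::size_t count)
{
    if (count == 0) {
        return 0.0;
    }
    double sum = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        sum += values[i];
    }
    return sum / static_cast<double>(count);
}
```

The compiler sees an ordinary translation unit and ignores the comments, so the file never needs to be reordered before it is built.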

Conclusion

Literate Programming is a great tool for creating software that is easy to understand and modify. Traditional literate programming tools are hard to integrate with existing environments, however, because they require code to be written in a different format.

It is possible, however, to use normal C files as input for both source and documentation. A simple example of how to do this is provided in my CwebNoTangle script. You can use a similar system or create your own, but it is nice to see that there is a simple transition between traditional programming methods and literate programming.

References

[1] https://github.com/coliveira/Hyperbolic/blob/master/cwebnotangle
[2] Image credit: commons.wikimedia.org


Day 21: Design The Big Picture First

In a previous post I suggested that optimizing a program should only be done with knowledge of how time is spent. The reason is that we usually don’t know where a program is spending most of its time. This results in a lot of energy being wasted in optimizing what doesn’t need improvement.

Similarly, a lot of design effort is spent on areas of the system that don’t require so many resources. Many details are introduced very early in the process, when they may be more a hindrance than a help. The main reason why this happens is that we frequently design systems without thinking of the big picture first.

Writing code is hard. No matter how much experience you have, there is always some aspect of the problem that you don’t understand completely. And these are the problems that will bite you when you least expect it.

Consider the architecture first. What are the main functions of the system? How could these functions be divided into higher level elements, such as services, applications, and libraries?

Once you determine how the system is functionally and physically defined, you can start to look at the structural organization of each element, such as major classes, functions, and libraries.

This physical design planning is essential to determine the best way for the system to develop and evolve as necessary. Few things are as frustrating as creating software that has bad architecture. Even though each piece might have been created successfully, the pieces don’t fit together and the whole system becomes a heavy burden to maintain and evolve.

Design of Components

The way components interact in a software application is also of great importance. Subsystems should have sane dependencies, so that they are easy to maintain and even to replace if necessary.

For instance, a big problem that occurs in the design of components is cyclical dependencies. When two modules of the program need to know about each other, this creates an undesired coupling between components that is hard to break.

A practical example would be a music application in which the MP3 code depends on the recording code, and vice versa. In such a situation it is hard to create another application that deals only with MP3, because the recording code wants to be part of it. This is not only undesirable for the whole application, but can make it much harder to maintain each of the individual components.
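One common way to break such a cycle is to make both components depend on a small interface instead of on each other. The sketch below is only an illustration under assumed names: SampleSource, Mp3Encoder, and Recorder are hypothetical, and the "encoding" is a placeholder that just counts samples.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical interface owned by the MP3 side. It is the only thing
// the MP3 code knows about where its audio samples come from.
class SampleSource {
public:
    virtual ~SampleSource() = default;
    virtual std::vector<short> samples() const = 0;
};

// MP3 component: depends on SampleSource, not on the recording code.
class Mp3Encoder {
public:
    // Placeholder for real encoding: reports how many samples were consumed.
    std::size_t encode(const SampleSource& source) const {
        return source.samples().size();
    }
};

// Recording component: implements the interface, so the dependency
// now points in one direction only (recording -> MP3 interface).
class Recorder : public SampleSource {
public:
    std::vector<short> samples() const override {
        return {0, 1, 2, 3};  // stand-in for captured audio
    }
};
```

With this arrangement an MP3-only application can link Mp3Encoder with any other SampleSource, such as a file reader, without pulling in the recording code.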

A few design patterns offer help in this regard. For example, the MVC pattern is an object oriented design that was created to avoid the problem of coupling between the graphical interface and the application model. It is possible to find modern implementations of MVC from desktop GUI libraries to web frameworks. Even if you don’t have access to an MVC library, it is possible to use the design pattern when creating your own code in order to avoid this insidious type of dependency.
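As a rough illustration of using the pattern without a library, the following sketch separates a model, a view, and a controller. All class names are hypothetical, and the model notifies the view through a plain callback, which is only one of several ways to wire the pieces together.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Model: owns the data and knows nothing about any view.
class CounterModel {
public:
    using Listener = std::function<void(int)>;

    void subscribe(Listener listener) { listeners_.push_back(std::move(listener)); }

    void increment() {
        ++value_;
        for (const auto& listener : listeners_) {
            listener(value_);  // notify whoever is watching
        }
    }

    int value() const { return value_; }

private:
    int value_ = 0;
    std::vector<Listener> listeners_;
};

// View: renders the model state as text. It captures `this` in the
// callback, so in this sketch it must outlive the model's notifications
// and must not be copied or moved after construction.
class TextView {
public:
    explicit TextView(CounterModel& model) {
        model.subscribe([this](int v) { text_ = "count = " + std::to_string(v); });
    }

    const std::string& text() const { return text_; }

private:
    std::string text_ = "count = 0";
};

// Controller: translates user input into operations on the model.
class CounterController {
public:
    explicit CounterController(CounterModel& model) : model_(model) {}
    void on_click() { model_.increment(); }

private:
    CounterModel& model_;
};
```

The model never mentions TextView, so a different view can be attached, or the model can be tested with no view at all.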

Conclusion

Lots of developers think that creating code is just a matter of putting together a set of classes or functions to do the job. This might work for simple applications, but most systems will require a thoughtful and maintainable design.

The best approach is to look at the big picture and design how the system is supposed to work. Think about the major components and how they interact. Make sure that you understand the interaction between components. Finally, look at the source code and how it should be organized. It is a bit of initial work that can solve a lot of problems in the future.



The Problem with C++ Templates

C++ has evolved as a language that encompasses many modern programming techniques. In this way, C++ has managed to stay relevant, despite the many issues that it may have compared to safer languages such as Java or C#. In C++, one can decide the set of features that will be used from within a large set of alternatives. If you’re inclined to do so, you could use only a strictly object oriented style, or maybe a functional style, if that suits you better. Moreover, with the additions introduced in C++11, you can even use closures to make your coding style more functional.

Among the features provided by C++, one that it brought into the mainstream was template programming. While templates provide some of the features allowed by Lisp macros, they also enforce type safety. This safety helps a lot when working with a language such as C++, which is so strict about the correct use of types.

Templates and the Type System

Templates have been crucial to provide some features previously missing in C++. For example, the idea of generic containers can only be possible with some kind of type instantiation. Even other languages such as Java and C# had to introduce template-like features in order to cope with the problem of generic containers.

While templates are very important, they also bring a severe burden into the language. Templates operate at compile time, making decisions about how to instantiate a particular type given the options allowed by the available template code. This kind of decision is not only powerful, but also time consuming.

The trouble with using templates more generally is that they are not very programmer friendly. While templates solve important problems, they are also hard to use and design. Here follow a few issues that I believe are of high importance.

Module Separation

C++ provides header files as a method for creating modules and explicitly separating interface from implementation. However, templates make this separation harder to achieve. When a compiler creates code from a template, it needs not only access to the interface, but also to the whole definition of the template. The reason is that much of the required information used by a template is stored in its definition. For example, what are the methods accessed in a particular type? These properties are needed by the compiler to decide if a template can be expanded or not.
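A small sketch shows why the definition has to be visible. The names describe and Song are hypothetical. The template body calls item.name(), and the compiler can only check that call when it expands the template for a concrete type, which is why the whole body would normally live in a header.

```cpp
#include <string>

// Imagine this template lives in a header such as "describe.h"
// (a hypothetical file name). The body cannot be hidden in a .cpp
// file, because the compiler needs it at every point of instantiation.
template <typename T>
std::string describe(const T& item) {
    // Whether T has a name() method is only known at instantiation time.
    return "item: " + item.name();
}

// A type that satisfies the template's implicit requirement.
struct Song {
    std::string name() const { return "song"; }
};
```

Calling describe(Song{}) compiles because Song provides name(); calling it with a type that lacks name(), such as int, is rejected only when that instantiation is attempted.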

In the past, there were some attempts to fix the module separation issue. Using the export keyword along with a template declaration was the option adopted in the C++98 standard, but it was never implemented by the major compilers, and C++11 removed it from the language. It seems that there will be no solution in the foreseeable future, so the only thing that remains for software engineers is to minimize the use of templates unless they are really required.

Compilation Time

Another issue, a consequence of the above, is the compilation time that templates require. Since the whole template must be accessible to the compiler, including templates in a project would impose a severe burden even if no additional computation were necessary. However, remember that templates provide a sub-language within C++ that is known to be Turing-complete. This means that badly behaved templates can recurse without end (in practice, until the compiler hits its instantiation depth limit), among other issues that need to be debugged at compile time. Using templates therefore requires a lot of care on the part of the programmer, since it can result in compile-time as well as run-time bugs.
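The classic illustration of this sub-language is a value computed entirely by the compiler through recursive instantiation. The Factorial template below is a standard textbook sketch, not code from any particular project.

```cpp
// Recursive case: instantiating Factorial<N> forces the compiler to
// instantiate Factorial<N - 1>, and so on down to the base case.
template <unsigned N>
struct Factorial {
    static constexpr unsigned long long value = N * Factorial<N - 1>::value;
};

// Base case: this explicit specialization is what stops the recursion.
template <>
struct Factorial<0> {
    static constexpr unsigned long long value = 1;
};

// The result is available at compile time, before the program ever runs.
static_assert(Factorial<5>::value == 120, "evaluated by the compiler");
```

Removing the Factorial<0> specialization leaves the recursion without an end, and the compiler gives up with an instantiation error, which is the kind of compile-time bug described above.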

But it doesn’t stop there. Templates also have a very particular way to bloat executables, which can turn a simple looking solution into a nightmare of slow linking times. Templates tend to generate additional code for each instantiation with a concrete type. For example, in every compilation unit, the first time you use a std::vector of a given type the compiler instantiates yet another copy of that std::vector, even if other compilation units of the program already contain the same instantiation. This is not such a huge problem for the final executable, because the linker checks all object files and removes duplicate template instantiations. The real problem is that this process takes time. In a big C++ project this can consume from a few seconds to several minutes.

The other issue, however, happens when the method described above doesn’t remove all the multiple copies of a template. In fact, there might be thousands of legitimate instantiations of a template in a single program. For example, if you have thousands of types (which is common in a C++ project), you may also have thousands of different instantiations of std::vector, and even millions of instantiations of std::map, which takes two type parameters (not common, but possible).

Then you have design issues that make this even worse. I once worked on a project that had a template String<n>, where n was the maximum size of the string. The designers of this template thought they were doing a good service, until they realized that there was a copy of this template for each of the hundreds of string sizes used in the project. Also, using a template like this induces the creation of related templates, so you end up having ImmutableString<n>, etc.
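To see why a size-parameterized string multiplies code, consider the following sketch. FixedString is a hypothetical stand-in for the template from that project, whose real definition is not shown here. Every distinct size produces a distinct type, and each type gets its own copy of every member function that is used.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical fixed-capacity string; N is the maximum number of characters.
template <std::size_t N>
class FixedString {
public:
    explicit FixedString(const char* text) {
        std::strncpy(data_, text, N);  // copy at most N characters
        data_[N] = '\0';               // always terminate, even when truncated
    }

    std::size_t length() const { return std::strlen(data_); }

    static constexpr std::size_t capacity() { return N; }

private:
    char data_[N + 1];
};

// Different sizes are unrelated types, so the compiler generates the
// constructor and length() separately for each size that is used.
static_assert(!std::is_same<FixedString<8>, FixedString<16>>::value,
              "each size is a separate instantiation");
```

A non-template class that stores the maximum size as a run-time member would compile its member functions once, at the cost of a slightly larger object.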

Conclusion

I have just scratched the surface of the possible problems when using templates in C++. My goal is not to discourage the use of existing templates, such as the STL, but to caution against the indiscriminate use of templates in C++ projects. I personally think that you should use templates only to work around limitations in C++, because that is why they were originally created. Trying to use templates as a normal programming paradigm unleashes more problems, as we just mentioned, than it can possibly solve.

Further Reading

C++ Templates: The Complete Guide: I like this book because it covers all the information you need to become knowledgeable about templates. The book starts from the basics, but also discusses advanced techniques for template writers.


Web Search is Losing Its Relevance

When web search was in its infancy, search engines worked to simplify the life of their users and provide high quality results. Google’s job was then thought to be separating good content from irrelevant content for a particular set of keywords. That relationship worked well for some time, but it has changed for the worse in the last few years. The fact that Google makes so much money from each of us as search engine users has forced a change in the way they operate.

Gradually, Google is inverting the relationship and saying that not they, but webmasters themselves, should be responsible for handling the meta information describing the content of web sites. For example, Google now provides several tools, broadly labeled as Webmaster Central, which can be used to enter all kinds of data about a particular web site, from a site map to a semantic description of its pages.

This may look like an improvement, but it is wrong on several levels if we analyze the situation with care. First, by using this type of strategy Google starts to dictate what is acceptable or not in their index, while promoting the use of tools like Webmaster Central to create a “semantic web”. The true question is: why should page writers care about this? Isn’t the search engine responsible for finding the information it needs to classify web pages?

Second, and in my opinion worse, is the fact that this strategy creates the incentives and the opportunity for spammers to do better than honest users. If you need to go through all this effort so that a web site is properly listed in a search engine, then the only people willing to do the work will be the very same ones who profit the most from it. In general, this means the people who create content farms and similar web sites in the first place. After all, with few notable exceptions they are the only ones making big bucks from Google, not the after hours hobbyist who maintains a small web site.

Changing Expectations

Google made a lot of sense in the early web. In a non-commercial Internet, content was organized mostly by hobbyists, people that created isolated web sites on particular topics. This was the case because the web evolved from university web sites, which still follow pretty much the same organizational style.

The evolution of the web and the possibility of content monetization made the creation of content a big cash cow. Increasingly, it is not the case that your questions will be answered by an independent site. Most of the world’s content is currently hosted by commercial web sites or sites that are non-commercial but linked to big non-profit organizations, such as Wikipedia.

Google has transformed the web into a place dominated by big content sites. Places such as StackOverflow thrive because they provide an increasing amount of content that Google indexes and sends to web search users. In other words, Google depends on industrial scale content production.

However, after some time operating under this tendency, what is the point of using Google anyway? For example, if I have a generalist question, one that could be answered in an encyclopedia style, nowadays I just go to Wikipedia, and avoid spending my time looking at Google spam. Similarly, for programming searches I could very well go to StackOverflow directly, and probably have the same chance of getting my questions answered.

In this scenario, I don’t see a need for Google. It is still useful only as an engine to find content inside particular web sites, the ones that you know and trust. What I see as the future is people choosing content web sites according to what they want to learn about, and staying on those web sites as long as possible, without exposing themselves to Google spam.

What Might Work

In the future we will probably need more directories (probably curated by people) than search engines. Web sites will be ranked by their qualities, and you will be able to search the ones that you select, while blacklisting or just ignoring the ones that people know are spam.

It sounds odd because the web was organized like this in the very beginning, with people and companies such as Yahoo creating a curated directory of sites based on areas of interest.

The future may be similar, but with a ranking of web sites instead of single web pages, since people need to determine the general quality of a web site before using it. And it looks like Google is trying to do exactly this anyway, since they are using heuristics to decide which sites should be penalized and which should stay around.


Day 20: Spend More Time in Maintenance Mode

Software maintenance is traditionally thought of as a process that will happen in the distant future, needed only for bug fixing and for keeping an existing piece of software current with new versions of a compiler or operating system. Maintenance is also viewed by many as an afterthought that doesn’t need to be dealt with until something happens and there is a clear requirement to modify an existing application.

The hidden truth, however, is that maintenance is an ongoing process that happens during the whole life cycle of the application. It is not only necessary after the system has been delivered; on the contrary, it starts as soon as the first line of code has been written.

From the point of view of a developer, any code that has been created, no matter how long ago, is in maintenance. Of course, if it is fresh in your mind, the task becomes easier. Just edit the file in question to update a function, for example. However, in many situations the process may not be so easy, even if the software has not been released yet.

Keeping the content of a program in your head is just one of the challenges of being a programmer. Today’s software is huge, and all that complexity must be addressed on a daily basis. Even the best programmers will have some difficulty remembering exactly what a particular method or class does.

That is why it is so important to write code as if it were in constant maintenance mode. The first thing that this implies is that you need to make it easier to understand what a piece of code is doing. That’s because, most of the time, the maintainer of the functionality will be yourself. Without good practices it is easy to forget what a class or method is supposed to do, for example.

Entering Maintenance Mode

The number one goal of maintenance is to make things obvious. Despite this, a lot of the code we have is still created in “write-only” mode: a programmer creates something that may work (or not), but the result is so difficult to understand that it is easier to rewrite the whole thing than to modify it.

A major flaw of many developers is writing code that is hard to read and modify. Just notice that if something is hard to read, it will be difficult to understand, even if it was written by yourself. This is even harder to deal with in software that was written by somebody else, but don’t forget that it can happen with your own code.

For example, clever algorithms may be completely clear to you when they are fresh in your memory. However, after a few months (or even days), what was a clear strategy may now seem like a complicated mess. It is important to look at the future readers of your program as a real audience that you need to address.

Try to answer the following question as you write code: would someone else understand what this class or function represents? Even though it may seem easy to answer such a question while you are in the middle of a coding session, what about doing the same thing 6 months from now? If there is any doubt about the answer to this question, spend more time trying to make the code as clear as possible. Remember that it is better to spend a little more time describing what you mean now than to spend several hours later trying to understand what is going on.

The same idea is valid for fixing bugs. Why not spend a few minutes checking the code that you just wrote, while the algorithm is fresh in your mind, instead of doing it when a client is waiting for your answer? Catching mistakes is so much easier right after you have written the code that it doesn’t make any sense to wait any longer.
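As a small illustration of the difference, compare two versions of the same check. Both functions are hypothetical examples; they behave identically, but only the second one tells a future reader what it is for and why the bit trick works.

```cpp
// "Write-only" version: correct, but the name and the expression
// give the reader nothing to hold on to.
inline bool check(unsigned n) { return n && !(n & (n - 1)); }

// Maintenance-mode version: same logic, with the intent spelled out.
inline bool is_power_of_two(unsigned n) {
    if (n == 0) {
        return false;  // zero has no bits set, so it is not a power of two
    }
    // A power of two has exactly one bit set. Subtracting 1 flips that
    // bit and sets all lower bits, so the AND of the two values is zero.
    return (n & (n - 1)) == 0;
}
```

The extra lines cost a minute to write now and can save a later reader from re-deriving the trick.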

Performing Regular Maintenance

The other maintenance technique you can adopt is to be proactive in everything you write. Sometimes there is no time to identify every problem in a program. However, it is still possible to spend at least a few minutes every day doing maintenance work. I recommend using any available down time to perform regular maintenance tasks, such as:

  • creating test cases
  • removing dead code
  • fixing code formatting
  • checking for simple mistakes
  • improving the build system

These are tasks that, over the lifetime of a project, will amount to a huge difference in terms of software quality. If only a few bugs can be removed by this simple process, it can result in a dramatic reduction in future maintenance costs.

Conclusion

Maintenance is considered to be an afterthought by many developers. However, instead of pushing this task to the future, it makes more sense to embrace maintenance activities during the life cycle of the project. It is easier to perform such changes when the code is still fresh in your mind. It is also much cheaper than doing this after a major bug has appeared in the application.
