Literate Programming with Plain C Files

Literate programming is a software development methodology proposed by Donald Knuth, who used it to build the TeX document preparation system. TeX is the basis for LaTeX, which is widely used in science and mathematics to write technical documents.

The original literate programming system was written for a dialect of the Pascal language and was called web (at the time there was no such thing as the World Wide Web). Since then, Knuth has moved on to use C as his main language. The resulting literate programming system was improved and renamed cweb.

Literate Programming

The main advantage of literate programming is that it allows the smooth integration of programming and documentation. A cweb program is a document that mixes a description of the program with the C code that implements what is being explained. This document is later processed to create a source file as well as a documentation file.

Literate programming is a paradigm shift in software development because code becomes secondary to the description of a concept. The explanation of what a program does is what drives the programmer through a literate programming document. In a sense, source code in C (or any other language used by the system) exists only to illustrate what is being discussed in the documentation part.

On the other hand, literate programming provides all the tools necessary to assemble the code sections in the right order to create complete programs. Instead of programming in the rigid sequence required by C syntax, one can add new code where it best fits the explanation, without having to worry about its exact placement or about particular file locations.

Using cweb on Standard Projects

One critique of literate programming is that it must be applied from the start to deliver its benefits. However, most existing code has been developed in a non-literate way, that is, it follows the strict sequence dictated by language compilers. Established companies typically have a huge investment in non-literate code that is used in production, and it is not realistic to expect that such code will be rewritten for the sake of a new programming methodology.

Consequently, anyone converting from normal programming conventions to literate programming would need to rewrite a huge number of C, Java, and other language-specific files. And all of this would be necessary just to start programming with literate programming tools such as cweb.

What many people don’t know, however, is that it is not strictly necessary to use separate literate programming files to create literate programs. Although tools such as cweb expect to read a file that mixes documentation and code, it is possible to trick them into accepting plain C source code once a few modifications have been applied to it. That is the goal of the script called “CWebNoTangle” that I wrote some time ago.

CWebNoTangle script

The main idea behind the tool is that, if we intend to start using literate programming without rewriting existing software, the first step is to avoid using ctangle altogether. The ctangle program is the part of cweb that reads literate programming files and converts them into C source files. Most of its work consists of assembling C fragments in the order in which the C compiler expects them to appear.

For example, in C the header files needed by a source file are generally included at the top of the file. A literate program, however, doesn’t need to include such header files at a particular point. Instead, it can add each header at the location where it is first needed.

When a plain C file is the source for the literate tool, however, no separate reordering step is necessary: the source file is already in the order expected by the compiler. This means that ctangle is not needed to create a syntactically correct source file.

That’s why I called the set of scripts CWebNoTangle: they allow the use of cweb to create full documentation for software, without the need to use ctangle for creating C source files.

You can take a look at how the system works in this GitHub project [1]. The code is written as normal C with extended comments, which store documentation that can be easily translated into TeX commands.
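
To give a feel for the style, here is a minimal sketch of a plain C file whose documentation lives in extended comments. The comment layout and the function names `sum_array` and `mean_array` are assumptions made for illustration; the exact convention that the script expects is defined in the repository [1], not here. The point is only that the file compiles as it stands, with no tangling step, while the prose still travels with the code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Summing an array.
 *
 * The explanation sits in an ordinary block comment (a hypothetical
 * layout, not necessarily the one the script expects). A documentation
 * tool can turn this text into TeX, while the C compiler ignores it.
 * The total is accumulated in a long to reduce the risk of overflow
 * when many int values are added.
 */
long sum_array(const int *values, size_t count)
{
    assert(values != NULL || count == 0);
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}

/*
 * Computing the mean.
 *
 * The mean builds on the sum defined above. Because this is a plain C
 * file, sum_array has to appear before its first use; that ordering is
 * exactly what the compiler expects, so no reordering is required.
 * An empty array is given a mean of 0.0 to avoid dividing by zero.
 */
double mean_array(const int *values, size_t count)
{
    if (count == 0)
        return 0.0;
    return (double)sum_array(values, count) / (double)count;
}
```

The trade-off is visible in the second comment: the prose has to follow the compiler's order instead of the order that would be most natural for the explanation.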

Conclusion

Literate programming is a great tool for creating software that is easy to understand and modify. Traditional literate programming tools are hard to integrate with existing environments, however, because they require code to be written in a different format.

It is possible, however, to use normal C files as input for both source and documentation. A simple example of how to do this is provided in my CWebNoTangle script. You can use a similar system or create your own; either way, it is nice to see that there is a simple transition path between traditional programming methods and literate programming.

References

[1] https://github.com/coliveira/Hyperbolic/blob/master/cwebnotangle
[2] Image credit: commons.wikimedia.org

About the Author

Carlos Oliveira holds a PhD in Systems Engineering and Optimization from the University of Florida. He works as a software engineer, with more than 10 years of experience in developing high performance, commercial and scientific applications in C++, Java, and Objective-C. His most recent book is Practical C++ Financial Programming.

2 Responses to “Literate Programming with Plain C Files”

  1. Hey Carlos,

    Of course you are right that investments lie in plain code that is documented in-place or somewhere else. I think it’s a nice start for a project to be able to weave plain source code files.

    But I want to emphasize that it’s not only the woven output that comes from literate programming, but a whole new way to think about source code structure.

    It would help if at least a part of the industry would start thinking of programs as documents that are readable and indexed in the way we are used to from scientific literature.

    Best regards,

    Ingo Krabbe

    PS.: I assume you know about that cweb feature, to order programs in named paragraphs of code, but a reader of your nice article should know about that too!

    By Ingo Krabbe on Jun 1, 2015

  2. Hi Ingo,
    Yes, I know that reordering the sequence of code is one of the main advantages of cweb. However, I think some projects are not ready to start with this radical change in how software is organized. What I proposed is a little step towards the larger goal of improving code documentation.

    By coliveira on Jul 4, 2015