Day 23: Isolate Code Dependencies

It is not a secret to experienced developers that most of the work in modern programming consists of interfacing application code with external libraries. For instance, if you work on application programming (as opposed to systems development), there is no reason why you should be required to write yet another logging package or database access layer.

To reinforce this paradigm, the order of the day in the commercial world is to adopt languages with big libraries, and to use these libraries as much as possible. This has been referred to as “glue code programming”, and it is the reality for almost everyone working in software development.

There are advantages and disadvantages to this trend, which I will not consider here. Instead, I would like to examine the issue of dependency management in application code written in such an environment.

One of the big problems with using libraries is that importing and calling them is easy, but each new library adds another dependency to the code base. As with any other piece of code, having several external dependencies is bad for maintenance, especially if you expect the application to last a long time.

Protect Yourself

One of the issues is that libraries change. Some of them may disappear for whatever reason (particularly if they are provided by another company). Others are updated, which means that a few methods are deprecated, a few others are added, and your client code has to adapt. This kind of change happens everywhere, even in standard libraries, as has been the case with Java, for example.

The more libraries you use in an application, the larger the potential for problems related to API changes and deprecations. A program that uses dozens of libraries may have to adapt frequently, and even more so if one of those libraries is still in heavy development and changing rapidly.

Dependencies and Separate Compilation

Another issue may arise if you use a language with separate compilation, such as C++: the compilation time required by a module is correlated with the number of libraries it incorporates. In C++, libraries are brought in by including header files. Because header files are traditionally a coarse-grained mechanism, we may be required to compile many of them to gain access to just a small piece of functionality from a single library.

For example, if we use a graphical library to draw shapes, we might want to draw only a rectangle using a RectangleShape object. However, C++ headers usually pull in everything provided by the library, including an EllipseShape or even a GLBufferedComplexShape. The issue is that it becomes too time-consuming to recompile headers such as these every time you need a small part of the library’s functionality.

Using Library Interfaces

The good news is that there are techniques to avoid the unnecessary coupling between external libraries and application code. The techniques are simple and can be systematically applied to software written in any language. If you happen to use C++, however, there is the extra advantage of reducing compilation times for each translation unit.

The main idea is to create a wrapper class that localizes the usage of a library to a single file. In this way, it doesn’t matter how hard it is to use the external library, since we never really have to deal with it other than in the single class providing access to its functionality.

For example, the library mentioned above provides access to several shape classes, but all we need is a single class. In that case, the basic application of this technique is to create a class that provides access to all the functionality required, while hiding everything else from the rest of the code base.

For instance, we could create a class called MySquareShape that is responsible for accessing the necessary methods from the shape library. At the same time, the class exposes only the methods that the application actually uses.

In the C++ world, this translates into a single translation unit that includes the original library header file. For example, if the header is called “shapelib.h”, no file in the project will include it other than the one implementing the interface to ShapeLib.
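To make this concrete, here is a minimal sketch of such a wrapper, assuming the hypothetical shapelib.h above exposes a RectangleShape with an (x, y, width, height) constructor and a draw() method. The wrapper header hides the library behind an opaque Impl structure (the pimpl idiom), so shapelib.h is included only in the implementation file:

    // my_square_shape.h -- the only header the rest of the code base sees.
    // It deliberately does not include shapelib.h, so client code never
    // depends on the library's headers.
    #ifndef MY_SQUARE_SHAPE_H
    #define MY_SQUARE_SHAPE_H

    #include <memory>

    class MySquareShape {
    public:
        MySquareShape(double x, double y, double side);
        ~MySquareShape();
        void draw();  // only the operations the application actually uses

    private:
        struct Impl;                  // opaque handle to the library object
        std::unique_ptr<Impl> impl_;
    };

    #endif

    // my_square_shape.cpp -- the single translation unit that includes the
    // library header. Swapping libraries means rewriting only this file.
    #include "my_square_shape.h"
    #include "shapelib.h"  // no other file in the project includes this

    struct MySquareShape::Impl {
        RectangleShape rect;  // the library type stays hidden in here
        Impl(double x, double y, double side) : rect(x, y, side, side) {}
    };

    MySquareShape::MySquareShape(double x, double y, double side)
        : impl_(std::make_unique<Impl>(x, y, side)) {}

    // Defined here, where Impl is a complete type, as unique_ptr requires.
    MySquareShape::~MySquareShape() = default;

    void MySquareShape::draw() { impl_->rect.draw(); }

Since the rest of the code base includes only my_square_shape.h, replacing ShapeLib means rewriting a single file, and changes to shapelib.h never trigger recompilation outside this one translation unit.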

Use Cases

The method described above can be used with any library. However, it makes little sense to create wrappers for common facilities that are used all the time, such as the classes in the standard library. It takes some judgement to determine when this strategy makes sense and when it doesn’t.

The main advantage is that the code becomes more generic, in the sense that it doesn’t depend directly on a particular library to be available. It is always possible to redefine the wrapper class to use a different implementation or an alternate library if necessary.

A well-known example of this method in use is provided by the Qt library. Its designers managed to reduce the dependency on any particular GUI library to a couple of files that can be replaced on each operating system. The result is that it is relatively easy to port Qt applications to different windowing systems.

Conclusion

A big mistake that many programmers make is to write code that depends directly on system-dependent libraries. As a general rule, we should avoid creating code that may quickly become obsolete.

Among the techniques available for improved portability, the use of wrapper classes (or functions) has been applied with success in several projects. Even if you decide to use some other method, isolating code dependencies helps the code base respond more naturally to changes in the programming environment and reduces the maintenance costs inherent in every programming project.

The Wasteful Legacy of Programming as Language

A few years ago I visited a friend who is a graduate student in linguistics. After some time he asked me if I was aware of the work by Chomsky on formal languages. I told him that yes, Chomsky’s work was a basis for much of the development in theoretical computer science. More than that, I was glad to learn that there was something technical that I could share and discuss with people in linguistics.

At the time I took this to be just a happy coincidence. It was only recently that I started to think seriously about the implications of the idea that much of our understanding of computer programming started with the study of human languages.

Language and Computers

One of the early ambitions in computer science was to understand, or at least be able to parse, human language. To do this, computer scientists explored models of how language works in general. Based on these models, researchers found that languages could be classified in terms of complexity, in what is now called the Chomsky hierarchy. Moreover, most computer languages fit entirely within the lower levels of this hierarchy.

This was an incredible discovery for scientists working on compilers and interpreters, because it produced the tools needed to parse languages efficiently. Instead of creating ad hoc parsers, which had been the norm in the early period, one could apply a rigorous method to derive a parser for a particular programming language.

The insidious side of this success, however, is that it completely changed how computer scientists view the task of giving instructions to computers. We now have a frame of mind in which the important thing in instructing a computer is to define a suitable language – and the more powerful the language, the better.

The problem with this line of reasoning is that it creates a number of hurdles that are completely artificial, as we will soon discuss. For example, it makes us believe that there is something fundamentally different between programming languages, when in fact there isn’t.

Computer Languages versus Human Languages

When we look at human languages, it does make sense that different languages exist. After all, language is a cultural artifact. For example, as a speaker of two languages, it is very clear to me that speaking in English conveys a different meaning and context compared to speaking in my mother tongue. There is a set of ideas that are better expressed in English, just as there is a huge set of thoughts that I wouldn’t be able to express well in anything other than my first language.

As human beings, we want language to be inexact and able to convey several meanings. Contrary to the popular belief among mathematicians and engineers, this is a good feature of human languages, because it mimics the way we think. The truth about the study of language is that we are dealing with the whole culture of a community, not just with a stream of tokens that convey meaning.

The opposite, however, is true for computer languages. While we might strongly favor the use of human-like languages, computers don’t really care whether we use a bracket or a “begin” keyword to start a block of commands. This means that by introducing the subtleties of human language we are complicating the task of instructing computers. Instead, we could avoid the subtle differences between programming languages and use a simple, uniform representation.

Unnecessary Work

The other problem with current programming languages is that they create a lot of unnecessary work. Computer scientists have over the years devised baroque languages such as C++ and Java in an effort to express programming instructions in a powerful way. It turns out, however, that the whole process of writing a sequence of tokens, parsing that stream, and checking the syntax is unnecessary.

Everything in a computer program can be interpreted as a tree of expressions. Whether that tree is encoded as a bracketed block or as a for loop is of no importance to the computer. There is no reason why we cannot manipulate the syntax tree directly, for example through a tree view on the screen, instead of parsing a stream of tokens to recreate the tree every single time.
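As a rough illustration of this point, the following sketch (with invented node types) builds and evaluates a small expression tree directly, with no textual representation of the program anywhere in sight:

    #include <iostream>
    #include <memory>

    // A tiny expression tree: each node is either a constant or an
    // addition. The tree is built and evaluated directly; no source text,
    // tokenizer, or parser appears anywhere in the process.
    struct Expr {
        virtual ~Expr() = default;
        virtual double eval() const = 0;
    };

    struct Const : Expr {
        double value;
        explicit Const(double v) : value(v) {}
        double eval() const override { return value; }
    };

    struct Add : Expr {
        std::unique_ptr<Expr> lhs, rhs;
        Add(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
            : lhs(std::move(l)), rhs(std::move(r)) {}
        double eval() const override { return lhs->eval() + rhs->eval(); }
    };

    int main() {
        // Equivalent to the source text "(1 + 2) + 3", except that here
        // the tree itself is the program: editing it means replacing
        // nodes, not re-parsing a file.
        auto expr = std::make_unique<Add>(
            std::make_unique<Add>(std::make_unique<Const>(1),
                                  std::make_unique<Const>(2)),
            std::make_unique<Const>(3));
        std::cout << expr->eval() << "\n";  // prints 6
    }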

As a result, the use of a language is unnecessary on both sides. Computers don’t care where the stream of bits comes from: it could come from a direct binary representation or from a Java program with all variables named in German.

Human beings, on the other hand, don’t necessarily need a language either. Although it may look like a useful feature, the existence of a language implies the manual labor of maintaining a textual representation, as well as the algorithmic work of translating that representation back into a tree of expressions. The point is that, for all we care, we could just as well be moving boxes around the screen to represent variables: there is no necessity to create a textual document representing a program.

Divide and Conquer

Still, another issue with programming languages is that they tend to divide programmers into artificial categories, one for each language or even flavor of a language. For example, the only difference between a Java programmer and a C# programmer is that they use somewhat different keywords and have to program against a different runtime library. The only reason C# and Java programmers think of themselves as different is that they use the concept of language to define themselves.

To be more polemical: the only difference between a C programmer and a Lisp programmer is in what their languages force them to do. A C programmer is forced by the language to write pointer-related code, while a Lisp programmer is forced to think in terms of lists of expressions (even when using other data structures).

A language-less programming system, on the other hand, would avoid this kind of issue altogether by providing only the means to do the work, without forcing programmers into a pre-defined way of thinking. Whatever the form used to interact with the program, a language-less system doesn’t care whether parts of the program are shaped as s-expressions or C blocks: it just combines them as needed to create a program.

In a program written this way, using garbage collection would just be a matter of plugging in the right runtime support, which you could get along with the system or buy from a third party. Instead, we erroneously consider garbage collection a fundamental attribute of the language in use. As another example, in a language-agnostic system we would be able to create and call closures by adding a simple module to the system, not by devising yet another language that supports closures as first-class citizens.

But the advantages don’t stop there. Without the need for a language, the need for a compiler also disappears. Everything in such a system is already parsed and ready to use. When we combine pieces of code, they already know how to glue themselves to each other, so there is no need for a linker either.

Such a system could regenerate an executable in a fraction of a second, instead of going through the painful and wasteful process of language interpretation, code generation, and linking. For example, what exactly is the need to recompile an entire file when we change only a single character? In a traditional language this is entirely necessary, because you could be modifying the token “blass” to “class”, which changes the meaning of the whole file. In a language-less system this could never happen, because each transformation applies to the existing syntax tree, which is already stored in the system. Therefore, a change like this can only have a local effect on the program.

Why Doesn’t Such a System Exist?

There are several reasons, but the least important of them is technology. Any modern desktop computer has graphical capabilities beyond what is necessary to implement the visualization features of such a programming environment. The main difficulty separating us from that goal is the experience we already have as programmers.

Each one of us has been trained to use textual languages as the main way of interacting with computers. We have learned to use associated tools, such as grep and make, that are imperfect but give good results if you are disciplined.

Moreover, there is the satisfaction of using the huge amount of code available for the existing technologies. Every one of us has been conditioned for many years to look at what has already been done in our programming language, and to reuse that code when needed.

I think this is the main reason why visual languages have had such a hard time establishing themselves. After using such a language for a few minutes, you immediately feel that it would be so much easier to go back and write yet another program in C or Java or Python. Of course, it will always be more productive to use a tool that you’re used to.

However, we have never pushed a language-less system far enough to see its advantages compared to traditional language-based systems. I believe there will be a big jump in programming quality the day a group of people decides to develop a true language-agnostic programming system. And when programmers in general learn that they don’t need a language to create valid code, they will think we were simply crazy for using, for such a long time, a concept as wasteful as that of the programming language.


Main Advantages of Typed Languages

The concept of data types is one of the most powerful in the study of programming languages. Many languages have been devised with the main goal of delivering a type system with improved capabilities. Some of these languages, coming out of research labs, such as ML and Haskell, have become successful in their own right. Other languages have tried to emulate some of the techniques developed there.

Early languages such as Fortran had a type system composed of very few basic types. In a sense, this mimicked the support available in assembly languages, which commonly provided access only to integer, floating-point, and character data types. However, a large number of extensions to these simple type systems have been proposed in later languages.

For example, languages such as C have from the beginning supported a type system a little more sophisticated than the one provided by Fortran. With support for structures, enumerations, and unions, C offered at least a taste of the options now available in higher-level languages. Such facilities allowed programmers to create more complex software than was possible with the earlier generation of languages.
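As a reminder, these facilities look something like the following (the names are invented for illustration):

    // A struct groups related fields, an enum names a closed set of
    // values, and a union overlays alternatives in the same storage.
    enum Color { RED, GREEN, BLUE };

    struct Point {
        double x;
        double y;
    };

    union Value {
        int    i;  // only one member is active at a time
        double d;
    };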

Fundamental Advantages

There are a few reasons why so much effort has been spent, both in research time and in compiler development time, on improving type checking in modern programming languages. Chief among them is the idea that it is better to catch errors in a program as soon as possible. When the compiler is able to verify the correct use of types, a whole class of errors is immediately removed from the resulting program. All operations that read and modify data of a particular type are known to be valid, because the compiler has had the opportunity to check their correctness.

Although this may not seem like a big result, the ability to check types opens the door to other guarantees that are difficult to enforce without a type system. A classic application of user-defined data types is to guarantee that data is handled in a uniform way by the system. For example, an Employee data type can be used to keep all accesses to employee-related data confined to a single object in memory. In this way, it is easy to verify the name of the employee whose salary is being modified. Without type checking of user-defined types, it is difficult to enforce that related data is stored in the same object.
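A minimal sketch of this idea, using a hypothetical Employee type: because the fields travel together in a single object, any function that updates a salary necessarily has the matching name at hand, and the compiler rejects any attempt to pass unrelated data where an Employee is expected.

    #include <string>

    // Hypothetical Employee type: name and salary travel together, so
    // code that adjusts a salary necessarily has the matching name in
    // the same object.
    struct Employee {
        std::string name;
        double salary;
    };

    void raise_salary(Employee& e, double amount) {
        e.salary += amount;  // name and salary cannot get out of sync
    }

    int main() {
        Employee e{"Alice", 50000.0};
        raise_salary(e, 2500.0);    // fine
        // raise_salary(2500.0, e); // compile error: arguments don't match
    }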

In addition, more recent languages also provide mechanisms to encapsulate related data, so that only a small number of procedures have access to the memory object. For example, in C++ it is possible to declare a member variable private or protected, so that only the member functions of a particular class (or of derived classes, in the case of protected members) have permission to access that data. This level of protection can be enforced thanks to the type checking performed by the C++ compiler.
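Here is a short sketch of how that enforcement looks in practice, using an invented Account class:

    // An invented Account class: the compiler, guided by the declared
    // access levels, rejects any access to balance_ from outside the class.
    class Account {
    public:
        void deposit(double amount) { balance_ += amount; }
        double balance() const { return balance_; }

    protected:
        double overdraft_limit_ = 0.0;  // visible to derived classes too

    private:
        double balance_ = 0.0;          // visible only inside Account
    };

    class SavingsAccount : public Account {
    public:
        void configure() {
            overdraft_limit_ = 100.0;  // fine: protected member
            // balance_ = 1e6;         // error: 'balance_' is private
        }
    };

    int main() {
        Account a;
        a.deposit(10.0);
        // a.balance_ = 1e6;  // error: private member is inaccessible
        return a.balance() > 0.0 ? 0 : 1;
    }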

Refactoring

Avoiding conversion problems at compile time is one advantage of type systems, but it is not the only one. Another is the possibility of using automated tools more effectively for tasks such as refactoring and static analysis.

Refactoring is a good example of a feature that is much easier to support in the presence of a type system. The basic idea of refactoring is to apply program transformations that simplify the code while leaving its behavior unchanged.

In a typed language it is much easier for automated tools to determine which transformations would preserve the run-time and compile-time properties of an existing program. Since the type system requires particular properties from each operation in the target code, it is relatively safe to perform simplifying transformations that maintain those properties.

Non-typed languages, on the other hand, are at a disadvantage in this respect. Due to the dynamic nature of their expressions, it is hard for an automated tool to determine the exact result of an expression: an expression in a dynamic language can legitimately return objects of different types each time it is executed. This means that a refactoring tool cannot guarantee that a transformation is valid for all inputs.

For example, changing the type of an argument passed to a function is a simple transformation in a typed language. In a dynamic language, on the other hand, it becomes hard to determine what needs to change at each place the modified function is called. Only manual checking (or a well-constructed set of automated tests) can determine whether the resulting program is valid.
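For instance, here is a sketch of that scenario in C++, using a hypothetical set_label function whose parameter type has been changed from int to std::string; the compiler flags every stale call site:

    #include <iostream>
    #include <string>

    // Hypothetical refactoring: set_label originally took an int and now
    // takes a std::string. Every call site that still passes an int is
    // reported by the compiler, giving a complete list of what to update.
    void set_label(const std::string& label) {
        std::cout << "label: " << label << "\n";
    }

    int main() {
        set_label("total");  // fine: const char* converts to std::string
        // set_label(42);    // compile error: no conversion from int
    }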

Conclusion

Typed languages have been the subject of research in computer science for more than four decades. Such languages provide the great advantage of improved support for automated checking of code. While the standard checking allowed by simple types is limited, much more sophisticated strategies can be devised when such a system is applied to user-defined data types.

We have just scratched the surface of the possibilities here, but it is useful to be aware of some of the advantages offered by programming languages such as Haskell and ML, which provide a well-designed type system, and, to a lesser extent, by languages such as C++ and Java, which incorporate some of the features and benefits of type checking.