Day 23: Isolate Code Dependencies

It is no secret to experienced developers that most of the work in modern programming consists of interfacing application code with external libraries. For instance, if you work on application programming (as opposed to system development), there is no reason why you should be required to write another logging package or another database access layer.

To reinforce this paradigm, the order of the day in the commercial world is to adopt languages with big libraries, and use these libraries as much as possible. This has already been referred to as “glue code programming”, and it is the reality of almost everyone working on software development.

There are advantages and disadvantages in this trend, which I will not consider here. Instead I would like to consider the issue of dependency management in application code that is written in such an environment.

One of the big problems we have when using libraries is that importing and calling them is easy, but each time a new library is added it creates a new dependency in the code base. Like with any other piece of code, having several external dependencies is bad for maintenance, especially if you expect the application to last for a long time.

Protect Yourself

One of the issues is that libraries change. Some of them may disappear for whatever reason (particularly if they are provided by another company). Other libraries are updated, which means that a few methods are deprecated, a few others are added, and your client code has to adapt. Changes of this kind happen everywhere, even in standard libraries, as has been the case with Java, for example.

The more libraries you use in an application, the larger the potential for problems related to API changes and deprecations. A program that uses dozens of libraries may experience frequent changes. This is even more true if one uses a library that is still in heavy development, and subject to frequent changes.

Dependencies and Separate Compilation

Another issue may arise if you use languages with separate compilation such as C++: the compilation time required by any module is correlated to the number of libraries you’re incorporating. In C++ the process of library inclusion depends on adding header files. Because header files are traditionally a coarse grained method to include libraries, we may be required to compile many such files to gain access to just a small functionality from a single library.

For example, if we use a graphical library to draw shapes, we might want to draw only a rectangle using a RectangleShape object. However, C++ headers usually include everything provided by a library, including an EllipseShape or even a GLBufferedComplexShape. The issue is that it becomes too time consuming to compile libraries such as this one each time you need to use a small part of its functionality.

Using Library Interfaces

The good news is that there are techniques to avoid the unnecessary coupling between external libraries and application code. The techniques are simple and can be systematically applied to software written in any language. If you happen to use C++, however, there is the extra advantage of reducing compilation times for each translation unit.

The main idea is to create a wrapper class that localizes the usage of a library to a single file. In this way, it doesn’t matter how hard it is to use the external library, since we never really have to deal with it other than in the single class providing access to its functionality.

For example, the library mentioned above provides access to several shape classes, but all we need is a single class. In that case, the basic application of this technique is to create a class that provides access to all the functionality required, while hiding everything else from the rest of the code base.

For instance, we could create a class called MySquareShape that is responsible for accessing the necessary methods from the shape library. At the same time, the class provides only the methods that the application is really using.

In the C++ world, this translates into a single translation unit that includes the original library header file. For example, if the header is called “shapelib.h”, no other file in the project will include it other than the one that provides the interface to ShapeLib.
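The arrangement described above can be sketched as follows. This is only an illustration under stated assumptions: the `shapelib` namespace and its `RectangleShape` interface are stand-ins invented here for the hypothetical library behind “shapelib.h”, and the file names in the comments are hypothetical as well. The three parts are shown in one listing so that it compiles on its own; in a real project they would live in separate files.

```cpp
#include <cassert>
#include <memory>

// --- my_square_shape.h (hypothetical): all the rest of the code base sees.
// Note that nothing from the shape library is mentioned here.
class MySquareShape {
public:
    explicit MySquareShape(double side);
    ~MySquareShape();
    double side() const;
    double area() const;

private:
    struct Impl;                  // defined only in the .cpp file
    std::unique_ptr<Impl> impl_;  // hides the library type from clients
};

// --- Stand-in for "shapelib.h" (an assumption made for this sketch).
// In a real project this would be the vendor's header, included only
// by my_square_shape.cpp.
namespace shapelib {
class RectangleShape {
public:
    RectangleShape(double width, double height)
        : width_(width), height_(height) {}
    double area() const { return width_ * height_; }

private:
    double width_;
    double height_;
};
}  // namespace shapelib

// --- my_square_shape.cpp (hypothetical): the single translation unit
// that depends on the library.
struct MySquareShape::Impl {
    explicit Impl(double s) : rect(s, s), side(s) {}
    shapelib::RectangleShape rect;
    double side;
};

MySquareShape::MySquareShape(double side)
    : impl_(std::make_unique<Impl>(side)) {
    assert(side >= 0.0);  // a square cannot have a negative side
}

MySquareShape::~MySquareShape() = default;

double MySquareShape::side() const { return impl_->side; }

double MySquareShape::area() const { return impl_->rect.area(); }
```

Because the wrapper's header only forward-declares `Impl`, client code that includes it does not have to compile the library header at all, and a change in the library touches a single file.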

Use Cases

The method described above can be used with any library. However, it makes little sense to create wrappers for common libraries that are used all the time, such as classes in the standard library. It takes some judgement to determine when it makes sense to use such a strategy or not.

The main advantage is that the code becomes more generic, in the sense that it doesn’t depend directly on a particular library to be available. It is always possible to redefine the wrapper class to use a different implementation or an alternate library if necessary.

A well known example of this method in use is provided by the Qt library. The designers of that library managed to reduce the dependency on a particular GUI library to a couple of files that could be replaced in different operating systems. The result is that it is relatively easy to port Qt applications to different windowing systems.

Conclusion

A big mistake that many programmers make is to create code that depends directly on libraries that are system-dependent. As a general rule, we should avoid creating code that may quickly become obsolete.

Among the techniques available for improved portability, wrapper classes (or functions) have been used with success in several projects. Even if you decide to use some other method, by isolating code dependencies you can help the code base respond more naturally to changes in the programming environment and reduce the unavoidable maintenance costs inherent to every programming project.


The Wasteful Legacy of Programming as Language

A few years ago I visited a friend who is a graduate student in linguistics. After some time he asked me if I was aware of the work by Chomsky on formal languages. I told him that yes, Chomsky’s work was the basis for many of the developments in theoretical computer science. More than that, I was glad to learn that there was something technical that I could share and discuss with people in linguistics.

At the time I thought this was just a great coincidence. It was only recently, though, that I started to think seriously about the implications of the idea that much of our understanding of computer programming started with the study of human languages.

Language and Computers

One of the early ambitions in computer science was to understand, or at least be able to parse, human language. In order to do this, computer scientists explored models of how language worked in general. Based on these models, researchers figured out that we could classify languages in terms of complexity. Moreover, many computer languages could be entirely contained in some of the lower levels of this hierarchy, which is called the Chomsky hierarchy.

This was an incredible discovery for scientists working on compilers and interpreters, because its developments resulted in the necessary tools to efficiently parse languages. After that, instead of creating ad hoc parsers, which was the norm in the early period, one could apply a rigorous method to find a parser for a particular programming language.

The insidious side of this success, however, is that it completely changed the perceptions computer scientists have of how to give instructions to computers. We now have a frame of mind in which the important thing in the process of instructing a computer is to define a suitable language – and the more powerful the language the better.

The problem with this line of reasoning is that it creates a number of hurdles that are completely artificial, as we will soon discuss. For example, it makes us believe that there is something fundamentally different between programming languages, when in fact there isn’t.

Computer Languages versus Human Languages

When we look at human languages, it does make sense that different languages exist. After all, language is a cultural concept. For example, as a speaker of two languages, it is very clear to me that speaking in English conveys different meaning and context, compared to things spoken in my mother language. There is a set of ideas that are better expressed in English, just as there is a huge subset of thoughts that I wouldn’t be able to express well in anything other than my primary language.

As human beings, we want language to be inexact and able to convey several meanings. Contrary to the popular belief among mathematicians and engineers, this is a good feature of human languages, because it mimics the way we think. The truth about the study of language is that we are dealing with the whole culture of a community, not just with a stream of tokens that convey meaning.

The opposite, however, is true for computer languages. While we might strongly favor the use of human-like language, computers don’t really care if we use a bracket or a “begin” symbol to start a block of commands. This means that, by introducing the subtleties of human language, we are complicating the task of instructing computers. Instead, we could just avoid the subtle differences between programming languages and use a simple, uniform representation.

Unnecessary Work

The other problem with using current programming languages is that they create a lot of unnecessary work. Computer scientists have throughout the years devised baroque languages such as C++ and Java in an effort to provide programming instructions in a powerful way. It turns out, however, that the whole sequence of writing a stream of tokens, parsing that stream, and checking the syntax is unnecessary.

Everything in a computer program can be interpreted as a tree of expressions. Whether you encode that tree as a block delimited by brackets or as a for loop is not important to the computer. There is no reason why we cannot manipulate the syntax tree directly, for example, using a tree model on the screen, instead of parsing tokens to recreate the tree every single time.
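To make the idea of working on the tree directly more concrete, here is a minimal sketch in C++. It is an illustration only, not a design for such a system: the node types `Number` and `BinaryOp` and the helper functions `num`, `add`, and `mul` are names invented for this example. The tree for (1 + 2) * 4 is built from nodes, with no text to tokenize or parse.

```cpp
#include <memory>
#include <utility>

// Every piece of the program is an expression node that can be evaluated.
struct Expr {
    virtual ~Expr() = default;
    virtual double eval() const = 0;
};

// Leaf node holding a numeric constant.
struct Number : Expr {
    double value;
    explicit Number(double v) : value(v) {}
    double eval() const override { return value; }
};

// Inner node combining two subtrees with '+' or '*'.
struct BinaryOp : Expr {
    char op;
    std::unique_ptr<Expr> lhs;
    std::unique_ptr<Expr> rhs;

    BinaryOp(char o, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : op(o), lhs(std::move(l)), rhs(std::move(r)) {}

    double eval() const override {
        const double a = lhs->eval();
        const double b = rhs->eval();
        return op == '+' ? a + b : a * b;
    }
};

// Small helpers that assemble nodes; no source text is involved.
std::unique_ptr<Expr> num(double v) { return std::make_unique<Number>(v); }

std::unique_ptr<BinaryOp> add(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    return std::make_unique<BinaryOp>('+', std::move(l), std::move(r));
}

std::unique_ptr<BinaryOp> mul(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    return std::make_unique<BinaryOp>('*', std::move(l), std::move(r));
}
```

With these definitions, `mul(add(num(1), num(2)), num(4))` builds the tree, and assigning a new node to its `rhs` member replaces one subtree while leaving the rest untouched, which is the kind of local edit a tree-based environment would perform.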

As a result, the use of a language is unnecessary on both sides: computers don’t care where the stream of bits is coming from. It could be coming from a direct binary representation or from a Java program with all variables in German.

Human beings, on the other hand, don’t necessarily need to use the language either. Although it may look like a useful feature, the existence of a language implies the manual labor of keeping a textual representation, as well as the algorithmic work of translating that representation back into a tree of expressions. The point is that, for all we care, we could just as well be moving boxes around the screen to represent variables — there is no necessity to create a textual document representing a program.

Divide and Conquer

Another issue with programming languages is that they tend to divide programmers into artificial categories, one for each language or even flavor of a language. For example, the only difference between a Java programmer and a C# programmer is that they use somewhat different keywords and have to program against a different run time library. The only reason C# and Java programmers think of themselves as different is that they use the concept of language to define themselves.

To be more polemical, the only difference between a C programmer and a Lisp programmer is in what their languages force them to do. A C programmer is forced by the language to write pointer-related code, while a Lisp programmer is forced to think in terms of lists of expressions (even when using other data structures).

A language-less programming system, on the other hand, would avoid this kind of issue altogether by providing only the means to do work, without forcing programmers to use a pre-defined way of thinking. Whatever the form used to interact with the program, a language-less system doesn’t care if parts of the program are shaped as s-expressions or C blocks: it just combines them as needed to create a program.

In a program written in this way, using garbage collection is just a matter of plugging in the right run time support, which you could get along with the system or buy from a third party. Instead, we erroneously consider garbage collection a fundamental attribute of the language in use. As another example, in a language-agnostic system we would be able to create and call closures by just adding a simple module to the system, not by devising yet another language that supports closures as a first class citizen.

But the advantages don’t stop there. Without the need for a language, the need for a compiler also disappears. Everything in such a system is already parsed and ready to use. When we combine pieces of code they already know how to glue themselves to each other, so there is no need for a linker either.

Such a system would regenerate a new executable in a fraction of a second, instead of going through the painful and wasteful process of language interpretation, code generation, and linking. For example, what exactly is the need to recompile an entire file when we change only a single character? In a traditional language, that is entirely necessary, because you could be modifying the token “blass” to “class”, which entirely changes the meaning of the whole file. In a language-less system, this could never happen, because each transformation is applied to the existing syntax tree, which is already stored in the system. Therefore, a single change like this can only have a local effect in the program.

Why Doesn’t Such a System Exist?

There are several reasons, but the least important of them is technology. We have graphical capabilities in any modern desktop computer that are beyond what is necessary to implement the visualization features for such a programming environment. The main difficulty separating us from that goal is the experience we already have as programmers.

Each one of us has been trained to use textual languages as the main way of interacting with computers. We have learned to use associated tools, such as grep and make, that are imperfect but that give good results if you are disciplined.

Moreover, it is satisfying to use the huge amount of code available with the existing technologies. Every one of us has been conditioned for many years to look at what has been done in our programming language, and even reuse the code if needed.

I think that this is the main reason why visual languages have had such a hard time to establish themselves. After using such a language for a few minutes, you immediately feel that it is so much easier to go back and write yet another program in C or Java or Python. Of course, it will always be much more productive to use a tool that you’re used to.

However, we have never been able to improve a language-less system enough to see its advantages as compared to traditional language-based systems. I believe that there will be a big jump in programming quality the day a group of people decides to develop a true language-agnostic programming system. And when programmers in general learn that they don’t need a language to create valid code, they will think that we were just crazy for using, for such a long time, a concept as wasteful as that of a programming language.

Photo credit: commons.wikimedia.com


Main Advantages of Typed Languages

The concept of data types is one of the most powerful in the study of programming languages. Many languages have been devised with the main goal of delivering a type system with improved capabilities. Some of these languages coming from research labs, such as ML and Haskell, have become successful in their own right. Other languages have tried to emulate some of the techniques developed there.

Early languages such as Fortran had a type system that was composed of very few basic types. In a sense, this mimicked the available support on assembly languages, which have commonly provided only access to integer, floating point, and character data types. However, a large number of extensions to these simple type systems have been proposed in later languages.

For example, languages such as C have from the beginning supported a type system that is a little more sophisticated than the one provided by Fortran. With support for structures, enumerations and unions, C has provided at least a feel of the options now available in higher level languages. Such facilities allowed programmers to create more complex software than was possible with the earlier generation of languages.

Fundamental Advantages

There are a few reasons why so much effort has been spent, both in terms of research time as well as compiler development time, to improve type checking in modern programming languages. Chief among them is the idea that it is better to catch errors in a program as soon as possible. When the compiler is able to verify the correct use of types, a whole set of errors is immediately removed from the resulting program. All operations that read and modify data of a particular type are known to be valid, because the compiler had the opportunity to check the correctness of that operation.

Although this may not seem such a big result, the ability to check types opens the door to other guarantees that are difficult to enforce without a type system. A classic application of user-defined data types is to guarantee that data is handled in a uniform way by the system. For example, an Employee data type can keep all employee-related data in a single object in memory. In this way, it is easy to verify the name of the employee whose salary is being modified. Without type checking of user-defined types, it is difficult to enforce that related data is stored in the same object.

In addition, more recent languages also provide mechanisms to encapsulate related data, so that only a small number of procedures can have access to the memory object. For example, in C++ it is possible to define a member variable as private or protected, so that only member functions in a particular class (or derived classes in the case of protected members) can have proper permission to access that data. That level of protection can be enforced because of type checking provided by the C++ compiler.
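As a small illustration of the two previous points, the sketch below shows what an Employee type with private members could look like in C++. The member names and the `raise_salary` function are assumptions made for this example; the text above only mentions an Employee type in general terms.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Keeps the name and the salary of one employee together in one object.
class Employee {
public:
    Employee(std::string name, double salary)
        : name_(std::move(name)), salary_(salary) {
        if (salary_ < 0.0) {
            throw std::invalid_argument("salary must be non-negative");
        }
    }

    const std::string& name() const { return name_; }
    double salary() const { return salary_; }

    // The only way to change the salary, so the rule below cannot be
    // bypassed by code outside the class.
    void raise_salary(double percent) {
        if (percent < 0.0) {
            throw std::invalid_argument("percent must be non-negative");
        }
        salary_ += salary_ * percent / 100.0;
    }

private:
    std::string name_;  // not reachable from outside the class
    double salary_;     // writing e.salary_ = -1 elsewhere is a compile error
};
```

Code outside the class can read the salary and request a raise, but any attempt to assign to `salary_` directly is rejected by the compiler rather than discovered at run time.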

Refactoring

Avoiding conversion problems at compile time is one of the advantages of type systems, but it is not the only one. Another advantage is the possibility of using automated tools more efficiently to perform such tasks as refactoring or static analysis, for instance.

Refactoring is a good example of a feature that is much easier to support in the presence of a type system. The basic idea of refactoring is to provide program transformations that simplify the code, while maintaining its operation unchanged.

In a typed language it is much easier for automated tools to determine the kinds of operations that would maintain the run-time and compile-time properties of an existing program. Since the type system requires particular properties from each operation in the target code, it is relatively safe to perform simplifying operations that maintain these properties.

On the other hand, non-typed languages have a disadvantage in this respect. Due to the dynamic nature of their expressions, it is hard for an automated tool to determine the exact result of some expressions. For example, an expression in a dynamic language can legitimately return objects of different types each time it is executed. What this means is that the refactoring tool may not guarantee that a transformation is valid for all inputs.

For example, changing the type of an argument passed to a function is a simple transformation in a typed language. In a dynamic language, on the other hand, it becomes hard to determine what needs to be done when the modified function is called. Only manual checking (or a well constructed set of automated tests) can determine if the resulting program is valid.
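Here is a minimal sketch of that kind of change in C++, assuming a hypothetical function `to_fahrenheit` that used to take a plain `double`. After the parameter is changed to a dedicated `Celsius` type (also invented for this example), every call site that still passes a bare number stops compiling, so the compiler effectively lists what remains to be updated.

```cpp
#include <type_traits>

// A dedicated type for temperatures in degrees Celsius.
class Celsius {
public:
    // 'explicit' prevents a bare double from converting silently.
    explicit Celsius(double degrees) : degrees_(degrees) {}
    double degrees() const { return degrees_; }

private:
    double degrees_;
};

// Before the change (assumed): double to_fahrenheit(double celsius);
// After the change, callers must state the unit they are passing.
double to_fahrenheit(Celsius c) { return c.degrees() * 9.0 / 5.0 + 32.0; }

// The compiler itself confirms that an old-style call such as
// to_fahrenheit(21.5) would no longer be accepted.
static_assert(!std::is_convertible<double, Celsius>::value,
              "a bare double must not convert to Celsius implicitly");
```

In a dynamic language the equivalent change would only show up when one of the outdated calls is actually executed.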

Conclusion

Typed languages have been the subject of research in computer science for more than four decades. Such languages provide the great advantage of improved support for automated checking of code. While the standard checking allowed by simple types is limited, much more sophisticated strategies can be devised when such a system is applied to user-defined data types.

We have just scratched the surface of the possibilities here, but it is useful to become aware of some of the advantages provided by programming languages such as Haskell and ML, which provide a well designed type system, and to a lesser extent by languages such as C++ and Java, which provide some of the features and advantages of type checking.


Learning from Great Developers

One of the most common questions about software development is how to become more productive as a programmer. It is well known that there is a big difference in productivity between mediocre programmers and great programmers, although this is still a little understood phenomenon. For example, there has been anecdotal evidence that the amount of work a great programmer can produce in a week cannot be matched by a mediocre one even in a year’s worth of effort.

So, what is the source of the difference? In general, it includes everything from intellectual capacity to focus, but there is usually a small number of common elements that can be observed in the work of every good developer. It is helpful to try to incorporate such elements into our normal workflow as much as possible. Here I talk about three practices that I have commonly observed in highly skilled developers.

Doing the right thing first

A great foundation for creating correct software involves the avoidance of cargo-cult programming: that is, the practice of doing things just because they seem to work on the surface. Smart people don’t do something because somebody else told them to, or because it seems to work most of the time. Excellent programmers go the extra mile to understand how something works.

This practice can be generalized to any aspect of life, but it is absolutely essential in software development. For example, the cause for most software bugs usually lies hidden in something that was not well understood by the programmer when the code was initially written.

People who don’t look for the root causes of what they are doing are prone to development that uses cookie-cutter strategies, including the following: using copy-and-paste from online forums as their primary way of solving problems; incorporating code generation tools that create templates that seem correct, without understanding why and how they work; using libraries and frameworks that are unnecessary just because they are employed by existing code.

The best way to avoid this issue is to make sure you understand what you are doing at every step of software creation. Given the complexity of today’s architecture it is true that complete knowledge is very hard or maybe impossible to achieve. However, good programmers strive to learn at least the essential facts about everything they are doing.

Being proactive in improving the system organization

Another issue that is very important is the way you organize software. Great professionals take care not only of how the current work is performed, but also of how past work is organized. They continuously look for methods that can improve the existing code base in a way that will make future decisions easier to make.

There are several approaches that one can take in this direction. One is to use refactoring tools, which are quickly becoming common in most IDEs. Modern refactoring tools can help turn spaghetti code into reasonably readable code with little work. And using automated testing tools can make the whole process even simpler.

Another suggestion is to allocate time to fix small issues, including things like formatting, proper commenting, and related problems. Although such small things don’t seem to be a high priority, they have the potential to turn a code base into an unmanageable mess if left without proper attention. Good programmers know that this kind of effort can pay dividends almost immediately, due to the improved maintainability of the resulting code.

Development Speed

Another overlooked factor in programmer productivity is speed. Usually people equate quick effort with solutions that were not well designed. While it takes time to do the right thing, it is also important to proceed at full speed when a good solution has been decided upon.

There are several factors that affect programming speed. Most of them have to do with badly managed programming environments. For example, some projects require approval even for small changes in a code base. This kind of bureaucracy can yield bad results for the general quality of a software project.

Another issue that may impair development speed is the use of wrong or outdated tools. For example, there is no excuse for failing to use a modern source control application. We currently have many high quality options such as Subversion, Git, and Mercurial, and they are all free. A good developer should be able to just pick the best tools available and use them as needed.

To progress quickly it is also essential to have fast feedback. This is usually achieved by the use of testing tools, such as unit testing frameworks and libraries. These days, there are unit testing frameworks for every major programming language, so improving in this aspect should be a matter of simply designing tests, and applying them in a way that exercises the functionality you’re implementing or modifying.
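As an illustration of how little is needed to get this kind of feedback, here is a minimal sketch in C++ that uses only the standard `assert` macro instead of a full testing framework. The function under test, `clamp_percent`, is a hypothetical example chosen for brevity.

```cpp
#include <cassert>

// Hypothetical function under test: clamps a value to the range [0, 100].
int clamp_percent(int value) {
    if (value < 0) return 0;
    if (value > 100) return 100;
    return value;
}

// One test function per behavior group; it exercises the boundaries
// as well as a typical value.
void test_clamp_percent() {
    assert(clamp_percent(-5) == 0);     // below the range
    assert(clamp_percent(0) == 0);      // lower boundary
    assert(clamp_percent(42) == 42);    // typical value is unchanged
    assert(clamp_percent(100) == 100);  // upper boundary
    assert(clamp_percent(150) == 100);  // above the range
}
```

Calling `test_clamp_percent()` after each change aborts the program at the first failing check. Note that `assert` is compiled out when `NDEBUG` is defined, so such tests need a build without that macro; a dedicated unit testing framework adds better reporting on top of the same idea.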

Conclusion

Great developers are hard to find, but everyone can learn important lessons from their practices. I have observed throughout my career that good developers are extremely concerned with doing things for the right reasons, by understanding as much as they can about the problem and the language of implementation. They are also very proactive about organizing the system for optimum work performance. Finally, they are able to produce high quality work in a speedy manner, and they use whatever tools are necessary to achieve such results.

Image credit: commons.wikimedia.org


Day 22: Write Something Completely Different

Software development is a very specialized area. It often enables the creation of strict specialities, where a programmer spends his whole time learning the intricacies of a particular language and computer environment and employing that knowledge to particular applications.
While it is true that most of us go to school and learn a common introductory computer language, it is very unlikely that jobs will be available for the exact language you learned during those years.

For example, Pascal was the language used to teach programmers for a long time. I was one of the students who had their first programming experience with Pascal. Despite this, as one of the consequences of the increasing specialization in software development, nowadays hardly anyone writes code in Pascal anymore. All the “real” programming is performed in languages that are reasonably good at exploiting the existing computational environment and are supported by major manufacturers of PCs, Macs, and mobile devices.

When we look at job descriptions, the specialization of the profession becomes even more impressive. Each job requires a different subset of skills that can only be acquired with years of job experience. All of this leads to a scenario where people feel compelled to spend a long time working with the same tools and libraries.

Specialization versus Exploration

One of the main consequences of high specialization in software engineering is that it is nowadays possible to spend 99% of your working time writing code in a single language, for a single computer system.

While this may lead to an increase in productivity, in the long term it may also be harmful to your personal development as an engineer. It is not only useful, but also desirable, that we have more varied experiences. The current trend in specialization may lead to more productivity in the near future, but it may also impair the development of equally important skills, such as an appreciation of system architectures in general, a wide knowledge of different systems, and a familiarity with alternative solutions that might just as well be useful for the problems you’re solving.

Boredom and burnout are frequent results of overspecialization. Under some circumstances it becomes increasingly hard to focus on a single aspect of the system. Good software engineers understand that this means they need to look at something different to improve their careers.

How to Embrace Diversity

There are no rules as to what you can do to break your common patterns. Just look for things that entice your imagination and jump at the opportunity of increasing your development possibilities. Here are a few suggestions that may help you, though.

  • Learn a language that provides a different programming paradigm. For example, if you are good at object oriented programming, try to learn a functional language. This will help you see problems in a different way. Certain programming languages are adequate for a particular application domain, so making this change can also help you to understand these different problem domains as well.
  • Investigate new programming environments. This is a good way to learn how things can be done differently from what you’re used to. For example, if you are used to graphical IDEs that can do everything with a mouse click, try your hand at a command line based environment. You will certainly feel unproductive at the beginning, but the idea is to learn different tricks from command line environments that can be translated into your own programming workflow.
  • Experiment with different application areas. Sometimes we’re so used to our particular domain that considering a different area may provide a lot of insights into the way we work. For example, if your area is web programming, try to create something that requires number crunching with floating-point libraries. It is certainly a nice challenge that may help you in learning new concepts. On the other hand, someone currently working with scientific computing could look into creating a basic GUI-based application, or even a simple game.

As you can see, the possibilities are limitless. Spending a few hours on such an exercise will not transform you into a specialist in a new area of programming, but it will give you a few insights into how programming works in other domains. Then, if one day you have a similar problem on your hands, the solution will be much easier to figure out based on the concepts that you grasped during the exploration period.

Conclusion

Software developers don’t need to spend their whole time solving a narrow class of problems. In fact, it is sometimes just as helpful to look in different directions, which can provide a renewed focus and fresh ideas to be applied to your main goal.

Using exploratory programming exercises can be a great tool to improve your programming skills, even if you’re only using these concepts for small time periods. The general ideas that you will absorb can be applied in many areas and benefit current projects as well as future work.

Image credit: commons.wikimedia.org
