The Beauty of Short Methods

Writing short methods is one of those pieces of conventional wisdom that get passed around as an agile technique. Many programmers adhere to the practice without thinking too much about it, while others disregard the idea altogether, more concerned with the practical business of producing usable code.

There are, however, a number of surprising reasons why writing short methods can improve the quality of software. Some of these reasons have nothing to do with writing self-documenting code, as some people describe it. One of the main issues is the distinction between the static description of code provided by a set of classes, as compared with the dynamic and complex view provided by algorithms.

Dynamic Properties of Algorithms

When we start studying Computer Science in college, we discover a new world of scientific pursuit in the area of computational systems. Students are introduced to algorithms and taught that they are the foundation of programming. We are led to believe that using correct algorithms is necessary and sufficient to write good software.

There is no question that algorithms are a foundational concept, but they give programmers the false sense that explicitly writing an algorithm is the best way to solve complex problems in programming. Throughout my experience as a software engineer, I have found that combining small methods from several classes and libraries is a far more common way of solving problems, one that covers the vast majority of programming tasks today.

Despite all the ideas about the simplicity and elegance of small methods, I believe the main reason to use them is to transform the complex dynamic nature of algorithms into a static description of a solution in terms of classes and methods.

Complex Algorithmic Methods

The main advantage, and also the liability, of thinking in algorithms is that you learn to reason in terms of complex logic. It is true that, once you really understand an algorithm, its logic becomes clear enough that you can use it in several other contexts, as with any other mathematical concept.

There is a problem with this, however: whole algorithms are not so easy to recognize at first sight, especially by someone who has not been educated in computer science. Even with good training, it is not always easy to recognize an algorithm when it is used in the context of a larger method. When that happens, the programmer needs to document that a particular algorithm is being used, and the reader needs to trust that the implementation is correct.

It is also not easy to verify that an algorithm implemented as a large method is correct. This is a big problem encountered when moving from algorithms in a classroom setting to their concrete implementation. In a classroom, an algorithm can be treated as a mathematical entity: it can be shown to be correct simply by proving its mathematical properties.

However, when an algorithm is converted into software, all kinds of mundane issues arise that directly affect its result. For example, Knuth once mentioned that only a very small percentage of the implementations of a particular quicksort algorithm appearing in the literature are actually correct.

Small Functions as a Vehicle for Algorithms

Small methods are a much more fruitful way to express complex algorithms, because they have an important property: they convert the dynamic world of algorithms into the static world of classes and their associated structures. This is no small feature, and it is the root of the power of the Smalltalk approach.
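
As a rough sketch of the idea, consider the following Java fragment (the class and names are hypothetical, not taken from any real system). Each step of the computation lives in its own two-to-five-line method, so the flow of the algorithm can be read from the static structure of the class:

    import java.util.List;

    // Hypothetical example: a report computed through several tiny methods.
    // Each method is short enough to be verified by inspection.
    record Order(boolean paid, double amount) {}

    class OrderReport {
        private final List<Order> orders;

        OrderReport(List<Order> orders) {
            this.orders = orders;
        }

        // The "algorithm": filter, then aggregate, one small step at a time.
        double averagePaidAmount() {
            List<Order> paid = paidOrders();
            return paid.isEmpty() ? 0.0 : total(paid) / paid.size();
        }

        private List<Order> paidOrders() {
            return orders.stream().filter(Order::paid).toList();
        }

        private double total(List<Order> paid) {
            return paid.stream().mapToDouble(Order::amount).sum();
        }
    }

Written as one long method, the same logic would mix filtering, accumulation, and edge cases in a single body; here each concern has a name.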

In a Smalltalk program, every method is just a few lines long. Incidentally, that is why Smalltalk environments have such a simple editor, one that displays only a small number of lines. When one first sees this kind of program, there is a feeling that every method does very little. However, this is precisely what enables a number of features that are core to the philosophy promoted by the language.

First, it is very easy to determine how a single method works. The two-to-five-line rule used when creating new methods makes it simple to understand their contents just by inspection, unlike anything you could do with long methods or functions. As an example, a number of static analyzers for languages such as Java and Objective-C fail when a method is too long. If a computer has problems considering the branching possibilities in large methods, imagine how difficult it is for humans!

As another advantage, all the complexity of an implementation can be moved from the leaves of the system (the methods' contents) to the internal nodes composed of classes and their methods. As a result, there is a large number of self-documenting methods that can be analyzed just by looking at a class diagram or class browser. This is exactly why the class browser is the standard tool in Smalltalk systems.

Finally, all the dynamic aspects of a program are expressed in the code either through polymorphism (which further reduces the complexity of each individual method) or as part of the methods themselves, using selection and repetition structures. Since these control structures appear only in very small pieces of code, the resulting interactions are also very easy to understand.
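
Here is a small, hypothetical Java sketch of the polymorphism half of that claim: the variation between cases is captured in the class structure, so no individual method needs a conditional on the kind of object it is handling.

    import java.util.List;

    // Each implementation is a one-line method; the "dynamic" choice of
    // behavior is made by polymorphic dispatch, not by control structures.
    interface Shape {
        double area();
    }

    record Circle(double radius) implements Shape {
        public double area() { return Math.PI * radius * radius; }
    }

    record Rectangle(double width, double height) implements Shape {
        public double area() { return width * height; }
    }

    class AreaReport {
        // No switch over shape kinds is needed anywhere in the program.
        static double totalArea(List<Shape> shapes) {
            return shapes.stream().mapToDouble(Shape::area).sum();
        }
    }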

Small Methods in Traditional Languages

Small methods can be a powerful weapon for reducing the complexity inherent in software programming. Seen this way, it is hard to understand why this kind of methodology is not more widespread in the programming field.

The first reason is the belief that programming with short functions or methods is bad from a performance standpoint. This view becomes less relevant each year due to the increasing performance of computer systems, but it is still around. The best answer is that avoiding short methods for the sake of speed is premature optimization; code quality should be the number one priority for a software engineer.

Another, more valid reason is the increase in class complexity when a large number of small functions or methods are used. This is sometimes taken to justify longer methods and a smaller number of methods per class. The solution to this problem, however, is already in place in modern IDEs. Each of them allows programmers to traverse the hierarchy of classes in a system, most frequently using graphical tools to simplify the task.

When using this kind of system, it is important to recognize the shift in complexity from single methods to the class as a whole: one moves from analyzing the contents of a method to looking at class names and the connections between classes. Modern IDEs also provide tools for moving methods around, using refactoring techniques that have improved significantly over the last decade.

The good news is that, although Smalltalk is built around the idea of very short methods, it is not necessary to use Smalltalk to benefit from the concept. It is nowadays possible to use the same techniques in any object-oriented language. In fact, it is possible to do this even in traditional languages such as C or Pascal, although that means losing the benefits of polymorphism and encapsulation, which can still be a reasonable tradeoff depending on the type of application being written.

Five Common Mistakes in the Early Design Stage

Creating great software requires attention to many details that can make the difference between a well-designed system on one hand and a failure in terms of features and maintainability on the other. Every software engineer has a long list of practices that are necessary for a successful project and that should be part of the work cycle no matter what happens in the team.

In this post, however, I would like to highlight a few mistakes that are common in this field but that keep repeating themselves, often due to bad management practices. It is worth keeping this list in mind, since this kind of information can help us make better decisions whenever we face similar situations.

Solving too much of a problem

A lot of failures in software design have to do with a lack of clear focus on a particular area. Inexperienced programmers and managers, for example, try to take on more than they can reasonably accomplish. The main cause of the resulting failure is trying to solve a big problem all at once.

The reason it is so much better to focus on a particular goal has to do with the nature of problem solving: it is easier to solve several smaller problems than to find a single solution for a big one. When human beings try to handle too many complex details, they get overwhelmed, and this happens even with smart people. In fact, a great part of being smart is learning how to break problems into smaller pieces that can be solved easily.

Similarly, when developers try to create a program with too many responsibilities, the result is usually unsatisfying. To avoid this pitfall, try to break problems into parts that can be solved independently. Then, come up with a way to combine the solutions so that the result is transparent to users. As with most problems in software design, there are several ways to achieve the same result, and experience will teach you which solutions work best over time. Just stay aware of the complexity of each subproblem you're trying to solve, and reduce that complexity by breaking up the problem further if necessary.

Solving too little of the problem

While this is far less common than the previous problem, it also happens that systems are underpowered for the problem they aim to solve. For example, they may miss critical steps that are necessary for the execution of a process, requiring manual intervention in the middle of a workflow. When this happens, users frequently need to take additional steps to fix issues that should have been covered by the application in the first place.

The main cause of this problem is a failure in the design and requirements-gathering phase. It is possible that the developers did not have enough experience to determine the main requirements of a full solution to the problem they were trying to solve. This is a cause of much frustration for users, and it often leads people to dismiss software as perpetually incomplete and unreliable.

The best way to fix this category of problem is to gain a better understanding of the users' needs. Many times the issue can be solved not only by adding more features to an application, but by making it more flexible, so that different users can exercise the application in the way that best matches their needs. Most successful programs make a conscious effort to provide flexible solutions that users can quickly adapt as necessary.

Designing around the user interface

While the user interface is the most visible element of a software application, it may not be the best lens for analyzing and organizing the functionality of a piece of software. Unless you are working on an application that absolutely requires a particular interface, designing with the functionality in mind first is better than blindly applying user interface elements from the beginning.

In many cases, software engineers use the interface as the focal point of feature design, this being one of the core ideas proposed by agile methodologies. They frequently forget, however, that the GUI is merely a presentation layer for something of more fundamental importance to the user. The logic and data-model components contain most of the necessary parts of a program; moreover, these parts are mostly independent of any graphical interface.
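
As a minimal sketch of that separation (in Java, with invented names): the model compiles without any reference to a GUI toolkit, and the small interface below is the only contract a user interface has to satisfy.

    // The model knows nothing about windows, widgets, or web pages.
    class Invoice {
        private final double subtotal;
        private final double taxRate;

        Invoice(double subtotal, double taxRate) {
            this.subtotal = subtotal;
            this.taxRate = taxRate;
        }

        double total() {
            return subtotal * (1.0 + taxRate);
        }
    }

    // The only thing any user interface must provide.
    interface InvoiceView {
        void showTotal(double total);
    }

    class InvoicePresenter {
        // A desktop window, a web page, or a test stub can all act as the view.
        static void present(Invoice invoice, InvoiceView view) {
            view.showTotal(invoice.total());
        }
    }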

Such a policy of functionality-based design may be hard to enforce if you start the project working directly on the user interface. As a result, a number of bad decisions may follow from an unnecessary focus on the UI to the detriment of the real functionality of the application. As just one example, in the past it was common to see graphical applications with menus and main windows even when that was not the best way to present their functionality to users.

Not integrating properly with other systems

In most cases a system cannot be used in isolation. Even the simplest ones have to be aware of the hardware, operating system, and programming environment in which they live. Similarly, software should be designed with consideration for the degree of integration required with systems that have already been implemented.

In a company, for example, we need to integrate with database systems, as well as with other applications that provide business functionality. Web sites need to integrate with the technologies provided by the environment in use, such as Java, Ruby, or PHP. A smart software engineer should be able to discern the best way to integrate into the environment where the program will live.

Failure in this area usually leads to systems that are underpowered because they do not use and integrate the functionality available in the environment. This type of issue can therefore also be viewed as a cause of the second item mentioned above. Try to make the best of existing technology in order to avoid this problem.

Not using common solutions

Many engineers have trouble using established solutions for common software problems. This may come from a lack of understanding of a particular platform or programming environment. On other occasions, it is the familiar "not invented here" attitude, which leads many developers to recreate existing technology.

The truth is that current programming technology is too complex to avoid using libraries and other applications developed by third parties. In fact, I would characterize the urge to implement everything in a software project as a liability, for the simple reason that a single person (or team) cannot have the competency to implement every part needed by a modern application.

The best practice is to evaluate and use libraries and frameworks that have been proven to solve the main technical problems found during development. This has been made much easier by the emergence of open source software. A major advantage of open source is that solutions to common problems can be created once and then shared by a whole community; a strong open source scene is a must-have for most modern programming environments.

Whatever the reason for this problem, it is essential to leverage the knowledge and work available in existing solutions. Trying to reinvent the wheel is, in most cases, simply an example of bad practice in software engineering.

Conclusion

Developing software requires attention to many details. I have mentioned only five of the factors that can make or break a project, depending on how well they are managed. Many other issues remain, but being careful about the items mentioned here can definitely help in the successful completion of complex software projects.

Disadvantages of Statically Typed Languages

It is widely understood that statically typed languages have advantages that make them suitable for the development of a wide range of systems. In fact, statically typed languages such as C++ and Java are currently the most successful in terms of commercial adoption for desktop and server-side applications. Most operating systems, including Linux, Windows, and Mac OS X, are written in C/C++ at their core. Moreover, user applications on the desktop are frequently written in typed languages: traditionally C++ has been used for this purpose, but nowadays C#, Objective-C, and Java are also contenders for that position.

The increase in importance of web services, however, has made it possible to use dynamic languages (where types are not strictly enforced at compile time) for large-scale development. Languages such as Python, Ruby, and PHP, which were previously treated as simple scripting tools, are now used to write software for some of the world's biggest web sites. As a result, companies such as Facebook and Google rely every day on software written in dynamic languages.

The main reason for the rise of dynamic languages is the nature of web-based programming. Typical web applications are concerned mostly with storing and retrieving data, with the heavy lifting done by database systems. The other piece of the puzzle is generating markup: interpreting that markup and handling traditional events is performed on the client by a web browser. For this reason, web applications have a very limited set of responsibilities compared to a desktop application, which is responsible for all the tasks associated with displaying and manipulating data.

Given this change in programming needs, it is interesting to understand how dynamic languages compare to statically typed languages in the areas of usability and maintainability.

Strengths of Dynamic Languages

Although typed languages provide many useful services that can improve the performance and safety of the resulting applications, they are not the solution for all kinds of software engineering issues. A lot of programming problems can be expressed more easily in languages that have a relaxed notion of data types. For example, dynamic languages such as Lisp and JavaScript have made it possible to express complicated requirements without introducing a new type for each concept. Instead, functions and macros are used to define types that are checked at run time only, which also reduces the amount of work necessary to write a compiler.

At the heart of the problem with compile-time types in programming languages is the requirement of creating a user-defined class for each new concept. When a new concept is introduced into an object-oriented system, some languages (such as Java) require that this information be encoded as a new user-defined type, a statically defined element referred to as a "class". Moreover, the new type will only be allowed to use operations that have been explicitly white-listed for it: the inheritance mechanism is frequently used for this purpose, and it is also possible to manually add methods to the new class definition as needed.
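
A hypothetical Java sketch of this requirement: to introduce the concept of a temperature, we must define a class, and only the operations declared in it (or inherited) are permitted on values of that type.

    // Every new concept becomes a statically defined class.
    final class Celsius {
        private final double degrees;

        Celsius(double degrees) {
            this.degrees = degrees;
        }

        // Each allowed operation has to be listed explicitly; nothing else
        // can be done with a Celsius value.
        Celsius plus(Celsius other) {
            return new Celsius(degrees + other.degrees);
        }

        double toFahrenheit() {
            return degrees * 9.0 / 5.0 + 32.0;
        }
    }

In a dynamic language, the same values could simply be passed around as plain numbers or small records, with misuse caught at run time.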

Forcing Programmers to Deal with Types

The idea of compile-time type checking is also important when safety is a concern, especially in a language with a low-overhead run-time. A prime example of this is C, which manages to use compile-time checking to provide some degree of type safety even in the absence of a proper run-time.

However, when this idea is not properly managed, it ends up generating more work for programmers than it can possibly save. One example of this insidious problem happens when two libraries treat the same concept using similar, but different, types. When this occurs, programmers are required to deal with the differences themselves: they end up having to write translation layers that make it possible for the two pieces of code to interoperate.
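
A sketch of the kind of translation layer involved, with hypothetical types standing in for the two libraries:

    // Imagine library A and library B each define their own point type
    // for the same underlying concept.
    final class LibAPoint {
        final double x, y;
        LibAPoint(double x, double y) { this.x = x; this.y = y; }
    }

    final class LibBPoint {
        final double[] coords;
        LibBPoint(double[] coords) { this.coords = coords; }
    }

    // Boilerplate whose only purpose is to satisfy both type systems.
    final class PointAdapter {
        static LibBPoint toB(LibAPoint p) {
            return new LibBPoint(new double[] { p.x, p.y });
        }

        static LibAPoint toA(LibBPoint p) {
            return new LibAPoint(p.coords[0], p.coords[1]);
        }
    }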

Working for the Compiler

One of the worst feelings I have when programming in C++ or Java is the sense of what I call "fighting the compiler". Essentially, this is the part of the work in which one needs to make the compiler happy by fixing all the type errors so that a program can compile without issues. It usually happens during some kind of manual refactoring, when several places have similar type-checking issues that result in a long list of errors.
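
As a contrived Java illustration of that ripple effect (all names invented): changing one declared type makes every caller stop compiling until each call site is fixed by hand.

    import java.util.List;

    class Repository {
        // Suppose a refactoring changed this from List<Integer> to List<Long>.
        List<Long> getIds() {
            return List.of(1L, 2L, 3L);
        }
    }

    class Caller {
        int firstId(Repository repo) {
            // int id = repo.getIds().get(0);  // old code: "incompatible types"
            long id = repo.getIds().get(0);    // every call site must be updated
            return (int) id;
        }
    }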

Slow compilation is another problem that frequently arises during the development of programs in a typed language. The main reason is that the compiler has to be able not only to parse the language in question, but also to check each expression for correctness with respect to the type system. Depending on the complexity of the expressions used, the computational time required to perform these checks can be comparable to the total time for code generation. This is especially true when the language has a complex syntax, as C++ does; its templates can even perform recursion at compile time.

This means that the requirements of compile-time type systems may increase the overhead on programmers. Few things are more annoying than waiting through a long compilation for a seemingly small change in a code base; as a result, productivity suffers. This can be a negative force in a large project, even when the advantages provided by type checking are factored into the equation.

Conclusion

Compile-time checking provides several advantages, including the automatic elimination of a large class of errors. However, the rise of dynamic languages over the last few years has provided an opportunity to discuss the disadvantages of strict compile-time type checking.

Among the problems caused by type checking during compilation is the increased time that programmers need to spend waiting for a full build. Sometimes it is much better to be able to test a program quickly and leave full-scale type checking for a later phase. Traditional typed languages do not allow this relaxed approach, and that is one of the key advantages of dynamic languages.

In the future, we can expect programming languages to provide better trade-offs in this arena. Newer languages such as C# and Go have already introduced fresh ideas in this respect, but I would like to see even more improvements in the next few years.