Day 25: Learning to Document Properly

Documentation is a contentious topic in programming because there are so many ways of doing it. While everyone agrees at some level that software documentation is important, two programmers rarely agree on exactly what needs to be documented and how.

Despite the disagreements, there are still some common practices that can be incorporated into a daily development workflow. To simplify, let us start by discussing one area of a code base where almost everyone agrees that there should be extensive documentation: the public API. We will see why and how such interfaces should be documented, and then move on to other cases.

Documenting APIs

If there is a part of a code base that really needs good documentation, it is the external API. This is especially important because, in many cases, the interface of the library is the only part of the code available to users. This is a common situation in languages such as C++, where header files can be distributed independently from the implementation sources. A similar case happens when only the Javadoc files are distributed for a Java library.

When the source code for a library is not distributed, the comments in the public API are the only way to determine how to properly use an interface — in the absence of a more extensive manual. Moreover, even if the complete source code of a library is accessible, API developers need to remember that it is sometimes a hassle to look up the implementation for an API — especially when a code browser is not available. Whatever the situation, looking at the implementation files should not be necessary to understand how to properly use an API.

When documenting libraries, it is very important to provide clear examples of how a method can be called, and if possible, the context in which such a call is correct. Sadly, many libraries come with documentation that provides detailed explanations for each method individually, but fails at explaining how that method can be used along with other classes to achieve a desired result. This is a frustrating experience for programmers, and can make your library harder to use, even if it is well designed and implemented.
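As a minimal sketch of this idea, the hypothetical C++ header below documents not only each method but also how the methods are meant to be used together. The class name `TemperatureLog`, its methods, and the `///` comment style are assumptions made for illustration; they do not come from any real library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

/// Collects temperature readings and reports simple statistics.
///
/// Typical use (hypothetical example):
///
///     TemperatureLog log;
///     log.add(21.5);
///     log.add(23.5);
///     double avg = log.average();  // valid only after at least one add()
///
class TemperatureLog {
public:
    /// Records one reading, in degrees Celsius.
    void add(double celsius) { readings_.push_back(celsius); }

    /// Returns the number of readings recorded so far.
    std::size_t count() const { return readings_.size(); }

    /// Returns the arithmetic mean of all readings.
    /// Precondition: count() > 0, so call add() at least once first.
    double average() const {
        assert(!readings_.empty());
        double sum = 0.0;
        for (double reading : readings_) {
            sum += reading;
        }
        return sum / static_cast<double>(readings_.size());
    }

private:
    std::vector<double> readings_;
};
```

The class-level comment carries the calling sequence, while the comment on `average()` states the context in which the call is correct.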

Documenting Internal Header Files

Header files that are internal to the project present many of the same challenges as external libraries. In a sense, each header file is a mini API that defines how other parts of the library or application will interact with that specific section of the code. For this reason, programmers need to be careful to present a complete picture of what that module or class is trying to accomplish.

On the other hand, many internal header files have low importance when viewed along with other parts of the code base. Because they are mostly used as an implementation detail, several of these internal header files and classes don’t really require further explanation. Depending on the particular phase of the project, these files may be frequently written and modified, and are subject to changes that may render any documentation useless within a short time span. Therefore, providing extensive documentation for some private classes and header files may be completely unnecessary.

A common guideline when working with internal header files is to treat them according to their relative importance to the project. There are some small header files, containing only POD classes, for example, that don’t require any time spent on documentation. These are essentially dependent files, which receive their meaning from other, more important definitions in the project.

More important header files should receive better documentation, however. For example, classes that encapsulate fundamental concepts in the application should be fully documented, sometimes with as much care as one would spend when writing external APIs.
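The contrast can be sketched with a hypothetical internal header. Here `Point` is a POD type left almost undocumented, while `Rect` stands in for a more central concept and spells out its invariant. Both names are invented for this illustration.

```cpp
#include <cassert>

// Plain data holder: the field names say everything a reader needs.
struct Point {
    double x;
    double y;
};

// Axis-aligned rectangle, assumed here to be a concept that much of the
// code base builds on, so its rules are written down.
//
// Invariant: min.x <= max.x and min.y <= max.y.
// Points on the border count as inside.
class Rect {
public:
    Rect(Point min, Point max) : min_(min), max_(max) {
        assert(min.x <= max.x && min.y <= max.y);
    }

    // Returns true when p lies inside the rectangle or on its border.
    bool contains(Point p) const {
        return p.x >= min_.x && p.x <= max_.x &&
               p.y >= min_.y && p.y <= max_.y;
    }

private:
    Point min_;
    Point max_;
};
```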

Documenting Internal Code

Internal code is the easiest to handle, because developers have complete control over what it will look like. At the same time, it is the most contentious area in code documentation because each software author has a different idea of what needs to be documented or not.

General guidelines are still possible here. It is now understood that there is little value in trying to document “how” something is done. For example, the typical comment:

i++; // increment the counter “i”

should be avoided at all costs: it just adds more noise to the code, without contributing anything. Any C++ or Java programmer will be completely familiar with the meaning of the line of code above. It would be a different situation, however, if the comment tried to explain “why” the expression is necessary:

i++; // incrementing here because the next section assumes
     // all counters have been updated

The comment is now useful. Its purpose is clear, and it explains in a few words why the counter needs to be updated at this particular location, and not after the next instructions are executed.

It is easy to understand the difference between “how” and “why” comments once you start asking these questions yourself. Whenever you are about to create a new comment, think first: is this describing how something is done or why it is done? The first type of comment should be avoided. The second type of comment may be useful enough to warrant its inclusion.

Conclusion

Commenting code is a stylistic decision. Every programmer has a different way to handle comments. Some developers never write comments, while a few others like to create small essays for each section of code (consider, for example, literate programming).

Despite the differences, comments should always be handled with proper care. They are more important when code is made available to external users, who don’t have access to the implementation. They should also appear in sections of the code where an explanation is necessary for a complete understanding — clearly describing why, instead of just how, something is done.


Day 24: Refactoring Code Frequently

Software is not a static entity that exists in a single, defined way. The great advantage of software over traditional engineering building materials is that it can be reshaped for different uses as necessary. This advantage, however, incurs a penalty that is easily overlooked: software must be maintained throughout its lifetime so that it retains its original qualities, even when the way it is used changes.

This flexibility along with its associated maintenance problems can be easily seen in any non-trivial piece of software, be it commercial or open source. An application or library usually starts as a way of solving a particular need. As development progresses, however, the same software is frequently adapted to solve more and more complex use cases. As a result, code needs to adapt itself to these changing requirements, changes that sometimes modify important assumptions upon which the original code was written.

The main challenge faced by developers is to allow a piece of software to evolve in an organic way, so that it can respond to these sometimes conflicting requirements. As an example, the same application that was once able to process a single file with a particular purpose may, a few years later, need to handle multiple files, each one with a different type of contents.

An Answer to Changing Requirements

Because of these difficult requirements, a lot of software artifacts just decay due to a lack of proper maintenance. As changes happen to the environment where it exists, a program will start to provide incorrect answers some of the time, or even most of the time. These are bugs that have been introduced by the very same evolution that is reshaping the program to be able to cope with different requirements.

Despite this, it is possible to maintain software properly even when big changes happen to its processing capabilities. Refactoring is the central concept enabling this adaptation on a daily basis. With the help of refactoring, programmers are able to slightly modify the way a piece of software works, so that it can satisfy specific characteristics needed by new requirements. At the same time, refactoring can be used along with unit testing to improve confidence in the quality of the existing code base.

Refactoring to the Rescue

Refactoring surfaced as an automated way of modifying programs, especially code written in an object-oriented fashion. Smalltalk was among the first languages to provide a refactoring facility, so that programmers could make small changes with confidence. It is important to understand that the changes allowed by refactoring are usually minimal, so that they can be easily automated. Examples include renaming methods, changing the order of arguments in a method declaration and in all calls to that method, or moving a method from a subclass to its parent class.
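To make the last of these examples concrete, here is a small sketch of the state of the code after a method has been moved from the subclasses to the parent class. The `Shape`, `Square`, and `Rectangle` classes are hypothetical, and the sketch assumes that `describe()` was previously duplicated in both subclasses.

```cpp
#include <string>

class Shape {
public:
    virtual ~Shape() = default;
    virtual std::string name() const = 0;
    virtual int area() const = 0;

    // Pulled up from the subclasses: callers see the same behavior,
    // but the code now exists in one place only.
    std::string describe() const {
        return name() + " with area " + std::to_string(area());
    }
};

class Square : public Shape {
public:
    explicit Square(int side) : side_(side) {}
    std::string name() const override { return "square"; }
    int area() const override { return side_ * side_; }

private:
    int side_;
};

class Rectangle : public Shape {
public:
    Rectangle(int width, int height) : width_(width), height_(height) {}
    std::string name() const override { return "rectangle"; }
    int area() const override { return width_ * height_; }

private:
    int width_;
    int height_;
};
```

Because the change is purely structural, a tool can apply it mechanically and existing callers of `describe()` do not need to change.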

The main advantage of refactoring compared to other approaches for code modification is that refactoring is (or can be) completely automated. With the help of a refactoring tool, it is very simple to make changes to a code base that will make the program cleaner, while at the same time maintaining its correctness.

When to Use Refactoring

Another advantage of refactoring is that it doesn’t need to be considered a task separate from coding. Good software engineers make liberal use of refactoring in their workflows. In fact, some people use refactoring very frequently, in order to achieve one or more of the following goals:

Improving readability: a simple reason to use refactoring is to improve the readability of existing code. Sometimes it is possible to combine two or more statements and make a method much easier to read. Conversely, it is possible to split a statement into two or more, while at the same time improving the readability of the whole section of code. These are powerful uses of the capabilities of a refactoring tool.

Adapting to changes in the underlying code: sometimes the implementation of a new feature will require just a slight change to existing code. For example, that change may require a new parameter to be added to a method; or maybe an existing method will need to be moved to another class. Instead of doing this manually, one can use a refactoring tool to achieve the same results with less work.

Naming methods and variables properly: another good use of refactoring is making sure that the names of methods and variables reflect their true meaning in the program. Sometimes, as a piece of software evolves, the names used in the initial implementation may lose their meaning, either because of business reasons or because other parts of the implementation have evolved too. A refactoring tool is a simple way to solve this issue, since it updates all other parts of the code that depend on these methods or variables (including automated tests).
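The readability and naming goals above can be sketched with a before and after pair. Both functions are hypothetical, and the 25% bulk discount is an invented rule; the point is only that the refactored version computes the same value with clearer names and smaller steps.

```cpp
// Before: a vague name, terse parameters, and one dense statement.
double calc(double p, int q, double t) {
    return p * q * (q >= 10 ? 0.75 : 1.0) * (1.0 + t);
}

// After: renamed function and parameters, with the statement split
// into named intermediate steps. The arithmetic is unchanged.
double totalPrice(double unitPrice, int quantity, double taxRate) {
    const double subtotal = unitPrice * quantity;
    const double bulkFactor = (quantity >= 10) ? 0.75 : 1.0;
    return subtotal * bulkFactor * (1.0 + taxRate);
}
```

With a refactoring tool, the rename would also be applied to every caller, which is the tedious part of doing this by hand.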

Conclusion

Refactoring is a powerful tool, and it should be part of any programmer’s tool chest. However, like any tool, it takes some training to get used to. Some of this ties in with the general idea of properly learning your software environment. Most modern environments (command line or IDE-based) have provisions for running refactoring tools.

Also, refactoring tools are available for many modern languages, the most common ones including Java, Smalltalk, Python, and C++. Therefore, the use of refactoring has just become a matter of mastering the underlying programming system. The frequent utilization of such tools can make our lives easier, and our code even better.


The Beauty of Short Methods

Writing short methods is one of the pieces of conventional wisdom that have been passed around as a form of agile technique. Many programmers adhere to the use of short methods without thinking too much about it, while others disregard the idea altogether, more concerned with the practical activity of creating usable code.

There are, however, a number of surprising reasons why writing short methods can improve the quality of software. Some of these reasons have nothing to do with just writing self-documenting code, as some people describe it. One of the main issues is the distinction between the static description of code provided by a set of classes, as compared with the dynamic and complex view provided by algorithms.

Dynamic Properties of Algorithms

When we start studying Computer Science at college, we discover a new world of scientific pursuit in the area of computational systems. Students are then introduced to algorithms, and taught that these are the foundation of programming. We are led to believe that using correct algorithms is necessary and sufficient to write good software.

There is no question that algorithms are a foundational concept, but they give programmers the false sense that explicitly writing an algorithm is the best way of solving complex problems in programming. Throughout my experience as a software engineer, I have found that combining small methods from several classes and libraries is a far more common way of solving problems – a technique that covers the huge majority of programming tasks available nowadays.

Despite all the ideas about the simplicity and elegance of small methods, I believe the main reason to use them is to transform the complex dynamic nature of algorithms into a static description of a solution in terms of classes and methods.

Complex Algorithmic Methods

The main advantage and liability of thinking in algorithms is that you learn to reason in terms of complex logic. It is true that, after you really understand an algorithm, its logic becomes clear enough that you may use it in several other contexts, as with any other mathematical concept.

There is a problem with this, however: whole algorithms are not so easy to recognize at first sight, especially by someone who hasn’t been educated in computer science. Even if you have good training, it is not always clear that an algorithm can be recognized when it is used in the context of a larger method. If that is the case, a programmer needs to document that a particular algorithm is being used, and the reader needs to believe that the implementation is correct.

Also, it is not easy to verify that an algorithm implemented as a large method is correct. This is a big problem encountered when moving from algorithms in a classroom setting to their concrete implementation. While in a classroom, an algorithm can be proved as a mathematical entity. It can be shown as correct by simply proving its mathematical properties.

However, when an algorithm is converted into software, all kinds of mundane issues arise that directly affect its result. For example, Knuth once mentioned that only a very small percentage of the implementations of a particular quick sorting algorithm appearing in the literature are actually correct.

Small Functions as a Vehicle to Algorithms

Small methods are a much more fruitful way to express complex algorithms, because they have an important property: they are able to convert the dynamic world of algorithms into the static world of classes and their associated structures. This is not a small feature, and it is the root of the power of the Smalltalk approach.

In a Smalltalk program, every method is just a few lines long. Incidentally, that is why Smalltalk environments have such a simple editor, one that displays only a small number of lines. When one sees this kind of program, there is the feeling that every method does very little. However, this allows a number of important features that are core to the philosophy promoted by the language.

First, it is very easy to determine how a single method works. The two-to-five-lines rule used when creating new methods makes it simple enough to understand their contents just by inspection – unlike anything you could do with long methods or functions. As an example, a number of static analyzers for languages such as Java and Objective-C fail when a method is too long. If a computer has problems when considering the branching possibilities in large methods, imagine how difficult it is for humans!

As another advantage, all the complexity of an implementation can be moved from the leaves of the system (the methods’ contents) to the internal nodes composed of classes and their methods. As a result, there is a large number of self-documenting methods that can be analyzed by just looking at a class diagram or class browser. This is exactly the reason why the class browser is the standard tool in Smalltalk systems.

Finally, all the dynamic aspects of a program are stored in the code either using polymorphism (which further reduces the complexity of each individual method) or as part of the methods themselves, using selection and repetition structures. However, since these control structures are used in very small pieces of code, the resulting interactions are also very easy to understand.
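The following sketch shows how this might look outside Smalltalk, in C++. The `Discount` hierarchy and the `Order` class are hypothetical names chosen for illustration. Each method is only a line or two long, and the choice between pricing rules is made by polymorphism rather than by a branch inside one long method.

```cpp
#include <memory>
#include <numeric>
#include <utility>
#include <vector>

// Each pricing rule is its own small class, so no method needs a switch.
class Discount {
public:
    virtual ~Discount() = default;
    virtual int apply(int cents) const = 0;
};

class NoDiscount : public Discount {
public:
    int apply(int cents) const override { return cents; }
};

class PercentOff : public Discount {
public:
    explicit PercentOff(int percent) : percent_(percent) {}
    int apply(int cents) const override {
        return cents - cents * percent_ / 100;
    }

private:
    int percent_;
};

class Order {
public:
    explicit Order(std::unique_ptr<Discount> discount)
        : discount_(std::move(discount)) {}

    void addItem(int priceInCents) { prices_.push_back(priceInCents); }

    // Sum of all item prices, before any discount.
    int subtotal() const {
        return std::accumulate(prices_.begin(), prices_.end(), 0);
    }

    // The discount object decides how the subtotal is adjusted.
    int total() const { return discount_->apply(subtotal()); }

private:
    std::unique_ptr<Discount> discount_;
    std::vector<int> prices_;
};
```

Reading the class and method names alone (`Order`, `subtotal`, `total`, `Discount`, `apply`) already gives a fair picture of what the code does, which is the shift from method contents to class structure described above.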

Small Methods in Traditional Languages

Small methods can be a powerful weapon when trying to reduce the complexity inherent in software development. Seen this way, it is hard to understand why this kind of methodology is not more widespread in the programming field.

The first reason is the belief that programming with short functions/methods is bad from a performance standpoint. This kind of view is less relevant each year due to the increasing performance of computer systems, but is still around. The best answer is that doing otherwise is a premature optimization, and code quality should be the number one priority for a software engineer.

Another valid reason is the increase in class complexity when a large number of small functions or methods are used. This can justify the idea of using longer methods and a smaller number of methods per class. The solution for this problem, however, is already in place with modern IDEs. Each one of them allows programmers to traverse the hierarchy of classes in a system, most frequently using graphical tools to simplify the task.

When using this kind of system, it is important to emphasize the complexity shift from single methods to the whole class, while moving from analyzing the contents of a method to looking at class names and connections between classes. Modern IDEs also provide tools for moving methods around, using refactoring techniques that have been significantly improved during the last decade.

The good news is that, although Smalltalk is built around the idea of very short methods, it is not necessary to use Smalltalk to benefit from that concept. It is nowadays possible to use the same techniques in any object-oriented language. In fact, it is possible to do this even in traditional languages such as C or Pascal, although this will result in losing the benefits of polymorphism and encapsulation – which can even be a reasonable tradeoff depending on the type of application being written.


Five Common Mistakes in the Early Design Stage

Creating great software requires attention to several details that can make the difference between a well-designed system on the one hand, and a failure in terms of features and maintainability on the other. Every software engineer has a big list of elements that are necessary for a successful project and that should be part of the work cycle no matter what happens in the team.

In this post, however, I would like to emphasize a few mistakes that are very common in this field and that keep repeating themselves due to bad management practices. It would be great to keep this list in mind, since this kind of information can help us make better decisions whenever we are faced with similar situations.

Solving too much of a problem

A lot of failures in software design have to do with a lack of clear focus on a particular area. For example, inexperienced programmers and managers try to work on more than they can easily accomplish. The main cause for the resulting failure is trying to solve a big problem all at once.

The reason why it is so much better to focus on a particular goal has to do with the nature of problem solving: it is easier to solve several smaller problems than to find a single solution for a big one. When human beings try to handle complex details they get overwhelmed, and this happens even with smart people. In fact, a great part of being smart is learning how to break problems into smaller pieces that can be easily solved.

Similarly, when developers try to create a program with too many responsibilities, the result is usually less satisfying. To avoid this pitfall, try to break problems into parts that can be easily solved independently. Then, come up with a way to combine the solutions so that the combination is transparent to users. As with most problems in software design, there are several ways to achieve the same result. Experience can teach you the solutions that work better over time. Just be aware of the complexity of each subproblem you’re trying to solve, and try to reduce the complexity by breaking up the problem if necessary.

Solving too little of the problem

While this is far less common than the previous problem, it also happens that systems may be underpowered for the problem they aim to solve. For example, they may miss critical steps that are necessary for the execution of a process, requiring manual intervention during a workflow. When this happens, users frequently need to take additional steps to fix issues that should be covered by the application in the first place.

The main cause of this problem is a failure in the design and requirement gathering phase. It is possible that the software developers didn’t have enough experience to determine the main requirements of a full solution for the problem they were trying to solve. This is a cause of much frustration for users, and it usually leads people to dismiss software as always incomplete and unreliable.

The best way to fix this category of problem is to have a better understanding of the user’s needs. Many times this issue can be solved not only by adding more features to an application, but by making it more flexible, so that different users can exercise the application in the way that best matches their needs. Most successful programs make a conscious effort to create flexible solutions that can be quickly adapted by users as necessary.

Designing around the user interface

While the user interface is the most visible element of a software application, it may not be the best way of analyzing and organizing the functionality of a piece of software. Unless you are working on an application that absolutely needs to have a particular interface, working first with the functionality in mind is better than blindly applying user interface elements from the beginning.

In many cases, software engineers use the interface as a focal point of feature design, this being one of the core ideas proposed by agile methodologies. They frequently forget, however, that the GUI is merely a presentation method for something of more fundamental importance to the user. The logic and data model components contain most of the necessary parts of a program; moreover, these parts are mostly independent of any graphical interface.

Such a policy of functionality-based design may be hard to enforce if you start the project working directly on the user interface. As a result, a number of bad decisions may arise from an unnecessary focus on the UI to the detriment of the real functionality of the application. Just as an example, in the past it was common to have graphical applications with menus and main windows, even when that was not the best way to present functionality to users.
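One way to picture this separation is the sketch below, in which the logic and data model know nothing about how they are displayed. The `TodoList` class and the `renderAsText` function are hypothetical; a graphical front end could replace `renderAsText` without touching the model.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct TodoItem {
    std::string title;
    bool done;
};

// Data model and logic: no reference to any user interface.
class TodoList {
public:
    void add(const std::string& title) {
        items_.push_back(TodoItem{title, false});
    }

    // Marks the item at the given position as done; at() throws
    // std::out_of_range for an invalid index.
    void complete(std::size_t index) { items_.at(index).done = true; }

    std::size_t pendingCount() const {
        std::size_t pending = 0;
        for (const TodoItem& item : items_) {
            if (!item.done) ++pending;
        }
        return pending;
    }

    const std::vector<TodoItem>& items() const { return items_; }

private:
    std::vector<TodoItem> items_;
};

// One possible presentation layer: plain text, one item per line.
std::string renderAsText(const TodoList& list) {
    std::string out;
    for (const TodoItem& item : list.items()) {
        out += (item.done ? "[x] " : "[ ] ");
        out += item.title;
        out += '\n';
    }
    return out;
}
```

Because the model can be exercised without any interface at all, the functionality can be designed and tested before a presentation is chosen.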

Not integrating properly with other systems

In most cases a system cannot be used in isolation. Even the simplest ones have to be aware of the hardware, operating system, and programming environment where they live. Similarly, software should be designed to take into consideration the degree of integration required from other systems that have already been implemented.

In a company, for example, we need to integrate with database systems, as well as other applications that provide business functionality. Web sites need to integrate with other technologies provided by the environment used, such as Java, Ruby, or PHP. A smart software engineer should be able to discern the best way to integrate into the environment where the program will live.

Failure in this area usually leads to systems that are underpowered because they don’t use and integrate the functionality available in the environment. Therefore this type of issue can also be viewed as a cause for the second item mentioned above. Try to make the best of existing technology in order to avoid this problem.

Not using common solutions

Many engineers have trouble using established solutions for common software problems. This may come from a lack of understanding of a particular platform or programming environment. On other occasions it is common to observe the familiar “not invented here” feeling, which leads many developers to recreate existing technology.

The truth is that current programming technology is too complex to avoid using libraries and other applications developed by third parties. In fact, I would characterize as a liability the urge to implement everything in a software project, for the simple reason that a single person (or team) cannot have the competency to implement all parts needed by a modern application.

The best practice is to evaluate and use libraries and frameworks that have been proven to solve the main technical problems found during development. This has been made much easier by the emergence of open source software. A major advantage of open source is that solutions can be created for common problems and consequently shared by a whole community. A strong open source scene is a must-have for most modern programming environments.

Whatever the reason for this problem, it is essential to leverage the knowledge and work available through existing solutions. Trying to reinvent the wheel is in most cases just an example of bad practices in software engineering.

Conclusion

Developing software requires attention to several details. I have mentioned only five of the factors that can make or break a project, depending on how well they are managed. A lot of other issues remain, but being careful about the items mentioned here can definitely help in the successful completion of complex software projects.


Disadvantages of Statically Typed Languages

It is widely understood that typed languages have advantages that make them suitable for development of a wide range of systems. In fact, statically typed languages such as C++ and Java are currently the most successful in terms of commercial adoption for the development of desktop and server-side applications. Most operating systems, including Linux, Windows, and Mac OS X, are written in C/C++ at their cores. Moreover, user applications on the desktop are frequently written in typed languages: traditionally C++ has been used for this purpose, but we nowadays also have C#, Objective-C and Java as contenders for that position.

The increase in importance of web services, however, has made it possible to use dynamic languages (where types are not strictly enforced at compile time) for large scale development. Languages such as Python, Ruby, and PHP, which were previously treated as simple tools to create scripts, are now used to write software for some of the world’s biggest web sites. As a result, companies such as Facebook and Google rely every day on software written using dynamic languages.

The main reason for the rise of dynamic languages is the nature of web-based programming. Typical web applications are concerned mostly with storing and retrieving data, where the heavy lifting is done by database systems. The other piece of the puzzle is generating markup language: interpretation of the markup and traditional event handling is performed in the client by a web browser. For this reason, web applications have a very limited set of responsibilities as compared to what is typically assigned to a desktop application, which is responsible for all tasks associated with displaying and manipulating data.

Given this change in programming needs, it is interesting to understand how dynamic languages compare to statically typed languages in the areas of usability and maintainability.

Strengths of Dynamic Languages

Although typed languages provide many useful services that can improve the performance and safety of the resulting applications, they are not the solution for all kinds of software engineering issues. A lot of programming problems can be more easily expressed in languages that have a relaxed notion of data types. For example, dynamic languages such as Lisp and JavaScript have made it possible to express complicated requirements without the introduction of new types for each concept. Instead, functions and macros are used to define types that are checked at run time only, which reduces the amount of work necessary to write a compiler.

At the heart of the problem with compile-time types in programming languages is the requirement of creating user defined classes for each new concept. For example, when a new concept is introduced in an object-oriented system, some languages (such as Java) require that this information be encoded as a new user defined type — which happens to be a statically defined element referred to as a “class”. Moreover, the new type will only be allowed to use operations that have been white-listed for its use. For instance, the inheritance mechanism is frequently used for this purpose. It is also possible to manually add methods to a new class definition as needed.

Forcing Programmers to Deal with Types

The idea of compile-time type checking is also important when safety is a concern, especially in a language with a low-overhead run-time. A prime example of this is C, which manages to use compile-time checking to provide some degree of type safety, even in the absence of a proper run-time.

However, when this idea is not properly managed, it ends up generating more work for programmers than it can possibly save. One example of this insidious problem happens when two libraries treat the same concept using similar, but different, types. When this occurs, programmers are required to deal with the differences themselves. They end up needing to provide translation layers that make it possible for the two pieces of code to interoperate.
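A minimal sketch of such a translation layer is shown below. The two namespaces, `geo_lib` and `draw_lib`, are hypothetical stand-ins for third-party libraries that each define their own type for a two-dimensional point.

```cpp
// Hypothetical library A: points with double-precision coordinates.
namespace geo_lib {
struct Point {
    double x;
    double y;
};
}  // namespace geo_lib

// Hypothetical library B: the same concept, with different names and
// single-precision coordinates.
namespace draw_lib {
struct Vec2 {
    float u;
    float v;
};
}  // namespace draw_lib

// Translation layer written by the application programmer. The
// narrowing from double to float is made explicit on purpose.
draw_lib::Vec2 toDraw(const geo_lib::Point& p) {
    return draw_lib::Vec2{static_cast<float>(p.x), static_cast<float>(p.y)};
}

geo_lib::Point toGeo(const draw_lib::Vec2& v) {
    return geo_lib::Point{v.u, v.v};
}
```

None of this code adds functionality; it exists only so that values can cross the boundary between the two type systems, which is the extra work described above.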

Working for the Compiler

One of the worst feelings I have when programming in C++ or Java is the sense of what I call “fighting the compiler”. Essentially, that is the part of the work in which one needs to make the compiler happy, by fixing all type errors so that a program can compile without issues. Usually it happens when we’re doing some kind of manual refactoring and several places have similar type checking issues that result in a large list of errors.

Slow compilation is another problem that frequently arises during the development of programs in a typed language. The main reason is that the compiler has to be able not only to parse the language in question, but also to check each expression for correctness with respect to the type system. Depending on the complexity of the expressions used, the computational time required to perform these operations can be comparable to the total time for code generation. This is especially true for a language with a syntax as complex as that of C++, which even includes templates that can perform recursion at compile time.
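For readers who have not seen compile-time recursion, the classic illustration is a factorial computed by template instantiation. The sketch below is a textbook example rather than code from any particular project; the name `Factorial` is arbitrary.

```cpp
// Recursive case: Factorial<N> is defined in terms of Factorial<N - 1>,
// so the compiler instantiates one class per step.
template <unsigned N>
struct Factorial {
    static constexpr unsigned long long value = N * Factorial<N - 1>::value;
};

// Base case: the explicit specialization stops the recursion.
template <>
struct Factorial<0> {
    static constexpr unsigned long long value = 1;
};

// The result is known before the program ever runs.
static_assert(Factorial<5>::value == 120, "evaluated by the compiler");
```

Every instantiation here is work done by the compiler, which gives an idea of how heavy use of templates can add to build times.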

This means that the requirements of compile-time type systems may increase the overhead on programmers. And few things can be more annoying than having to wait a long compilation time for seemingly small changes in a code base — as a result, productivity suffers. This may be a negative force in a large project, even when the advantages provided by type-checking are factored into the equation.

Conclusion

Compile-time checking provides several advantages, including the automatic elimination of a large class of errors. However, the rise of dynamic languages during the last few years has provided an opportunity to discuss the disadvantages of strict compile-time type checking.

Among the problems caused by type checking during compilation is the increased time that programmers need to spend waiting for a full build. Sometimes it is much better to be able to test a program quickly, and leave full-scale type checking for a later phase. Traditional typed languages do not allow this relaxed approach, which is one of the key advantages of dynamic languages.

In the future, we expect programming languages to provide better trade-offs in this arena. Newer languages such as C# and Go have already introduced new ideas in this respect, but we would like to see even more improvements in the next few years.
