Is Using Hash Tables a Good Thing?
One of these days, I was looking at patterns of coding in some of the applications that I work with — written in Java and C#. I was then interested in seeing why our group happened to use hash tables so frequently in these languages. This was puzzling to me because I am used to write software in C (for many years now), and I hardly ever used, or found the necessity of using, hash tables. Of course, I am not considering the classic examples (such as symbol tables), but programming in general. And I can definitely say that the programs I write in C are not any slower because of this missing data structure.
After thinking a little bit, it became easier to see the reason why Java and C# not only provide, but need to
use hash tables in so many places. The fact is that hash tables are frequently used where pointers, structures, and vectors should suffice.
Uses of Hash Tables In Java
A hash table is frequently used as a loosely defined structure, which can hold whatever fields you want. Thus, one first use of hash tables is as non-typed storage that can hold any kind of object. This can also be a problem in C, but for better or worse, we have void pointers that can be used to hold anything. I am not saying that it is a good solution: it also means that we need to be extra careful not to blow up the whole program when accessing those values, but there is certainly a way to hold values or any desired type.
In a more general way, simple vectors of pointers can be used to do most of what a hash tables does. Your data just needs to store a value indicating where the second element is, in addition to its own data. For example, suppose you want a hash table from data type T1 to data type T2. Create a vector of pointers for elements of type T2. Use this vector as a list, growing it with malloc/realloc as necessary (sometimes it is easy to determine the maximum possible size, which makes it even more efficient). Then, for each added element t1 of type T1, use a new position on the vector to store the address of the element of type T2. Also, store the index of that position in t1.
Now, if you need to look up any object, just use the stored index into the array of pointers to recover t2. This is done in constant time, as we wanted. It is even easier and more efficient than using a hash table. The only disadvantage is that you need to add a new field to the object t1, which is pretty easy to do in C, but could hurt the feelings of someone that thinks in an object oriented way.
Clearly, there are situations in which this is not so easy to do. For example, if we are manipulating simple values such as integers, doubles, even strings, instead of structures, then you don’t have a place where to add the extra field. In that case, you might consider creating a hash table, even though I think it would still be easier in most cases to create a structure with the original value plus the index, and use the structure instead of the value object (and by the way, that is also more “object oriented”).
Using Data Instead of Code
Of course, there is an even simpler way: just put a pointer to the second object in the first object. While this is easier, there can be value in having an additional layer of indirection, so somebody else could manipulate the references without going through the objects of type T1.
We can also notice that this is an example of the general technique of using more data instead of algorithms. In the scheme described above, the indirection work is taken from the hash table code into the object itself. Now, it is not the hash table that knows how to find the object, instead the object itself knows the secret of how the other piece of data can be found. The first object just passes that secret to the vector in order to retrieve the object of type T2.
This is not to say that hash tables are not useful. They can be used in highly dynamic situations, in which we don’t know in advance which relationships will exist between objects. However, it must be said that in 99% of the circumstances this is not the case, and using hash tables to implement simple algorithms might just add overhead to the program. It can also mean that you are not putting the references for your objects in the place where they should go logically, which is a sign of weakness in the original design.
Given all the issues related to the use of hash table in Java and C#, I will certainly start to be more careful of the places where I use them, from now on.