The Wasteful Legacy of Programming as Language

A few years ago I visited a friend who is a graduate student in linguistics. After some time he asked me if I was aware of Chomsky’s work on formal languages. I told him that yes, Chomsky’s work was the basis for many of the developments in theoretical computer science. More than that, I was glad to learn that there was something technical that I could share and discuss with people in linguistics.

At the time I thought this was just a great coincidence. It was only recently, though, that I started to think seriously about the implications of the idea that much of our understanding of computer programming started with the study of human languages.

Language and Computers

One of the early ambitions in computer science was to understand, or at least be able to parse, human language. In order to do this, computer scientists explored models of how language worked in general. Based on these models, researchers figured out that we could classify languages in terms of complexity. Moreover, many computer languages could be entirely contained in some of the lower levels of this hierarchy, which is called the Chomsky hierarchy.

This was an incredible discovery for scientists working on compilers and interpreters, because its developments resulted in the necessary tools to efficiently parse languages. After that, instead of creating ad hoc parsers, which was the norm in the early period, one could apply a rigorous method to find a parser for a particular programming language.

The insidious side of this success, however, is that it completely changed the perceptions computer scientists have of how to give instructions to computers. We now have a frame of mind in which the important thing in the process of instructing a computer is to define a suitable language – and the more powerful the language the better.

The problem with this line of reasoning is that it creates a number of hurdles that are completely artificial, as we will soon discuss. For example, it makes us believe that there is something fundamentally different between programming languages, when in fact there isn’t.

Computer Languages versus Human Languages

When we look at human languages, it does make sense that different languages exist. After all, language is a cultural concept. For example, as a speaker of two languages, it is very clear to me that speaking in English conveys a different meaning and context than the same things spoken in my mother language. There is a set of ideas that are better expressed in English, just as there is a huge subset of thoughts that I wouldn’t be able to express well in anything other than my primary language.

As human beings, we want language to be inexact and able to convey several meanings. Contrary to the popular belief among mathematicians and engineers, this is a good feature of human languages, because it mimics the way we think. The truth about the study of language is that we are dealing with the whole culture of a community, not just with a stream of tokens that convey meaning.

The opposite, however, is true for computer languages. While we might strongly favor the use of human-like language, computers don’t really care if we use a bracket or a “begin” symbol to start a block of commands. This means that, by introducing the subtleties of human language, we are complicating the task of instructing computers. Instead, we could just avoid the subtle differences between programming languages and use a simple, uniform representation.
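
A minimal sketch of the bracket versus “begin” point, assuming a hypothetical toy notation invented only for this illustration (one-word statements, blocks delimited by either `{` `}` or `begin` `end`). It is not taken from any real language; it only shows two surface syntaxes collapsing into the same tree.

```python
# Hypothetical toy notation: every statement is a single word, and a block is
# opened by "{" or "begin" and closed by "}" or "end".
OPENERS = {"{", "begin"}
CLOSERS = {"}", "end"}


def parse_block(tokens, pos=0, nested=False):
    """Return (tree, next_pos), where tree is a nested list of statements."""
    tree = []
    while pos < len(tokens):
        tok = tokens[pos]
        if tok in OPENERS:
            # A block becomes a nested list, whichever opener was used.
            child, pos = parse_block(tokens, pos + 1, nested=True)
            tree.append(child)
        elif tok in CLOSERS:
            if not nested:
                raise SyntaxError("unexpected block closer")
            return tree, pos + 1
        else:
            tree.append(tok)
            pos += 1
    if nested:
        raise SyntaxError("missing block closer")
    return tree, pos


def parse(source):
    tree, _ = parse_block(source.split())
    return tree


braces = parse("init { read { check } write } done")
keywords = parse("init begin read begin check end write end done")

# Both spellings produce the same structure.
print(braces == keywords)
```

For simplicity this sketch does not check that an opener is paired with its own kind of closer; the only thing that survives parsing is the nesting.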

Unnecessary Work

The other problem with using current programming languages is that they create a lot of unnecessary work. Computer scientists have throughout the years devised baroque languages such as C++ and Java in an effort to provide programming instructions in a powerful way. It turns out, however, that the whole sequence of steps (writing a sequence of tokens, parsing that stream, and checking the syntax) is unnecessary.

Everything in a computer program can be interpreted as a tree of expressions. Whether you encode that tree as a bracketed expression or as a for loop is not important to the computer. There is no reason why we cannot manipulate the syntax tree directly, for example, using a tree model on the screen, instead of parsing tokens to recreate the tree every single time.
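
As a rough illustration, here is a sketch that uses Python’s standard `ast` module (Python 3.8 or later assumed) as a stand-in for such a tree; the choice of Python and the variable names are assumptions made for this example only. The same small program is obtained once by parsing text and once by assembling the tree nodes directly, with no text involved.

```python
import ast

# Route 1: text -> tokens -> tree.
parsed = ast.parse("result = 2 * (3 + 4)")

# Route 2: assemble the same tree directly from node objects.
built = ast.Module(
    body=[
        ast.Assign(
            targets=[ast.Name(id="result", ctx=ast.Store())],
            value=ast.BinOp(
                left=ast.Constant(value=2),
                op=ast.Mult(),
                right=ast.BinOp(
                    left=ast.Constant(value=3),
                    op=ast.Add(),
                    right=ast.Constant(value=4),
                ),
            ),
        )
    ],
    type_ignores=[],
)
# Hand-built nodes carry no line numbers; compile() needs them filled in.
ast.fix_missing_locations(built)

from_text = {}
exec(compile(parsed, filename="<text>", mode="exec"), from_text)

from_tree = {}
exec(compile(built, filename="<tree>", mode="exec"), from_tree)

print(from_text["result"], from_tree["result"])
```

Once the tree exists, the later stages no longer depend on how it was produced, which is the sense in which the textual form is optional here.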

As a result, the use of a language is unnecessary on both sides: computers don’t care where the stream of bits is coming from. It could be coming from a direct binary representation or from a Java program with all variables in German.

Human beings, on the other hand, don’t necessarily need to use the language either. Although it may look like a useful feature, the existence of a language implies the manual labor of keeping a textual representation, as well as the algorithmic work of translating that representation back into a tree of expressions. The point is that, for all we care, we could just as well be moving boxes around the screen to represent variables — there is no necessity to create a textual document representing a program.

Divide and Conquer

Another issue with programming languages is that they tend to divide programmers into artificial categories, one for each language or even flavor of a language. For example, the only difference between a Java programmer and a C# programmer is that they use somewhat different keywords and have to program against a different runtime library. The only reason C# and Java programmers think of themselves as different is that they use the concept of language to define themselves.

To be more polemical, the only difference between a C programmer and a Lisp programmer is in what their languages force them to do. A C programmer is forced by the language to write pointer-related code, while a Lisp programmer is forced to think in terms of lists of expressions (even when using other data structures).

A language-less programming system, on the other hand, would avoid this kind of issue altogether by providing only the means to do work, without forcing programmers to use a pre-defined way of thinking. Whatever the form used to interact with the program, a language-less system doesn’t care if parts of the program are shaped as s-expressions or C blocks: it just combines them as needed to create a program.

In a program written in this way, using garbage collection is just a matter of plugging in the right runtime support, which you could get along with the system or buy from a third party. Instead, we erroneously consider garbage collection a fundamental attribute of the language in use. As another example, in a language-agnostic system we would be able to create and call closures by just adding a simple module to the system, not by devising yet another language that supports closures as first-class citizens.

But the advantages don’t stop there. Without the need for a language, the need for a compiler also disappears. Everything in such a system is already parsed and ready to use. When we combine pieces of code they already know how to glue themselves to each other, so there is no need for a linker either.

Such a system would regenerate a new executable in a fraction of a second, instead of going through the painful and wasteful process of language interpretation, code generation, and linking. For example, what exactly is the need to recompile an entire file when we change only a single character? In a traditional language, that is entirely necessary, because you could be modifying the token “blass” to “class”, which entirely changes the meaning of the whole file. In a language-less system, this could never happen, because each transformation is related to the existing syntax tree, which is already stored in the system. Therefore, a single change like this can only have a local effect on the program.
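
To make the idea of a local change concrete, here is a toy sketch under stated assumptions: the `Node` class, its cache, and the `evaluations` counter are all hypothetical and invented for this illustration, and re-evaluating an expression stands in for regenerating an executable. Editing one leaf of a stored tree invalidates only that leaf and its ancestors, so the untouched subtree is never recomputed.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


class Node:
    """Hypothetical expression node that caches its computed value."""

    def __init__(self, op=None, left=None, right=None, value=None):
        self.op, self.left, self.right = op, left, right
        self.literal = value
        self.parent = None
        self.cache = None
        self.evaluations = 0  # how many times this node was recomputed
        for child in (left, right):
            if child is not None:
                child.parent = self

    def evaluate(self):
        if self.cache is None:
            self.evaluations += 1
            if self.op is None:
                self.cache = self.literal
            else:
                self.cache = OPS[self.op](self.left.evaluate(), self.right.evaluate())
        return self.cache

    def set_literal(self, value):
        """Edit a leaf in place and invalidate only the path up to the root."""
        self.literal = value
        node = self
        while node is not None:
            node.cache = None
            node = node.parent


# The stored tree for (2 + 3) * (4 + 5).
two = Node(value=2)
left = Node("+", two, Node(value=3))
right = Node("+", Node(value=4), Node(value=5))
root = Node("*", left, right)

first = root.evaluate()   # every node is computed once
two.set_literal(10)       # a single, local edit
second = root.evaluate()  # only `two`, `left` and `root` are recomputed

print(first, second, left.evaluations, right.evaluations)
```

A real system would have to track far more than arithmetic (names, types, side effects), so this only shows the shape of the argument, not its difficulty.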

Why Doesn’t Such a System Exist?

There are several reasons, but the least important of them is technology. We have graphical capabilities in any modern desktop computer that are beyond what is necessary to implement the visualization features for such a programming environment. The main difficulty separating us from that goal is the experience we already have as programmers.

Each one of us has been trained to use textual languages as the main way to interact with computers. We have learned to use associated tools, such as grep and make, that are imperfect but that give good results if you are disciplined.

Moreover, it is satisfying to use the huge amount of code available with the existing technologies. Every one of us has been conditioned for many years to look at what has been done in our programming language, and even reuse the code if needed.

I think that this is the main reason why visual languages have had such a hard time establishing themselves. After using such a language for a few minutes, you immediately feel that it is so much easier to go back and write yet another program in C or Java or Python. Of course, it will always be much more productive to use a tool that you’re used to.

However, we have never been able to improve a language-less system enough to see its advantages compared to traditional language-based systems. I believe that there will be a big jump in programming quality the day a group of people decides to develop a true language-agnostic programming system. And when programmers in general learn that they don’t need a language to create valid code, they will think that we were just crazy for using, for such a long time, a concept as wasteful as that of a programming language.

Photo credit: commons.wikimedia.com

About the Author

Carlos Oliveira holds a PhD in Systems Engineering and Optimization from the University of Florida. He works as a software engineer, with more than 10 years of experience in developing high performance, commercial and scientific applications in C++, Java, and Objective-C. His most recent book is Practical C++ Financial Programming.

21 Responses to “The Wasteful Legacy of Programming as Language”

  1. Chomsky not Chromsky

    By fix on Nov 26, 2011

  2. Excellent Article!
    Programming languages should have a standardized syntax even if the compilers work differently.

    By Nester on Nov 26, 2011

  3. So what do you imagine this Esperanto-like computer language would look like?

    By readanyway on Nov 26, 2011

  4. That was an epic definition of “biting off more than you can chew” and I’m cringing to imagine what programming language is invented next based on this blog post. (But go ahead; they’ve been inventing a new language per week since the invention of the microchip).

    By Penguin Pete on Nov 26, 2011

  5. Can you come up with a proof that such language is even possible?

    By nc on Nov 26, 2011

  6. Sounds like you’re describing a lisp with a form based IDE. s-expressions are in fact a representation of an AST. Even with a graphical tool to write programs, you’d only trade the compiler for input parsing.

    By crisplisper on Nov 26, 2011

  7. @fix thanks for the important spelling contribution

    By coliveira on Nov 26, 2011

  8. @Pete if somebody creates a new language based on this post I haven’t been clear enough. I think we don’t need more languages, we need systems where we can manipulate instructions without having to write them down using a textual format. Probably this will be some kind of graphical environment that can manipulate the AST directly, in one or more ways. I don’t know how long it’s gonna take, but I think we need something like this to break free from the language-oriented approach we have been taking for so long.

    By coliveira on Nov 26, 2011

  9. I think you’re missing your own point here. If languages are just a skin on the semantics, then a GUI to manipulate visual elements is just another skin. Your “graphical environment that can manipulate the AST directly” will be the other language.

    The main reason that visual manipulative development environments (let’s call them visual languages for short) haven’t taken off is that they aren’t the most efficient way to get the meaning expressed when you are working on a computer with a decent keyboard. For that matter, the environments themselves are much more difficult to create than compilers. I expect that once more people start using computers without decent physical keyboards, visual languages will become more popular, since they’ll be easier to manipulate by touch or gestures.

    Still, most of the distinctions you make between languages can be put into three categories: semantics, development efficiency, and aesthetics. Your manipulative environment will have those same problems, and people will still need different solutions to those problems depending on the project or the person. Semantics: Pointers vs. lists, manual memory allocation vs. garbage collection, moving these semantic issues out of the language and into the library or runtime doesn’t make them any less issues, or any less debated. Development efficiency: Your ASTs will still need manipulation, and whether this means better physical interaction models, or AST generation tools, it’s still an issue. Aesthetics: Expect your manipulation tool to be popular on skinning sites, with people forking it if they need new ways to make it prettier, by their personal standards. Once a manipulation environment is customized for the developer, another developer might have a lot of trouble using it, same as with textual languages.

    Making the AST generation and manipulation graphical doesn’t make the issues you are talking about go away, it just moves them to different parts of the process and toolset. But you should still expect to see more visual programming environments over the next generation, if only because keyboards seem to be becoming less popular.

    By BrianH on Nov 26, 2011

  10. @BrianH: I couldn’t have said it better myself.

    One more thought to add though. The reason the article cites for why different human languages make sense holds just as much sway in computer programming.

    The language that makes the most sense to use depends on the concepts that suit the developer and the problem at hand best. Some problems are best solved by more data-driven or functional approaches. Others are better solved by agent-based parallelism. Yet others may be best suited to a traditional imperative style.

    And this also holds in the case of graphical languages, because the one thing this different set of development tools will not do away with is the fundamental language concepts that are chosen to tackle a given problem. There will still be the equivalents of ML, Prolog and Java, even if we do away with their actual textual representation in favour of something that doesn’t need parsing.

    By JerryJvL on Nov 26, 2011

  11. @BrianH I agree that a programming environment that is language-less will not solve all issues. We still need to define a semantics for whatever method of computation we have. But the fact that there is no need to support a language itself is already a great step forward. Think about working with imperative languages, for example. They all have similar concepts but it seems like a big deal that you’re writing code in C or Fortran or Pascal. It is not. As long as you have a simple way to attach meaning to a visual representation of the computation tree, you can do anything that these languages can do. To add support to object oriented programming is a similar issue: why think in terms of Java or Smalltalk when you could simulate the features of all of them?

    By coliveira on Nov 27, 2011

  12. I used to program using 1’s and 0’s but I eventually discovered that I could get by with just the 1’s

    By Name on Nov 27, 2011

  13. I agree with your main point which is that programmers should stop comparing programming languages as a form of syntax, but programming languages are actually more than just the definition of a syntax.
    Languages have type models, concurrency models, and a lot of other features which make them impossible to unify.

    “the only difference between a C programmer and a Lisp programmer is in what their languages force them to do.”
    => Not only do I agree with this statement, but it is the one reason why you cannot unify all languages under one unique programming environment (graphical or not).

    I will take the example of JavaScript and C++. In C++, you can access all the (user land) memory by casting an integer into a pointer. On the other hand, if your JavaScript script does not have access to a (local) object, there is no way you can get one unless another script hands you a reference to that object (JavaScript is then said to be “memory safe”). This has strong implications in security.

    Also, if one cared about writing an AST for assembly, it would actually not be a tree, but rather a sequence (with labels).

    Programming languages also have different concurrency models and there is not, as far as I know, a unique way to unify them in a “universal API”.

    What matters is actually not the tree, but rather what the language is capable of and the data structures (“what the language forces programmers to do”). And by saying that, you have actually described the notion of a virtual machine. And there are actually different (non-equivalent) virtual machines, including LLVM (which unifies C, C++ and Objective C), or the JVM, which is misnamed because it goes beyond Java: other programming languages (Scala, etc.) are built on top of it.

    Also, parsing is something necessary in some cases. As far as we know, there is no way to store abstract representations in disks or to pass on the network. Consequently, there is a need to serialize whatever we’re doing whenever we need to sustain it or share it (both are often required when writing a program) and the text representation is a good trade-off for this.

    By David Bruant on Nov 27, 2011

  14. From the description above (direct editing and visualization of an AST, or abstract syntax tree) I think you may want to have a look at the old “Intentional Programming” research from Microsoft Research, since commercialised by intentsoft. Looks promising, but then I said that to friends in the 1990s and it is still under wraps :-(.

    By Lachlan Pitts on Nov 28, 2011

  15. When we look at human languages, it does make sense that different languages exist. After all, language is a cultural concept. For example, as a speaker of two languages, it is very clear to me that speaking in English conveys different meaning and context, compared to things spoken in my mother language. There are a set of ideas that are better expressed in English, as well as there is a huge subset of thoughts that I wouldn’t be able to express well in anything other than my primary language.

    1. It seems to me that there are two aspects to “language as a cultural concept.” Firstly, there is, I believe, abundant evidence that there is direct hardware support in human brains for language. The cultural aspects of human languages are the precise form (words, grammar, etc) and, perhaps, the shared set of ideas that different cultures have accumulated. However, I would go so far as to say that they all have a largely overlapping underlying reality that they have evolved to allow us to converse about.

    2. Can you give examples of where “speaking in English conveys different meaning and context compared …” or are you making the unsurprising observation that the word “cheese” (from English) might sound a bit like the Cantonese words for “pig shit” to a Cantonese speaker.

    3. When you say “There are a set of ideas that are better expressed in English …” do you mean that you can express them more compactly in English to an English speaker who shares the same culture as you, or that you can express them with more clarity, or what. I can assure you that I can express things in English that would be understood by an Australian English speaker but would be completely misunderstood by an American English speaker, and that as someone who studies Chinese, their ideas can be expressed in English, it just takes more effort.

    In some sense, this seems both similar to, but different from, the differences between, say C++ and C. There are things you can do in constructors and destructors in C++ (scoped locks, for example, or shared pointers) that allow one to compactly express certain things that are much harder to implement/use in C (and get right).

    The similarities seem to me to be that there is different shared culture that you have to be aware of here between C and C++ programmers (and people can use C++ like it is C, although not the other way around), while the differences are that C and C++ offer different levels of abstraction, while human languages do not (but the run time does, because of culture).

    So, count me as confused by what you are saying.

    By Richard Sharpe on Dec 1, 2011

  16. I tried to quote that first para, but it seems like HTML tags cannot be used here. I should have used explicit quotes as well :-(

    By Richard Sharpe on Dec 1, 2011

  17. What brought me here was my curiosity about oop. I have used oop extensively in C++ but I always get annoyed with the idea of reducing data representations into “objects”. Using plain ordinary boring C seems to jive more with my mind b/c I am better able to map in my mind how the data is being processed. After all, beneath the surface of any language is nothing more than your stack, heap, and registers. To me C offers a 1-to-1 mapping between code and what’s beneath the code. Oop seems to add a “meta” layer in between that 1-to-1 relationship, so what you have is a 1-to-1-to-1 relationship. To me that extra layer is essentially “undefined” and left to the programmer to conjure up, which further seems to me to be against what a computer is meant to do. Computers process binary code, 1’s or 0’s, true or false statements. An object can be neither true nor false because there is no definite form to it. Thus to me oop adds more complexity than it reduces, or at the least it trades one form of complexity for another. As Chomsky pointed out, perhaps this is due to how my own mind is wired and interprets the world … or maybe I’m just not advanced enough to truly understand how to use oop. I’m interested to hear about anyone else’s experiences. Computer language linguistics is interesting.

    By Bob Johnson on Dec 2, 2011

  18. While I mostly agree with the introduction, in my opinion the conclusions of the piece are very questionable. You note yourself that one human language is better at expressing certain local concepts simply because another language and culture does not have a need to deal with similar concepts. The language of a nation living on an island will certainly have more expressions for water, fish and boats than the language of a nation living in a desert. Consequently, one will find the language of the island nation more suitable for writing a research paper on marine life. Similarly, programming languages have different syntax not because the people inventing them were insisting on differences but because each language had some predefined purpose in mind. True, most contemporary programming languages have many similarities, but one will certainly find it easier to write business applications in COBOL than in C++. Another issue that you are possibly missing is that language syntax is only a small part of any programming language. Libraries as well as overall purpose are what really define the “culture” of a language. Further, whether you use a curly bracket or a “begin” keyword to mark the beginning of a block, or you use some visual tool to do that, is quite irrelevant for coder productivity. I believe you will find that defining code functionality is by far the most time-consuming activity, and that is where language differences come to the front. Were that not the case, you would find that the visual or speech recognition tools that you would like to have would already be available. Perhaps you are underestimating the IT community.

    By goran on Dec 3, 2011

  19. Hi, Neat post. There’s an issue together with your website in internet explorer, might test this… IE nonetheless is the market chief and a huge component to people will pass over your great writing due to this problem.

    By Cuma Honda on Apr 16, 2012

  20. Good articles. The author spoke out a point I found at these days: the syntax for high language is unnecessary. We spent too much time on parsing technique for high language. Some day we’ll directly produce the ast for the programs. I’m doing a system for natural programming. Natural programming are coming, but few people can imagine how powerful it should be until they can use it.

    By simeon.chaos on Mar 25, 2013

  21. Such a language exists: it’s Lisp.

    By Pascal Bourguignon on Jul 30, 2015