Implementing a trie data structure in C

Some data structures are useful in a wide range of situations. One example is the trie, which shows up in applications as varied as compilers, text editors, and related tools.

This afternoon I had some spare time, so I decided to implement a trie (something that I have always considered interesting). A trie is a kind of tree that stores string keys. To speed up search and insertion, keys that share a prefix also share the nodes for that prefix, so a lookup takes time proportional to the length of the key rather than to the number of stored keys. According to the Wikipedia entry [1]:

In computer science, a trie, or prefix tree, is an ordered tree data structure that is used to store an associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree shows what key it is associated with. All the descendants of a node have a common prefix of the string associated with that node, and the root is associated with the empty string. Values are normally not associated with every node, only with leaves and some inner nodes that correspond to keys of interest.

The Implementation

For fun, I decided to write the implementation in C, using Knuth’s CWEB system. The idea is that I can read a text file, split it into individual words, and add each word to the trie. Currently, the main restriction of the implementation is that it handles only the characters from A to Z. This means that it works well for storing words, such as in a dictionary, but not for storing identifiers in a C-like language, for example (since these can contain digits and underscores).
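To make the idea concrete, here is a minimal sketch of what such a trie could look like in C. It is not the code from the CWEB program: the names TrieNode, trie_new_node, letter_index, trie_insert and trie_contains are hypothetical, and folding lowercase letters to uppercase is an assumption made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

#define ALPHABET_SIZE 26  /* only the letters A to Z are handled */

/* Hypothetical node layout: one child slot per letter. */
typedef struct TrieNode {
    struct TrieNode *child[ALPHABET_SIZE];
    bool is_word;  /* true if the path from the root to this node spells a stored word */
} TrieNode;

/* Allocate an empty node; returns NULL if memory runs out. */
TrieNode *trie_new_node(void)
{
    TrieNode *node = malloc(sizeof *node);
    if (node != NULL) {
        for (int i = 0; i < ALPHABET_SIZE; i++)
            node->child[i] = NULL;
        node->is_word = false;
    }
    return node;
}

/* Map a character to a slot 0..25, or -1 if it is not a letter.
   Assumes an ASCII-compatible character set; case is folded (an assumption). */
int letter_index(char ch)
{
    int c = toupper((unsigned char)ch);
    return (c >= 'A' && c <= 'Z') ? c - 'A' : -1;
}

/* Insert a word. Returns 1 if it was new, 0 if it was already stored,
   and -1 for an empty word, an unsupported character, or allocation failure. */
int trie_insert(TrieNode *root, const char *word)
{
    assert(root != NULL && word != NULL);
    if (*word == '\0')
        return -1;
    for (const char *p = word; *p != '\0'; p++)
        if (letter_index(*p) < 0)
            return -1;  /* reject the word before touching the trie */

    TrieNode *node = root;
    for (const char *p = word; *p != '\0'; p++) {
        int i = letter_index(*p);
        if (node->child[i] == NULL) {
            node->child[i] = trie_new_node();
            if (node->child[i] == NULL)
                return -1;
        }
        node = node->child[i];  /* shared prefixes reuse existing nodes */
    }
    if (node->is_word)
        return 0;
    node->is_word = true;
    return 1;
}

/* Return true if the exact word (not just a prefix) is stored. */
bool trie_contains(const TrieNode *root, const char *word)
{
    assert(root != NULL && word != NULL);
    const TrieNode *node = root;
    for (const char *p = word; *p != '\0'; p++) {
        int i = letter_index(*p);
        if (i < 0 || node->child[i] == NULL)
            return false;
        node = node->child[i];
    }
    return node->is_word;
}
```

With this layout, inserting "tex" and then "text" creates only one extra node, because the second word reuses the three nodes of the first. The sketch never frees its nodes; a real program would add a recursive cleanup function.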

One easy operation to do with such a data structure is to count the number of distinct words. For example, considering only the words that we can handle in this program, I could identify 3138 distinct words in the file texprogram.tex, the original source code of the TeX program by Knuth. This was executed in less than 0.2s, so it looks pretty quick for a first implementation.

The program also has an option to print the distinct words found. You just need to execute

./trie fileName -p

and the program will print all distinct words in the file “fileName”.
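Counting and printing the distinct words, as described above, can be sketched as follows. This is an illustration under stated assumptions, not the actual program: the names trie_add_text and trie_print and the MAX_WORD limit are hypothetical, and so is the choice to treat every non-letter as a word separator and to fold case.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define ALPHABET_SIZE 26
#define MAX_WORD 256  /* hypothetical limit on the length of a printed word */

typedef struct TrieNode {
    struct TrieNode *child[ALPHABET_SIZE];
    bool is_word;
} TrieNode;

TrieNode *trie_new_node(void)
{
    TrieNode *node = malloc(sizeof *node);
    if (node != NULL) {
        for (int i = 0; i < ALPHABET_SIZE; i++)
            node->child[i] = NULL;
        node->is_word = false;
    }
    return node;
}

/* Scan text, treating each run of letters as a word, and add the words to
   the trie. Returns how many words were not in the trie before. */
size_t trie_add_text(TrieNode *root, const char *text)
{
    assert(root != NULL && text != NULL);
    size_t added = 0;
    TrieNode *node = root;
    for (const char *p = text; ; p++) {
        int c = toupper((unsigned char)*p);
        if (c >= 'A' && c <= 'Z') {
            /* Inside a word: walk down, creating the child if needed. */
            TrieNode **slot = &node->child[c - 'A'];
            if (*slot == NULL && (*slot = trie_new_node()) == NULL)
                return added;  /* out of memory: stop early */
            node = *slot;
        } else {
            /* A separator (or the final '\0') ends the current word. */
            if (node != root && !node->is_word) {
                node->is_word = true;
                added++;
            }
            node = root;
            if (*p == '\0')
                break;
        }
    }
    return added;
}

/* Depth-first walk that writes every stored word to out, one per line.
   buf must hold MAX_WORD chars; start with depth 0. Words that do not fit
   in buf are skipped. Returns the number of words written. */
size_t trie_print(const TrieNode *node, char *buf, size_t depth, FILE *out)
{
    size_t printed = 0;
    if (node->is_word) {
        buf[depth] = '\0';
        fprintf(out, "%s\n", buf);
        printed++;
    }
    if (depth + 1 >= MAX_WORD)
        return printed;
    for (int i = 0; i < ALPHABET_SIZE; i++) {
        if (node->child[i] != NULL) {
            buf[depth] = (char)('A' + i);
            printed += trie_print(node->child[i], buf, depth + 1, out);
        }
    }
    return printed;
}
```

A caller would load the file contents into a string (or adapt the loop to read with fgetc), sum the values returned by trie_add_text to get the number of distinct words, and call trie_print when the -p option is given. Because children are visited from A to Z, the words come out in alphabetical order.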

Future Improvements

One of the uses for a trie is to intern strings (think of the Java intern() method). The next time I have some time, I would like to extend this program to allow C strings to be interned easily within a trie data structure.

With such an implementation, one could use the trie to return a pointer for each passed string, with equal strings always getting the same pointer. This way, instead of using strcmp to compare two strings, one can just compare the pointers returned by the trie, which takes constant time regardless of the length of the strings.
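A rough sketch of how such interning might work is shown below. It is only one possible design, not something taken from the existing program: the names InternNode, intern_new_node and intern are hypothetical, and the node uses one child slot per byte value (an assumption that lifts the A to Z restriction at the cost of much larger nodes).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define FANOUT 256  /* one slot per byte value; simple but memory-hungry */

/* Hypothetical node for an interning trie. */
typedef struct InternNode {
    struct InternNode *child[FANOUT];
    char *canonical;  /* the single stored copy of the string ending here, or NULL */
} InternNode;

/* Allocate an empty node; returns NULL if memory runs out. */
InternNode *intern_new_node(void)
{
    InternNode *node = malloc(sizeof *node);
    if (node != NULL) {
        for (int i = 0; i < FANOUT; i++)
            node->child[i] = NULL;
        node->canonical = NULL;
    }
    return node;
}

/* Return the canonical pointer for s, storing a copy the first time s is
   seen. Equal strings always yield the same pointer. Returns NULL only if
   an allocation fails. */
const char *intern(InternNode *root, const char *s)
{
    assert(root != NULL && s != NULL);
    InternNode *node = root;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if (node->child[*p] == NULL) {
            node->child[*p] = intern_new_node();
            if (node->child[*p] == NULL)
                return NULL;
        }
        node = node->child[*p];
    }
    if (node->canonical == NULL) {
        size_t size = strlen(s) + 1;
        node->canonical = malloc(size);
        if (node->canonical != NULL)
            memcpy(node->canonical, s, size);
    }
    return node->canonical;
}
```

After interning, two strings are equal exactly when intern returned the same pointer for both, so the test becomes `a == b`. Note that the call to intern itself still walks the whole string once; the gain comes when the same strings are compared many times afterwards.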

My code has been tested on UNIX systems, and you can read the complete source code and documentation in PDF (see instructions for the online version at the end of this post). Let me know if you have any improvements.

Further Reading

References:

[1] http://en.wikipedia.org/wiki/Trie

Get The Complete Code

Given the great interest generated by the implementation of this algorithm, I have released the complete implementation in literate programming format (PDF file). Click the button below to get more information about this data structure, with all the details necessary to make it work in C.

  • By clicking the button below you will be redirected to PayPal.
  • After you make the payment, an email will be sent to your address with a download link.
  • The download will be in PDF format, with the complete listing in C using literate programming documentation.

Add to Cart

About the Author

Carlos Oliveira holds a PhD in Systems Engineering and Optimization from the University of Florida. He works as a software engineer, with more than 10 years of experience in developing high performance, commercial and scientific applications in C++, Java, and Objective-C. His most recent book is Practical C++ Financial Programming.

7 Responses to “Implementing a trie data structure in C”

  1. Dear Dr. Oliveira.

    I am new to C++. I was searching for some information about tries and found your wonderful article at http://coliveira.net/software/implementing-a-trie-data-structure-in-c/ .

    This is a fascinating topic, although a bit too advanced for me. The extent of my knowledge of C++ is that I can’t even spell it!

    In any case, I downloaded your source code. I was devastated to learn it was in C (not C++, of which I know the bare basics) and is for UNIX. The second obstacle I overcame: I created a Linux VM and found some references online on make, configure and install, so it works.

    The first issue (having the source code written in C) I am still very much struggling with. I found some UNIX software online (a C to C++ converter) and will try it tomorrow.

    Can you answer the following questions, please:

    1) Do you have this code in C++ instead of C?

    2) I created a simple text file with a few words and your code prints its contents to screen. However, I have larger files (in MBs and GBs) that contain hundreds of lines with numbers like:

    794946673453453454599999934534534534566342

    794946673453453568450004568945686456456456

    and your program can’t read them (0 values returned or something like that). These files were created with a Big Integer C++ library for another project. Why can’t these files be read? Is it because they consist of numbers? Or because they are too long to be interpreted? How do I take care of this? I think BigInt is only written for C++, not C.

    3) I have read the PDF for your source code but I can’t find a reference on how to retrieve a particular value (let’s say, “does 794946673453453454599999934534534534566342 exist in this trie?”). How would I do that? Did I miss it in the doc?

    4) My understanding of the program is that it removes redundant parts of the strings. Let’s say the first and second lines have this string in common: 794946673453453, and the rest is different. The program creates a new branch at the first different character. Is this correct?

    Thank you so very much for your help.

    Sincerely,

    Brugio.

    By Brugio on Sep 19, 2011

  2. Respected Sir,

    I am trying to implement a B-trie (a disk-resident trie).
    As I understand it, disk-resident means that the data structure is stored and accessed on non-volatile storage, such as a hard disk or a solid-state drive. Main memory is typically used only as a buffer; updates are often performed in a write-through manner, meaning that an update is not held in memory for an indefinite amount of time but is instead immediately forwarded to disk (via the operating system).
    I have read all the B-trie algorithms in http://www.naskitis.com/naskitis-vldbj09.pdf and have even implemented search and insert in memory.

    How should it be implemented on disk? Please guide me through the flow of the application.

    I am unable to understand how to store it on disk.

    By Sahil Singla on Dec 21, 2011

  3. Hi

    I got a question in my viva about what is meant by the trie data structure. I really hadn’t heard of such a data structure and thought that ma’am was kidding, but now I have found that there is something like that. Thanks for your explanation… Good.

    By ash on Jan 11, 2012
