Trie as you Might

Categories: Data Structures

A trie is a data structure that makes it easy to search for words or prefixes in a corpus. The structure of the trie reduces the complexity of searching for a word or a prefix down from O(n) where n is the number of words in the dictionary to O(m)where m is the length of the query [1]. Pretty sweet, huh?

Visualization of a Trie, from wikipedia.

So when I ran across a leetcode question which needed a trie, I realized that I’d never actually written one before. Hence this blog post. Before I get into my implementation I want to give a huge shout out to Shubhadeep Roychowdhury who wrote an awesome blog post on the trie structure. I based a lot of my code on his implementation of the trie.

With that being said, I built a few additions to Roychowdhury’s trie. First of all there is a Trie class itself which contains TrieNode‘s. I also built out the __contains__ magic method so you can search within your trie using the in keyword. So here we go:

class TrieNode:
    '''
    Simple TrieNode, contains all we need
    for the larger Trie data structure.
    '''

    def __init__(self, char: str):
        self.char = char
        self.children = []
        self.word_complete = False
        self.counter = 1


class Trie:

    def __init__(self):
        self.root = TrieNode(None)

    def __contains__(self, word: str):
        '''
        Allow us to use the `in` keyword with
        the Trie to see if a 'word' is `in` the Trie.
        '''
        node = self.root
        for char in word:
            found_in_children = False
            for child in node.children:
                if child.char == char:
                    found_in_children = True
                    node = child
                    break
            if not found_in_children:
                return False
        return node.word_complete is True

    def add(self, word: str):
        '''
        Add a word to the Trie.
        '''
        node = self.root
        for char in word:
            found_in_children = False
            for child in node.children:
                if child.char == char:
                    child.counter += 1
                    node = child
                    found_in_children = True
                    break
            if not found_in_children:
                new_node = TrieNode(char)
                node.children.append(new_node)
                node = new_node
        node.word_complete = True

    def find_prefix(self, prefix: str) -> (bool, int):
        '''
        Return True with the number of occurrances a given
        prefix occurs if the prefix is in the Trie. Otherwise
        return False.
        '''
        node = self.root
        if not node.children:
            return False, 0
        for char in prefix:
            char_not_found = True
            for child in node.children:
                if child.char == char:
                    char_not_found = False
                    node = child
                    break
            if char_not_found:
                return False, 0
        return True, node.counter


if __name__ == '__main__':
    t = Trie()
    t.add(1)
    t.add('hackathon')
    t.add('hack')
    assert t.find_prefix('hac') == (True, 2)
    assert t.find_prefix('hack') == (True, 2)
    assert t.find_prefix('hackathon') == (True, 1)
    assert t.find_prefix('ha') == (True, 2)
    assert t.find_prefix('hammer') == (False, 0)
    assert ('hack' in t) is True
    assert ('hackathon' in t) is True
    assert ('hac' in t) is False
    print('ALL TESTS PASSED')

Side note, see those type hints?! Apparently in python 3.5 they added them as ways to help indicate the types that are expected (and returned) from functions. Python is still totally dynamically typed, but they serve as hints to the reader.

Anyway, the code is pretty self explanatory but if you have any questions please leave them in a comment below. Also, if you use this code note that it is not battle tested, so act accordingly.

«
»