Trie as you Might

A trie is a data structure that makes it easy to search for words or prefixes in a corpus. The structure of the trie reduces the complexity of searching for a word or a prefix down from O(n) where n is the number of words in the dictionary to O(m)where m is the length of the query [1]. Pretty sweet, huh?

Visualization of a Trie, from wikipedia.

So when I ran across a leetcode question which needed a trie, I realized that I’d never actually written one before. Hence this blog post. Before I get into my implementation I want to give a huge shout out to Shubhadeep Roychowdhury who wrote an awesome blog post on the trie structure. I based a lot of my code on his implementation of the trie.

With that being said, I built a few additions to Roychowdhury’s trie. First of all there is a Trie class itself which contains TrieNode‘s. I also built out the __contains__ magic method so you can search within your trie using the in keyword. So here we go:

Side note, see those type hints?! Apparently in python 3.5 they added them as ways to help indicate the types that are expected (and returned) from functions. Python is still totally dynamically typed, but they serve as hints to the reader.

Anyway, the code is pretty self explanatory but if you have any questions please leave them in a comment below. Also, if you use this code note that it is not battle tested, so act accordingly.

Data Structures Part II

Well it’s late and I’m trying to go to bed. But before I wrap up I’ll quickly go over what I learned tonight.

Structs:

Structs, although pretty simple, seem like a nice way to organize data. Essentially, this is the definition of a struct:

So nothing too crazy but a clearer and more “Ruby” I’d say. I’d like to do some further reading on Structs like that recommended here.

Graphs & Trees:

We also went over graphs and trees tonight (though not incredibly deeply). I’m excited to work with trees in our future projects and can see the glorious power of graphs. Everything regarding graphs was brought back to the idea of a social network and that made a bit more sense.

Like I said, I don’t have much to talk about tonight but since I’m trying to get in the habit of blogging after I code it was more important to get something down rather than nothing at all.

Data Structures Part 1

I was going through my lessons for Viking Code School (VCS) over the weekend and realized that I really wasn’t taking notes. Though I’m learning a lot about Ruby, testing, etc. I haven’t been writing much down (except for my code which is stored on Github) and that isn’t a good thing. So the realization that I should be blogging about the information I’m learning was born! Here we are: entry number 1 for VCS.

Data Structures:

This week we’ve been focusing on data structures. Not specifically in Ruby but rather on a more granular level. Here are some of the interesting things I’ve learned tonight.

First, generally thinking in terms of data structures was a little difficult at first. I’ve run into this type of thinking when trying to figure out a Project Euler problem and wondering why my implementation is taking 10 minutes to run through. While researching this I came across the concept of taxicab geometry and thought that it was totally cool.

Hey OThinking about data structures lends itself well to the concept of Big O notation and the like (big omega, big theta). I frequently see references to Big O thrown around and having no idea what they meant. Now, I can grasp the basic concepts and realize that having a constant time complexity is much better than the complexities of n^2 that I’m used to building. I was also pointed to this Big O Cheatsheet which I imagine will prove very useful in due time.

Takeaway here: nested loops are bad for time complexity. Know what you’re doing and when to use them properly.

The last lessons I went through tonight took me through some of the standard types of data structures. Stacks and queues seem interesting in general while less useful for the type of programming that I’m looking to get into. VisualGo has some cool visualizations of these types of structures (and others) that will help anyone understand the basic ideas behind almost any data structure.

Then we dove a bit deeper into arrays. Now, arrays seem incredibly easy to me after using them in Ruby and Javascript. But when we started to talk about them in other languages like C it blew my mind a little bit. For instance, I had no idea that in C if you wanted to use an array you would have to identify the type of data being stored and the length of the array BEFOREHAND! That’s crazy. There are too many times that I’ve built arrays absentmindedly and can’t imagine having to think about each and every one I use at that level. But, it does make a bit more sense to have them laid out like that. Increased read and update speed means wonders for some huge data sets.

I’ve heard of linked lists before but never really understood when/where to use them. It seems when someone tries to explain when to use a linked list I end up in a convoluted discussion about a conga line. Anyway, I like their increased ability to insert/delete data at the expense of being able to read quickly. Also, discussing linked lists brought up another thought that hadn’t even crossed my mind until then. Arrays (in C) need to know their length ahead of time so that the computer knows where to store them (it knows how much memory it needs to allocate). If an array grows via an update for example, the computer might need to move the ENTIRE array to a new place in memory so it can fit. Linked lists help eliminate this problem because now they only need a few bits of information to be stored in any one place. One of those bits being a reference to the next item in the list. I never really considered memory or memory size to be a concept worth noting until now.

Lastly we went through hash tables and their relationship to associative arrays. I didn’t really understand the concept of a ‘hash function’ until going through this section. As it turns out, these functions don’t have to be anything too complex. They’re simply a way to convert a given key to an array index. For instance, in a typical dictionary scenario the hash function might simply be related to the first letter of the key (A->0, B->1, C->2, etc.). The problem, though, is making sure that each linked list (or other) pointed to from the hash function is as small as possible while making sure the hash function isn’t too incredibly complex. All things considered, well-made hash tables have all of the benefits of arrays and the positives of linked lists as well on average.

Other Highlights:

  • I consumed about half of a jar of Kalamata olives while going through these lessons.
  • Songza’s ‘Psybient’ playlist was the background for tonight’s coding session.

Looking forward to doing more research on data structures and algorithms. These lessons are incredibly interesting to me and are potentially areas that I’d like to explore further when I have more time.