Data Structures Part 1
I was going through my lessons for Viking Code School (VCS) over the weekend and realized that I really wasn’t taking notes. Though I’m learning a lot about Ruby, testing, etc. I haven’t been writing much down (except for my code which is stored on Github) and that isn’t a good thing. So the realization that I should be blogging about the information I’m learning was born! Here we are: entry number 1 for VCS.
Data Structures:
This week we’ve been focusing on data structures. Not specifically in Ruby but rather on a more granular level. Here are some of the interesting things I’ve learned tonight.
First, generally thinking in terms of data structures was a little difficult at first. I’ve run into this type of thinking when trying to figure out a Project Euler problem and wondering why my implementation is taking 10 minutes to run through. While researching this I came across the concept of taxicab geometry and thought that it was totally cool.
Thinking about data structures lends itself well to the concept of Big O notation and the like (big omega, big theta). I frequently see references to Big O thrown around and having no idea what they meant. Now, I can grasp the basic concepts and realize that having a constant time complexity is much better than the complexities of n^2 that I’m used to building. I was also pointed to this Big O Cheatsheet which I imagine will prove very useful in due time.
Takeaway here: nested loops are bad for time complexity. Know what you’re doing and when to use them properly.
The last lessons I went through tonight took me through some of the standard types of data structures. Stacks and queues seem interesting in general while less useful for the type of programming that I’m looking to get into. VisualGo has some cool visualizations of these types of structures (and others) that will help anyone understand the basic ideas behind almost any data structure.
Then we dove a bit deeper into arrays. Now, arrays seem incredibly easy to me after using them in Ruby and Javascript. But when we started to talk about them in other languages like C it blew my mind a little bit. For instance, I had no idea that in C if you wanted to use an array you would have to identify the type of data being stored and the length of the array BEFOREHAND! That’s crazy. There are too many times that I’ve built arrays absentmindedly and can’t imagine having to think about each and every one I use at that level. But, it does make a bit more sense to have them laid out like that. Increased read and update speed means wonders for some huge data sets.
I’ve heard of linked lists before but never really understood when/where to use them. It seems when someone tries to explain when to use a linked list I end up in a convoluted discussion about a conga line. Anyway, I like their increased ability to insert/delete data at the expense of being able to read quickly. Also, discussing linked lists brought up another thought that hadn’t even crossed my mind until then. Arrays (in C) need to know their length ahead of time so that the computer knows where to store them (it knows how much memory it needs to allocate). If an array grows via an update for example, the computer might need to move the ENTIRE array to a new place in memory so it can fit. Linked lists help eliminate this problem because now they only need a few bits of information to be stored in any one place. One of those bits being a reference to the next item in the list. I never really considered memory or memory size to be a concept worth noting until now.
Lastly we went through hash tables and their relationship to associative arrays. I didn’t really understand the concept of a ‘hash function’ until going through this section. As it turns out, these functions don’t have to be anything too complex. They’re simply a way to convert a given key to an array index. For instance, in a typical dictionary scenario the hash function might simply be related to the first letter of the key (A->0, B->1, C->2, etc.). The problem, though, is making sure that each linked list (or other) pointed to from the hash function is as small as possible while making sure the hash function isn’t too incredibly complex. All things considered, well-made hash tables have all of the benefits of arrays and the positives of linked lists as well on average.
Other Highlights:
- I consumed about half of a jar of Kalamata olives while going through these lessons.
- Songza’s ‘Psybient’ playlist was the background for tonight’s coding session.
Looking forward to doing more research on data structures and algorithms. These lessons are incredibly interesting to me and are potentially areas that I’d like to explore further when I have more time.