Friday, June 29, 2012

Cassandra talk at SoCal Codecamp

I always enjoy giving talks at the SoCal Codecamp in San Diego and have been doing so since its inception. This time around I was talking about Amazon Web Services and Cassandra.

I was planning to talk about data modeling and some performance tips, but after I asked "Who has done NoSQL before?" and only got half a hand showing, I decided to change the topic to "What is NoSQL and how does Cassandra fit?". Unfortunately only two or three attendees had data needs spanning more than one MySQL instance, and one kept comparing Cassandra to an LDAP server.

The point I was trying to make is that with the advent of Software-as-a-Service (which drives the number of users up) and Big Data (basically, we can track much more info about our users than before) we end up with more data than we can handle. In the past we would pair some memcached with a (sharded) MySQL cluster, but that is not "easy" to set up, so Cassandra and the like have taken its place, offering more simplicity in setting up and scaling up.
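To make the "memcached plus sharded MySQL" setup a bit more concrete, here is a rough sketch of the plumbing the application ends up owning. It is only an illustration under loud assumptions: plain Python dicts stand in for both the memcached tier and the MySQL shards, and the `ShardedStore` class, its method names, and the hash-modulo routing are all made up for this example, not taken from any real deployment.

```python
import hashlib


class ShardedStore:
    """Toy cache-aside store: a dict plays memcached, a list of dicts plays the MySQL shards."""

    def __init__(self, num_shards):
        # Hypothetical stand-ins; a real setup would hold client connections here.
        self.shards = [dict() for _ in range(num_shards)]
        self.cache = {}
        self.db_reads = 0  # counts how often a read had to go to a shard

    def _shard_for(self, key):
        # Route by a stable hash of the key (Python's built-in hash() is salted per process).
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, value):
        # Write to the owning shard, then invalidate the cached copy.
        self._shard_for(key)[key] = value
        self.cache.pop(key, None)

    def get(self, key):
        # Cache-aside read: try the cache first, fall back to the shard on a miss.
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self._shard_for(key).get(key)
        if value is not None:
            self.cache[key] = value
        return value
```

Calling `put("user:42", ...)` and then `get("user:42")` twice should hit a shard only once; the second read comes from the cache. Even in this toy version the routing and the cache invalidation are application code, and changing `num_shards` would send most keys to a different shard, which is the kind of setup and scaling work I meant by not "easy".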

Now, if we embark on a NoSQL journey, we need to be aware that there are 50+ databases, each optimized for different use cases (e.g. CouchDB for documents) and each with a different chance of surviving. This is bleeding edge! NoSQL has been around for maybe 10-15 years, whereas SQL dates back to the 1970s. So if there is no pressing need to go NoSQL, play it safe.

The step from SQL to NoSQL is a big one, and it took me some time to unlearn SQL habits and adopt NoSQL ones (which was easier for me since I was already using Hibernate for all my stuff and relying heavily on key-value caches). To show you my path: I went to the NoSQL summer reading group and listened to Tim Anglade while I was in Seattle. Having been a user of memcached, I started to explore Membase. Then I took the job which required Cassandra and read up on it. I had the commands down in no time, but modeling data was another steep learning curve, and it took some Cassandra training before I was able to do it.

What I am trying to say is that this is hard and the learning curve is steep. You need to be curious, enjoy living on the edge, or feel the pain of SQL not solving your problems.