Infobright: Big Data Techniques for Digital Measurement
Digital analytics is a perfect example of "big-data" and is, as it happens, also the single most common creator of "big-data" problems in the business world. So it's no surprise that technology vendors focused on big-data challenges are deeply interested in digital measurement. I just wrapped up a webinar with Infobright on the "big-data" aspects of digital measurement. Infobright is a leading player in "columnar" databases, but this wasn't meant to be a technology deep-dive. I focused mainly on the challenges of big-data, some of the techniques we use when tackling big-data (2-Tiered Segmentation, of course), and some of the benefits of big-data digital analytics. Even if you're a regular reader and have had your fill of 2-Tiered Segmentation, there is some cool new stuff here, including a discussion of mobile app measurement and how to model mobile app data in the database. I've also got a short section on digital media analytics and how various types of big-data joins can be used to drive interesting and important business decisions in the publishing realm.
If you're interested in my portion, I think you'd also find the sections of the presentation on Infobright quite useful. The idea behind columnar databases is profound and strikingly simple. Traditional relational databases have always organized data into rows. A customer row, for example, might have twenty or a hundred or even more columns, all related to a single customer. This organization works very well when you need all the information about one or more customers - a common enough transactional use-case. If, on the other hand, you are trying to select a small set of values from a single field, or join two columns together for a small subset of customers, you have to navigate through LOTS of data to get what you want, because every column of every row you touch comes along for the ride. Unfortunately, that's a very common use-case when it comes to analysis problems. The organization of the data is one big reason why analytics queries are often so painfully slow when executed against traditional databases.
Columnar databases break the row-based paradigm. They organize all the data by columns and use a set of clever techniques to keep columns in alignment. This makes analysis of single fields and joins across columns much, much faster.
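To make the difference concrete, here's a minimal Python sketch of the two layouts. It's a toy illustration of the general idea only, not how Infobright (or any real columnar engine) is implemented; the customer fields and function names are hypothetical, and I'm assuming the simplest possible alignment technique, where position i in every column list belongs to the same record.

```python
# Toy illustration only: field names and data are hypothetical.
# Row layout: each record keeps all of its fields together.
rows = [
    {"customer_id": 1, "region": "west", "visits": 12, "revenue": 240.0},
    {"customer_id": 2, "region": "east", "visits": 3, "revenue": 45.0},
    {"customer_id": 3, "region": "west", "visits": 7, "revenue": 130.0},
]


def to_columns(records):
    """Pivot row records into one list per field.

    Alignment assumption: index i in every list refers to the same record.
    """
    columns = {name: [] for name in records[0]}
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return columns


def revenue_by_region_rows(records, region):
    # Row layout: every full record is visited, even though
    # only two of its fields are actually needed.
    return sum(r["revenue"] for r in records if r["region"] == region)


def revenue_by_region_columns(columns, region):
    # Column layout: only the "region" and "revenue" lists are read;
    # zip() pairs values up by position, which is what keeps them aligned.
    return sum(
        revenue
        for reg, revenue in zip(columns["region"], columns["revenue"])
        if reg == region
    )


columns = to_columns(rows)
print(revenue_by_region_rows(rows, "west"))
print(revenue_by_region_columns(columns, "west"))
```

Both functions return the same answer. The point is what each one has to read to get there: with a hundred fields per customer, the row version still walks every field of every record, while the column version only ever looks at two lists.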
One of the beauties of columnar databases is that although the organization of the data is quite different from that of a traditional database system, the entire standard relational database toolset is fully applicable. With a system like Infobright, you get vastly improved query performance for many of the more common reporting and analytics access paths (it isn't magic - some kinds of queries will do quite a bit worse than in a traditional system) at a very inexpensive price-point.
If you're interested in getting a copy of the presentation, just drop me a line. You can also listen to the recording of the webinar here... (just register and you'll be taken to the recording landing page).