If you're interested in digital segmentation and analytics warehousing, you might want to check out the webinar I'm giving this coming week in conjunction with iJento and CheapFlights. We're going to start the webinar with CheapFlights and a case study of how they are using an analytics warehouse for targeting. Then I'm going to cover our digital segmentation (2-Tier) models and how they support this type of work. If you're a regular reader (or webinar listener), this may not break a lot of new ground for you, EXCEPT for a section on three strategies for building Visit-based segmentations. I plan to blog on that next week, but why not get a head start by listening in? Finally, Barry Parshall of iJento is going to talk about their solution (a robust, SQL-Server based analytics warehouse) and the Semphonic/iJento partnership. So even if you know (most) of what I'm going to say, I think the first and last parts will be valuable. We're running sessions at times convenient for both the EU and the US, so check it out even if you're EU-based.
iJento is one of the newer Semphonic partnerships, and they fit right into our broader strategy of bringing customer analytics and digital segmentation to the digital measurement world across a variety of technology platforms. U.K.-based iJento provides a full-service (SaaS or on-premise) technology stack for analytics warehousing based on Microsoft's SQL-Server platform. Full-service means they bring ETL, integration (including Celebrus), optional hosting, and UI software to the table, as well as all the benefits and technology of being on one of the two most popular traditional relational database solutions on the planet.
I know... that isn't nearly as sexy as Hadoop. But that's part of what I want to talk about, because while Hadoop is cutting-edge cool, it's far from the right solution for every enterprise seeking to build an analytics mart. I'm not knocking Hadoop. We have about a dozen clients running Hadoop clusters, and we even run our own cluster in the cloud (largely for experimentation).
If you have really large big data analysis problems, Hadoop may be the only plausible technology stack to consider. But when I say really large big data problems, I mean it. I'm not talking about a couple hundred million rows of analytics data. That's not Hadoop country. If you migrate to Hadoop when you don't have to, you'll pay a real price. Sure, Hadoop is free, but it's still a very rough-and-ready technology stack. You won't get anything like the tools, polish, and robust ecosystem that come with Oracle, SQL-Server, or established enterprise platforms like Netezza or Teradata. That means you're going to spend LOTS more on custom consulting and development getting to (or at least somewhere near) the same level of capability. Hadoop isn't meant (at least so far) to provide many of the functions of a traditional data warehouse. It's a great ETL platform. It's a great exploration platform for teams sophisticated enough to use programmatic approaches to data analysis. It's great for provisional data analysis. But it's not a customer data warehouse, and it's not ideal for many traditional enterprise analytics requirements, even around digital. Maybe someday it will be, but for now, with Hadoop, you're still on the bleeding edge of a pretty immature technology.
Here's how I see the market for big data analytics in terms of capability and size:
In other words, there are LOTS of organizations that have big data needs that are still serviceable with solutions like SQL-Server and Oracle. Maybe you don't consider those applications big data. Fair enough. I'll say it differently. There are LOTS of organizations that have digital analytics warehousing needs that are still serviceable with solutions like SQL-Server and Oracle. And when you add systems like Netezza or Aster to the mix, the vast majority of enterprises can meet their digital analytics needs. Most organizations simply don't have Google-like big data problems.
I realize this may run counter to a fair number of people's actual experience. If your enterprise has loaded digital data into Oracle or SQL-Server, you may well have had hands-on experience that suggests these technologies aren't up to the job, even with data volumes far short of billions of rows. We see some remarkably slow and poorly designed analytics warehouses. That's one of the reasons that solutions like Netezza are deservedly popular: they eliminate much of the need for clever design to generate good performance.
In the traditional relational database world, however, good design is a necessity. You can get terrible analytics performance on a relational system with a fairly small number of rows! But design is also an enabler. A well-designed relational model and architecture can make it possible to effectively support digital analytics within the traditional technology stack. Will it be as screamingly fast as Hadoop? No, but it will provide far better tools, far more options when it comes to resourcing and support, and far more integration opportunities. It will also be quite a bit cheaper in terms of TCO (Total Cost of Ownership), and the time to value will be significantly shorter.
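To make the design point a little more concrete, here is a toy sketch using Python's built-in sqlite3 module. SQLite is only a stand-in here (it is not SQL-Server), and the `hits` table, its columns, and the `idx_hits_visitor` index are hypothetical names chosen for illustration. The sketch asks the query planner how it would answer the same visitor-level question before and after adding an index: without the index the engine has to scan every detail row; with it, the engine can seek straight to the matching rows.

```python
import sqlite3

# Toy detail-level table of page views; schema and names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hits (visit_id INTEGER, visitor_id INTEGER, "
    "hit_date TEXT, page_type TEXT)"
)
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?, ?)",
    [(v, 100 + v % 3, "2012-05-01", "product") for v in range(1, 1001)],
)


def query_plan(connection, sql):
    """Return SQLite's EXPLAIN QUERY PLAN detail text (last column of each row)."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


query = "SELECT COUNT(*) FROM hits WHERE visitor_id = 101"

# Without an index on visitor_id, the planner must scan the whole table.
before = query_plan(conn, query)

# A single design decision: index the column that routine queries filter on.
conn.execute("CREATE INDEX idx_hits_visitor ON hits (visitor_id)")
after = query_plan(conn, query)

print("without index:", before)
print("with index:   ", after)
```

The exact plan text varies between SQLite versions, and a real analytics warehouse involves partitioning, statistics, and physical design choices well beyond one index. The point is only that the same query can go from touching every row to touching just the relevant ones purely as a result of design.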
This need for good design is an undeniable risk point. One of the nicer aspects of the iJento solution is that it removes much of that risk. iJento provides the fundamental model and design architecture and, in the SaaS model, your costs in getting a system up and running so that you can measure its real-world performance are quite moderate.
How does the work that Semphonic does with data models and segmentation fit into our partnerships with companies like iJento and Infobright? Our partners provide the technology stack and, in iJento's case, the integration services and basic architecture and model. They'll take care of some of the more challenging details of the low-level data model (things like partitioning and indexing) and the ETL on multiple data sources. They also provide upstream UI tools in addition to the rich SQL-Server stack.
Our work on digital segmentation builds an additional layer on top of these systems. It makes the data more understandable and accessible and makes aggregations much more consistently useful. In next week's post, I plan to talk about three strategies for integrating our segmentation work into your enterprise, not only with solutions like iJento and Infobright but also up and down the full range of possible technology stacks.
The work we've done in building a digital data model is meant to serve every level of this market: from SQL-Server to Hadoop. It works at the highest levels and the lowest. Understanding how to make sense of your data isn't platform specific.
It is, however, an integral part of a really good traditional relational model, and it helps those relational systems scale to larger volumes by reducing the need to constantly hit the detail-level data. Systems like Hadoop and Netezza are built to churn through detail-level data on a routine basis. That doesn't mean that you don't need a digital segmentation to use the data well, even on Hadoop. It does mean that a digital segmentation won't yield much in the way of performance benefits in the Hadoop world; it will simply drive better use of the data. In the traditional relational technology stack, keeping the vast majority of queries out of the detail level is critically important.
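The idea of keeping routine queries out of the detail level can be sketched in a few lines. This is a minimal illustration, assuming Python's sqlite3 module and a hypothetical `hits` table; the segment rules in the `CASE` expression are made-up placeholders and are not Semphonic's 2-Tier segmentation. The detail data is rolled up once into a visit-level table tagged with a segment, and everyday reporting then reads only that much smaller table.

```python
import sqlite3

# Hypothetical detail-level table: one row per page view (hit).
# Table and column names are illustrative assumptions, not a vendor schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hits (visit_id INTEGER NOT NULL, "
    "visitor_id INTEGER NOT NULL, page_type TEXT NOT NULL)"
)
sample_hits = [
    (1, 100, "home"), (1, 100, "product"), (1, 100, "checkout"),
    (2, 101, "home"),
    (3, 100, "support"), (3, 100, "support"),
    (4, 102, "product"), (4, 102, "product"),
]
conn.executemany("INSERT INTO hits VALUES (?, ?, ?)", sample_hits)

# Roll the detail up once into a visit-level table. Each visit gets a single
# segment from placeholder rules checked in priority order:
# checkout > support > product browsing > everything else.
conn.execute("""
CREATE TABLE visit_segments AS
SELECT
    visit_id,
    visitor_id,
    COUNT(*) AS page_views,
    CASE
        WHEN SUM(page_type = 'checkout') > 0 THEN 'purchase'
        WHEN SUM(page_type = 'support') > 0 THEN 'support'
        WHEN SUM(page_type = 'product') > 0 THEN 'shopping'
        ELSE 'other'
    END AS visit_segment
FROM hits
GROUP BY visit_id, visitor_id
""")


def visits_by_segment(connection):
    """Answer a routine reporting question from the summary table only."""
    rows = connection.execute(
        "SELECT visit_segment, COUNT(*) FROM visit_segments "
        "GROUP BY visit_segment ORDER BY visit_segment"
    ).fetchall()
    return dict(rows)


print(visits_by_segment(conn))
```

Here eight hits collapse into four visit rows; at warehouse scale the ratio is far larger, which is where a relational system gains most from answering routine questions at the visit (or visitor) level instead of rescanning hits.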
SQL-Server based solutions aren't for every organization. I wouldn't recommend iJento to Turner Broadcasting, Walmart, or Expedia. It's not the right fit. But if you're one of the many mid-size and even very large enterprises whose digital data volumes aren't in the billions, there are better choices than Hadoop for your analytics warehousing needs, and iJento may well be one of those.