Comments

Wonderful post!

An analogy comes to mind. We have wonderful tools and models to do hypothesis testing, but we don't have any great models for generating hypotheses in the first place - at least not from statistics.

Really enjoyed your posting, and I'm flattered that I was able to get a good discussion going on what I find to be a fascinating subject. And thanks for the kind words about the talk. I agree that there are real-world challenges that are not easily solved by AI or machine-assisted learning alone.

Of course there are very good examples of where this technology CAN complement the analysis process, but it can’t simply replace it. Finding non-obvious patterns in other kinds of data using AI and machine learning (beyond the free text in your example) is both possible and practical in many cases.

In my opinion, the answer is a combination: use machines for what they are good at, namely calculations across very large sets of information, and recognize people for what they are good at: hypothesizing, intuition, creativity, validation and investigative instinct. The example you point out is key to why machine learning is not THE final answer, but rather one part of the dialogue.

For the foreseeable future, I agree, we humans are still in control.

Thanks for posting this, Gary - it's very thought-provoking. I agree with you that the role of the "analyst" is more than just asking great questions. I also agree with Ryan's comment above. In fact, when thinking about a standard approach to statistical modeling, there are three areas where the role of the analyst stands out for me:
1. Defining the question.
3. Identifying the meta-data (or semantic relationships) - what you mention above.
5. Selecting the best model - based on the combination of statistical indicators (R² and VIF in a regression), contextual explainability (is that a word?), and the ability to act on the model's findings.

The other steps are better candidates for automation and ML, IMHO:
2. Collecting (and some cleaning of) the data.
4. Building and refining multiple models.
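
To make the statistical indicators in step 5 concrete, here is a minimal sketch of how R² and VIF can be computed for a regression using NumPy alone. The column names (visits, pageviews, ad_spend) and the synthetic data are assumptions made up for illustration; they do not come from the post or from any real dataset.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept added)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def vif(X):
    """Variance inflation factor per column: 1 / (1 - R^2_j),
    where R^2_j comes from regressing column j on the other columns."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        r2_j = r_squared(others, X[:, j])
        factors.append(1.0 / (1.0 - r2_j))
    return np.array(factors)

# Hypothetical, synthetic predictors: pageviews is built to be
# nearly collinear with visits, ad_spend is independent of both.
rng = np.random.default_rng(0)
n = 500
visits = rng.normal(size=n)
pageviews = visits + 0.1 * rng.normal(size=n)
ad_spend = rng.normal(size=n)
X = np.column_stack([visits, pageviews, ad_spend])
y = 2.0 * visits + 0.5 * ad_spend + rng.normal(size=n)

fit_r2 = r_squared(X, y)   # overall goodness of fit
vifs = vif(X)              # one value per predictor
print("R^2:", round(fit_r2, 3))
print("VIF:", np.round(vifs, 1))
```

In this sketch the two collinear predictors should show a large VIF while the independent one stays near 1. The numbers only flag the problem; deciding which variable to drop, or whether the model is explainable and actionable, is still the analyst's call.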

Great post (and blog overall). I am obviously very late to the discussion but I don't think the relevance of the topic has expired since November :-).

Here is another topic that this blog touches on. Despite the limitations of machine analysis that you so eloquently pointed out, many companies with a large web presence are still using only a small portion of the data they could tap into to understand what is going on with the online portion of their business. We are only at the dawn of BigData (although for some firms it is already afternoon in that respect), but we can already pinpoint some of the big opportunities that the ability to sift through large amounts of data more cheaply than ever is opening up. Nevertheless, a large number of companies are still grappling with how to incorporate BigData into their existing organizational architecture, both the technical one and the human one.

While, as you mention above, the human is still (and will remain for the foreseeable future) the central point of this architecture, I'd like to argue that a medium-to-large company that earns a significant portion (or all) of its revenue through its online presence cannot survive without setting up a BigData shop and incorporating it firmly into its organizational structure. One of the big benefits of BigData is the ability to bring together multiple data sources more cheaply (web analytics data, bid management tool data, advertising server data, logged data etc.) and to enable deep analytics on the combined data of a kind that none of the individual tools can do by itself.

If one accepts the view above (which is relatively easy nowadays), then here are some questions that I think are important to answer:

- What are the best practices for introducing BigData into an organization that is a traditional RDBMS and web tools shop (e.g. should it sit in the technology organization, the analytics organization or somewhere else)?

- What is the best way to combine the traditional analysis currently being done through web tools (Omniture, Google Analytics etc.) with the deep analysis that can be done (what you call "machine analysis") using a BigData framework, in an organization that is mostly attuned to using traditional tools only?

I am aware that the answers to these questions may be obvious or apparent to many of the readers and posters here, but I think there is a large audience out there that may be interested in hearing them.

Thanks again for the insightful post.

This is one of the most useful blog posts I've read--thanks!

I'm new to ML and your ideas on the need for wisdom in analysis have helped make sense of how ML would be put to practical use.

Some machine learning (ML) writers seem to take pride in the mistaken belief that ML algorithms only have to be provided with minimal guidance, and they will then magically find great correlations. However, nowhere else in life would one expect such a lackadaisical approach to work well.

FYI, the philosopher Karl Popper provides some additional insight into the learning challenges described in your post. Popper argued that all observations must be preceded by, and guided by, a hypothesis. He illustrated this to his college students by telling them, "Observe." They would invariably reply, "Observe what?" Observation must always be guided by a hypothesis and a purpose; for one thing, there is otherwise too much information to process. A corollary of Popper's principle is that there is no such thing as a completely objective or unbiased observation.

Applied here, Popper's principle implies that ML searches for correlations must be guided by well-crafted hypotheses about where correlations might be found.
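
One way to see why an unguided search is risky is a small simulation, sketched here under made-up assumptions: the target and all 1,000 candidate features are pure random noise, so any correlation found is spurious by construction. The sizes (50 observations, 1,000 features) and variable names are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_features = 50, 1000

# Target and candidate features are independent noise by construction.
target = rng.normal(size=n_obs)
candidates = rng.normal(size=(n_obs, n_features))

# Pearson correlation of every candidate column with the target.
t = (target - target.mean()) / target.std()
z = (candidates - candidates.mean(axis=0)) / candidates.std(axis=0)
corr = (z * t[:, None]).mean(axis=0)

# Unguided: scan everything and keep the strongest correlation.
best_unguided = float(np.abs(corr).max())

# Guided: test a single feature chosen in advance (column 0 here).
one_prespecified = float(abs(corr[0]))

print("strongest correlation found by scanning:", round(best_unguided, 2))
print("correlation of the one pre-chosen feature:", round(one_prespecified, 2))
```

Scanning many candidates with few observations will usually turn up a correlation that looks meaningful even though none exists, whereas a single pre-specified test is far less likely to. A hypothesis about where to look narrows the search and makes the result interpretable.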

Thanks again,
Jim Yuill, PhD
Lockheed Martin Corp.
