Going Beyond the Numbers: Context-Sensitive Data Mining

This Wired article talks about the Netflix data mining competition. Netflix has offered a $1 million prize to whoever can beat its movie-recommendation algorithm by 10%. That seems like it should be completely feasible, yet complex math alone has not been the solution so far. Contestants have reached improvements of more than 8%, but nobody has cracked the 10% barrier and claimed the prize.

Common sense tells me it should not be so hard to attain such an improvement, but of course that’s easy for me to say…
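
To put a number on what "beating the algorithm by 10%" means: the contest scores entries by root-mean-square error (RMSE) between predicted and actual ratings, so a 10% improvement is a 10% reduction in that error. Here is a minimal Python sketch of the arithmetic. The ratings and the function names are made up for illustration; they are not contest data or anyone's actual code.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def percent_improvement(baseline_error, new_error):
    """Relative error reduction versus the baseline, in percent."""
    return 100.0 * (baseline_error - new_error) / baseline_error

# Hypothetical ratings on a 1-5 star scale, for illustration only.
actual = [4, 3, 5, 2, 4]
baseline = [3, 4, 4, 3, 3]                # every guess is off by a full star
challenger = [3.1, 3.9, 4.1, 2.9, 3.1]    # every guess is off by 0.9 star

base_err = rmse(baseline, actual)
new_err = rmse(challenger, actual)

print(f"baseline RMSE   = {base_err:.3f}")
print(f"challenger RMSE = {new_err:.3f}")
print(f"improvement     = {percent_improvement(base_err, new_err):.1f}%")
```

In this toy case the challenger shaves a tenth of a star off every prediction, which works out to a 10% improvement. Put that way, the target sounds small, which is partly why it is surprising that it has been so hard to reach.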

A “psychologist” has done quite well so far by taking into account “human factors” in addition to math. The story is dramatized for effect, but it illustrates the value of thinking about context when trying to solve difficult quantitative problems. Without that context, you are largely shooting in the dark, and the high dimensionality of the data gets in the way. No matter how complex your model, pure math won’t always cut it; and as the author suggests, you may end up overfitting the model.

There are parallels in mining genetic data sets. There have to be better ways to look at these data sets and take the biological context into account. Everyone is excited about pathway analysis, and I can see some logic in that. But I think that’s only the beginning. I’d tell you my other ideas, but that might give away my strategic advantage in getting published in Nature or Science. ;)
