Archive for the 'Article Review' Category

Going Beyond the Numbers: Context-Sensitive Data Mining

This Wired article talks about the Netflix data mining competition. Netflix has offered a $1 million prize to whoever can beat its movie-recommendation algorithm by 10%. That seems like it should be completely feasible, yet complex math alone hasn’t been the solution. Contestants have reached improvements of more than 8%, but so far nobody has cracked the 10% barrier and claimed the prize.

Common sense tells me it should not be so hard to attain such an improvement, but of course that’s easy for me to say…

A “psychologist” has done quite well so far by taking into account “human factors” in addition to math. The story is dramatized for effect, but it illustrates the value in thinking about the context when trying to solve difficult quantitative problems. Otherwise, you may be shooting too much in the dark, and the high dimensionality of the data gets in the way. No matter how complex your model, pure math won’t always cut it; and as the author suggests, you may be prone to overfitting the model.

There are parallels in mining genetic data sets. There have to be better ways to look at these data sets and take the biological context into account. Everyone is excited about pathway analysis, and I can see some logic in that. But I think that’s only the beginning. I’d tell you my other ideas, but that might give away my strategic advantage in getting published in Nature or Science. ;)

Article Review: Will Genetics Revolutionize Medicine?

This presentation is something I prepared for a class to discuss a 2000 article from the New England Journal of Medicine by Holtzman and Marteau entitled “Will Genetics Revolutionize Medicine?” It’s an older article, but its points may be just as valid today as in 2000.


Sharing Cancer Research Data

I read a recent opinion piece in the New York Times from a researcher at the Sloan-Kettering Cancer Center in New York. He states that researchers around the world are trying to find better ways to prevent and treat cancer yet are often not willing to work together or share their data. He points out that the patients who authorized the collection of these data gave them freely, so the data should be available publicly for validation and additional research. He suggests this indicates many researchers care more about their own résumés than about the public interest.

There is a fine line here: researchers need some motivation to collect data, process it, and analyze it. Publishing is a great motivation because it can mean more funding, prestige, and an avenue to obtain feedback. If they had to give up their data immediately, they would have to compete with other researchers who hadn’t made these efforts. On the other hand, as the author points out, the data should not be owned completely by the researchers. Sharing data should advance the cause of science.

For many types of Informatics research, authors are required to deposit their data in publicly available repositories when their papers are published. These requirements create a happy medium: findings can be published and recognition earned, yet the data are made available for others to validate the findings or do secondary research.

This topic is important to me because I will rely on such data repositories to do my PhD research without having to secure funding and find study subjects, etc. I am getting “recycled” data that have been used by others; but the beauty is that I think I have some new and interesting ways to look at the data that were not considered by the original authors.

GEO and CGEMS are examples of such data-sharing repositories. caBIG is an infrastructure being developed to help cancer researchers share data (my PhD advisor is involved in this effort).

What I’m reading: Genome-wide association studies for common diseases and complex traits

Well-written article that describes the challenges and potential of genome-wide association studies as of 2005. It’s 12 dense pages but worth the time to read.

http://www.nature.com/nrg/journal/v6/n2/full/nrg1521.html

What Is a Gene?

I came across an article published by researchers at Yale University who are involved in the ENCODE Project. I love the article because it gives a nice history of how our understanding of genes has evolved over the past 150 years. It was also intriguing because it explains how the ENCODE research may contribute to a drastically revised understanding of what a gene is and how it functions.

I recommend this article, if for nothing else, as a way to understand the history of how we have thought about genes.

