
Evaluating a p-value Using Simulation

You compute your test statistic on the observed data; then you randomize the data many times and compute the test statistic on each randomized data set. Next, you count how many times the randomized test statistic is at least as extreme as the observed one (greater or less, depending on the direction of the test); call that count x. Dividing x by the number of simulations/randomizations gives the simulated p-value, which you then compare against your chosen significance level.
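Here is a minimal sketch of that procedure in Python. It makes several assumptions that are not in the post itself: the test statistic is a difference in group means, "randomizing" means shuffling the pooled values across two groups, and the test is one-sided (counting randomized statistics that are greater than or equal to the observed one). The function name `permutation_p_value` and its parameters are made up for illustration.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_simulations=10_000, seed=0):
    """Return (observed statistic, simulated one-sided p-value).

    Assumed statistic: mean(group_a) - mean(group_b).
    Assumed randomization: shuffle the pooled values, then re-split them
    into two groups of the original sizes.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    observed = mean(group_a) - mean(group_b)

    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    x = 0  # count of randomized statistics at least as large as observed
    for _ in range(n_simulations):
        rng.shuffle(pooled)
        simulated = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if simulated >= observed:
            x += 1

    # Divide x by the number of randomizations to get the p-value.
    return observed, x / n_simulations


if __name__ == "__main__":
    treated = [10, 11, 12, 13, 14]  # made-up example values
    control = [1, 2, 3, 4, 5]
    stat, p = permutation_p_value(treated, control)
    print("observed difference in means:", stat)
    print("simulated p-value:", p)
```

For a test in the other direction, the comparison would flip to `<=`; for a two-sided test, one common choice is to compare absolute values instead.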

Novel Findings in Publicly Available Data Sets

A group of researchers from the University of Sydney in Australia presented findings they obtained from a publicly available astronomy data set at the 211th meeting of the American Astronomical Society in Austin, Texas. What made it super interesting is that they concluded the Milky Way galaxy is twice as thick as originally thought!!

While I don’t understand the physics behind this, the point is that making data sets available publicly has great value because it expands the pool of researchers who can investigate a particular data set. This helps in validating findings but also in allowing researchers to test new ways of looking at data. Had the data been locked in someone’s closet, the original, presumably incorrect results would have been assumed for a longer period of time. In this case, the implications of the subsequent analysis were huge…a 6,000 light-year difference from the original results.

And it’s fortunate for the PhD student on the project who had the opportunity to work with a data set for her dissertation without having to collect it first. This allowed her to focus on her analysis methods rather than data collection.

See http://www.usyd.edu.au/news/84.html?newsstoryid=2163.

Article Review: Will Genetics Revolutionize Medicine?

This presentation is something I prepared for a class to discuss a 2000 article from the New England Journal of Medicine by Holtzman and Marteau entitled “Will Genetics Revolutionize Medicine?” It’s an older article, but its points may be just as valid today as in 2000.


Sharing Cancer Research Data

I read a recent opinion piece in the New York Times from a researcher at the Sloan-Kettering Cancer Center in New York. He states that researchers around the world are trying to find better ways to prevent and treat cancer yet are often not willing to work together or share their data. He points out that the patients who authorized this data to be collected gave it freely, so the data should be available publicly for validation and additional research. He suggests this indicates many researchers care more about their own resume than about the public interest.

There is a fine line…researchers need some motivation to collect data, process it, and analyze it. Publishing is a great motivation because it can mean more funding, prestige, and an avenue to obtain feedback. If they had to give up their data immediately, they would have to compete with other researchers who hadn’t made these efforts. On the other hand, as the author points out, the data should not be owned completely by the researchers. Sharing data should advance the cause of science.

For many types of informatics research, authors are required to deposit their data in publicly available repositories when their papers are published. This creates a happy medium…findings can be published and recognition earned, yet the data are made available for others to validate the findings or do secondary research.

This topic is important to me because I will rely on such data repositories to do my PhD research without having to secure funding and find study subjects, etc. I am getting “recycled” data that have been used by others; but the beauty is that I think I have some new and interesting ways to look at the data that were not considered by the original authors.

GEO and CGEMS are examples of such data-sharing repositories. caBIG is an infrastructure being developed to help cancer researchers share data (my PhD advisor is involved in this effort).

1000 Genomes Project Is a Misnomer But a Good Start

Scientific American is reporting that the Sanger Institute is starting a project to sequence a large portion (or all) of the DNA of 1000 individuals. The name 1000 Genomes Project is a bit of a misnomer because they are not getting the full genome sequence for all 1000. They are getting the full genome sequence of six people, detailed genome scans (whatever that means) for 180 people of various ethnicities, and less-detailed scans for the rest.

They are not collecting disease information about these people. So it sounds like they are just trying to make this a cross between the Human Genome Project and the HapMap project. These genomes will be a reference that researchers who are studying disease associations can use to compare with their samples.

These data alone will not be incredibly helpful for disease studies because by the time you stratify by ethnicity, the sample size you can compare with will be pretty small. If you combine that with the fact that you know nothing about their age or other demographics, the usefulness is questionable. However, I think this can be a good starting point and possibly a catalyst for future projects.

Regardless, it’s clear that genome sequence data is going to grow exponentially over the next few years until, before too long, it will be cheap and easy to do a full genome scan on thousands or millions of people.

Biomedical Research and Innovation

Andy Grove, former CEO of Intel and fabled leader, trashed the biomedical research community in the US, stating that the current system discourages innovation. He has some interesting points.

It seems he’s simplifying quite a bit. I’m sure there are problems in the system and that innovation is reduced by near-sighted focusing on smaller ideas that can get funded rather than bigger questions that can win Nobel Prizes. However, I think those smaller pieces will continue coming together to address big problems over the coming years, despite some inefficiencies in the system (which should be addressed).

In a way, this reminds me of the Bill and Melinda Gates Foundation, which aims to run its non-profit like a business to maintain high efficiency and make wise use of resources. Maybe Grove can use his assets and leadership to show that something similar can be accomplished in biomedical research.

http://www.newsweek.com/id/68221/page/2

Educational Programs in Biomedical Informatics

When I decided to leave my job and pursue a graduate degree in informatics, I explored various programs. Below is a list of a few US universities that offer advanced degrees in informatics (in alphabetical order).

The following map (borrowed from http://www.nlm.nih.gov/ep/GrantTrainInstitute.html) shows the locations that have training grants from the National Library of Medicine.

National Library of Medicine training programs

PBS Special on Epigenetics

I came across a PBS show that talks about an emerging area of genetics research called epigenetics. Two people (for example, identical twins) can have the same genetic makeup yet be very different. Scientists are discovering that these differences are partially attributable to “epigenetic” changes that don’t physically alter a person’s DNA sequence but change how the DNA is activated (or not). When these changes occur in sex cells, they can also be passed on to a person’s posterity.

As people go through life, they acquire more and more of these changes, and it varies from person to person depending on lifestyle factors such as diet, smoking, environmental exposures, etc. This may explain why one identical twin gets a heritable form of cancer while the other does not, even though they have exactly the same DNA (and no mutations have occurred).

One key realization is that the way we choose to live can not only impact us negatively (or positively), but it can have a real impact on our posterity. Another realization is that if scientists can better understand epigenetics, they can devise treatments that address it. The PBS video explains it in an understandable way.

What I’m Reading: The Use and Analysis of Microarray Data

Nice overview article about microarray data and how it can be analyzed.

http://www.fmv.ulg.ac.be/genmol/Essential_genomics/References/Butte_2004.pdf

Designer Genomes?

Craig Venter’s latest pursuit is to artificially construct the chromosome of a simple bacterium and then insert it into an existing bacterium of another species, with the goal of converting the host organism into the one specified by the artificial DNA. This raises lots of interesting ethical questions if you think about the long-term implications of this kind of capability.

http://www.guardian.co.uk/science/2007/oct/06/genetics.climatechange 
