Mining Big Data Holds Keys to Health Improvement
The era of big data is upon us. That is how Stanford President John Hennessy opened the Big Data in Biomedicine Conference, a collaborative effort between Stanford Medicine and the University of Oxford. In a packed room at the Li Ka Shing Center, scientists, physicians, industry representatives and inventors came together to explore the vast opportunities for mining the growing volume of public health data and how it can be used to develop new ways to prevent, diagnose and treat disease.
Over two and a half days, more than forty presenters from universities such as Stanford, Oxford, MIT and Harvard; the Centers for Disease Control; and industry brainstormed about how to use big data to improve health care. At its peak, 300 attendees packed Berg Hall, while another 400 watched a live stream of the conference online. The big message on big data: collaboration is key.
"Big data presents a challenge that is so big and so complex that no single individual, company or institution – no matter how accomplished or illustrious – can solve it alone," said Lloyd Minor, MD, the Carl and Elizabeth Naumann Dean of the School of Medicine, in his opening remarks. He spoke to the sheer volume of data being collected in health care and medicine – everything from information in electronic patient records to DNA sequencing to comprehensive biological data on the mechanisms of disease, treatment, monitoring, clinical trials, pharmaceutical records, medical imaging and disease registries.
Conference organizer Atul Butte, MD, PhD, chief of the division of systems medicine and associate professor of pediatrics, is a believer in big data and its promise for medicine and humanity. "The next big scientific revolution is data mining," he says. "The data is there. It's just a matter of mining it; piecing it together; and finding commonalities between diseases at a molecular level."
Butte believes many of the medical issues we face today could be solved using data that already exists. What is required is redefining the taxonomies of disease on a molecular level, and then using this data to test established therapies on diseases with a similar molecular makeup – a process he likens to Match.com for drugs and disease. "There are most likely other good uses of the drugs we already have," says Butte.
Conference co-organizer Carlos Bustamante, PhD, a professor of genetics at Stanford, says big data gives researchers a means for cataloging genetic variants that influence disease risk, finding common failure modes and disentangling causation and correlation. "As a statistician, it's exciting to think about the opportunities and responsibilities we have," he says.
Tear down the silos
Across the board, one big message kept coming through about big data – it has to be shared, across the country and across the world. Science occurring in individual labs, using local subjects on a small scale, is cumbersome and slow, many of the speakers pointed out. Taking data collected on patients from across the globe, determining genetic variants in disease and testing multiple new compounds at once to see their effect is the direction of the future. And it will require new tools and statistical analysis to bring big data insights to the patient level.
"Big data means big sharing, big collaboration, big storage, big complexity and big impact on improving quality of health care," said David Haussler, professor of biomolecular engineering at the University of California, Santa Cruz and director of the Cancer Genomics Hub, which houses 700,000 data files on cancer. Haussler would like to see similar hubs set up internationally, with global agreements for sharing this data. "The data needed to understand human disease is beyond any one country," he says.
In the United Kingdom, the University of Oxford is sequencing 100,000 patients' genomes to test how to use large genotypic data sets to improve health care. In the private sector, entrepreneurs like keynote speaker Anne Wojcicki, founder of 23andMe, believe in medical data collection and sharing on a consumer level. For just $99, her company will provide consumers with a map of their genome. With 265,000 customers, 23andMe has collected well over 100 million phenotypic measurements. Clients can learn their genetic predisposition to diseases and can choose to share this data with other registered clients. These early adopters are also voluntarily sharing their data as part of the company's public research to uncover genetic links between diseases and find genetic variants that may be causal in disease.
"What if every human's genome was available in the medical record?" asked Euan Ashley, assistant professor in the division of cardiovascular medicine at Stanford. "What types of discoveries could be made then?"
According to Butte, a lot of the answers to important medical questions are trapped inside a matrix of repositories. The trick, he says, is to figure out what questions to ask to get the data to divulge its secrets.
"I don't think enough people study the measurements that have already been made," says Butte. "Hiding within those mounds of data is knowledge that could change the life of a patient, or change the world."
By Grace Hammerstrom