Researchers Get More Than $23 Million to Launch Centers for Big-Data Research
February 2015
The National Institutes of Health has awarded two grants, totaling more than $23 million, to establish two centers of excellence at Stanford for big data research. The grants are among 11 awarded to launch centers of excellence in big data research under an NIH initiative known as Big Data to Knowledge, or BD2K.
Biomedical big data has the potential to advance the understanding of human health and disease, according to the BD2K initiative. But the lack of appropriate tools, poor data accessibility and insufficient training are major impediments to rapid translational impact. The two new Stanford centers will focus on ways to categorize, organize and consolidate diverse data from laboratory studies, clinical notes and wearable devices to allow scientists to compare and combine study results and draw more accurate conclusions as they develop medical therapies.
CEDAR tackles the metadata
The first award of about $11 million over four years went to Mark Musen, MD, PhD, to establish the Center for Expanded Data Annotation and Retrieval (CEDAR). In an era when data are generated at rates and in quantities never before imaginable, there is an urgent need to understand the structure of datasets and the experimental conditions under which they were produced. The ultimate big data challenge lies not in the data, says Musen, but in the metadata: the machine-readable descriptions that provide data about the data. It is not enough simply to put data online; data are not usable until they can be "explained" in a manner that both humans and computers can process.
"If there is one obstacle to the future of big data in biomedicine and to the breakthroughs that will result from large-scale exploration of online datasets, it is that people hate metadata," says Musen, professor of medicine, biomedical informatics research.
The metadata problem requires an end-to-end solution: from the creation of easy-to-use, standardized templates, to filling out those templates using ontology-based terms, to archiving the metadata and finding, exploring and reusing datasets.
"Our approach is innovative because it addresses the metadata problem for the first time in a holistic manner," says Musen. "People want a "Google" for data and it doesn't exist right now. And it will never exist until the data are adequately described so that a Google-like system can find them."
Biomedical investigators such as Musen worry that the big data revolution will fizzle out if it continues to be difficult for scientists to locate their colleagues' experimental datasets online, to glean how the experiments actually were performed and to understand how the data should be interpreted.
CEDAR will collaborate with the Human Immunology Project Consortium (HIPC) to create a test bed for studying the use of its methods throughout the full lifecycle of data annotation, data acquisition and data analysis in an industrial-scale setting.
Enhancing mobility through data integration
The second grant, roughly $12 million over four years, went to Scott Delp, PhD, professor of bioengineering and mechanical engineering, for the Mobility Data Integration to Insight project, known as the Mobilize Center.
As Delp points out in his proposal, mobility is essential for human health. Yet many conditions, such as cerebral palsy, osteoarthritis and obesity, limit mobility at an enormous personal and societal cost. The Mobilize Center seeks to transform the field of mobility research by developing essential tools for data analysis that will advance research to prevent, diagnose and reduce impairments that limit human movement.
The center will develop and disseminate a range of novel data science tools, including modeling and analysis methods to predict and improve the outcomes of surgeries in children with cerebral palsy and gait pathology; to identify new approaches to optimize mobility in individuals with osteoarthritis, running injuries and other movement impairments; and to discover methods that motivate overweight and obese individuals to exercise more and in ways that promote joint health.
"The proliferation of devices monitoring human activity, including mobile phones and an ever-growing array of wearable sensors, is generating unprecedented quantities of data describing human movement, behaviors and health," says Delp, program director and principal investigator of the Mobilize Center. "Yet there is a dearth of methods for analyzing these massive, heterogenous datasets." All of this data, created by diverse sources, remains siloed in isolated labs and minimally utilized.
"Mobility research is severely limited by the inability to integrate and analyze the vast quantity of data," says Delp. "Our Center will integrate and analyze mobility big data from a variety of sources to help researchers and clinicians answer a broad range of questions. With the insights gained from subjecting these massive amounts of data to our state-of-the-art analytical techniques, we hope to enhance mobility across a large segment of the population."