Strength in numbers: Collaborating to conquer big data
03.05.13
When Harvard researchers asked computer coders to write software for analysing immune-system genes, they faced a challenge: the average computer programmer doesn’t know much about gene sequences. To overcome this linguistic barrier, the problem was translated into coder-friendly language, with each gene sequence treated as a string, the programming term for a sequence of characters such as letters and numbers. Creative solutions like this are becoming ever more necessary as scientists and coders work together to tackle a rapidly growing volume of research data, a phenomenon known as “Big Data”.
An unprecedented amount of healthcare data is being generated today by a variety of tools, from genomic sequencing machines to electronic health records. This data comes from biomedical researchers, healthcare professionals and even patients themselves, through health-monitoring smartphone applications. If we can harness it, this data can help us understand disease better and pinpoint new and improved therapies more efficiently than ever before.
Big Data has Big Challenges: Hurdles to maximising big data’s value
The potential is there, but so are the hurdles. Gathering big data is itself a challenge because of semantic differences, legal barriers and fragmented databases, among other reasons. Once the data is gathered, ways must be found to organise, visualise and mine it. A January 2013 report from the OECD discusses the challenges involved in maximising the value of big data; although it focuses on social science data, its conclusions about the “big data challenge” apply across disciplines.
One concept is emphasised repeatedly in the report, and it is essential when considering big data in the R&D context: collaboration. If we are to realise the full value that big data promises, collaboration is a must: among institutions, both public and private; among professionals from different disciplines; among researchers from different organisations; and among national and international bodies.
Strength in Numbers: Collaborating to conquer big data
Recognising the potential that big data holds for healthcare, EFPIA is supporting “Smarter Data for Europe”, an event taking place on May 23. The conference will examine how diverse stakeholders from the public and private sectors can join forces to maximise the usefulness of big data in healthcare, energy and other domains. Panellists include representatives from IBM, the European Commission, academia and the pharmaceutical industry.
Projects funded by the Innovative Medicines Initiative, Europe’s largest public-private partnership and a joint endeavour between EFPIA and the European Commission, show how big data can advance research. The NEWMEDS consortium has generated the largest databases of schizophrenia and antidepressant trials, as well as of the treated depressed population. The Pharma-Cog project is using big data to develop and validate new tools for testing candidate drugs to treat the symptoms of Alzheimer’s disease. By compiling databases of clinical trials and combining them with blood tests, brain scans and behavioural tests, it aims to advance knowledge of how Alzheimer’s progresses and of how candidate drugs affect it.
Another example is Harvard University’s effort to engage computer programmers in writing genetic analysis software. Together with TopCoder.com, the researchers launched a $6,000 contest for programmers and received 122 entries. Many entries ran significantly faster than existing software designed for the same purpose, and the top-performing solutions are now available for download.
What all of these initiatives have in common is their emphasis on collaboration. Only by pooling expertise and resources can we hope to conquer the mass of big data before us and find ways to use it that will ultimately improve patients’ wellbeing. For collaboration to work, we must cross boundaries. We must adapt and find common ground with people from different backgrounds. We might even have to refer to a gene sequence as a string.