Big data technology – where are we heading?

The age of information has come upon us. Technological advances in computing, coupled with breakthroughs in biomedical research, have allowed us to generate immense data sets through genomic sequencing and other forms of -omics profiling. The large-scale ENCODE sequencing project, for example, was launched in 2003 with the aim of identifying all functional elements of the human genome. It has already accumulated about 15 terabytes of data drawn from 1600 experiments performed by ~500 scientists working at 32 institutes worldwide. This investment in large-scale data acquisition has not been in vain, as ENCODE revealed several surprising findings. Chief among them: although only 1-2% of the genome codes for proteins, about 75% of the human genome is transcribed, much of it into what is termed non-coding RNA, which can influence biological function. Furthermore, 90% of all human genetic variants fall within these non-coding regions, spurring a large investigative effort towards understanding non-coding RNA and its impact on human health and disease.

Other successful applications of big data technology include screening for genetic mutations that inform decisions on front-line cancer therapy (see my post on biomarkers in cancer), which has also led to the creation of data repositories such as MyCancerGenome. Such resources give physicians, patients and caregivers easy access to information about cancer mutations and the clinical trials available for them. Online crowdsourcing to analyse large data sets or solve research challenges has also been gaining popularity. Just last week I came across Nature's Open Innovation Pavilion, powered by InnoCentive, which lists challenges, often set by pharma companies, that offer cash rewards for solutions submitted by the general public. A recent publication also reported using crowdsourcing, backed by a US$50,000 prize, to develop algorithms that predict disease progression from clinical trial data on ALS patients. Interestingly, the resulting computational predictions far surpassed those of experienced ALS clinicians, and could reduce the number of patients needed in a clinical trial by 20% by accounting for variability between patients.

Of course, all this requires large capacity for data storage, management and transfer. Many large companies have entered this field, with Amazon Web Services among the global leaders, offering CPUs, storage space, memory and operating systems at very competitive rates. Other companies such as Cisco, Dell, Microsoft, IBM, GE Healthcare and Intel have also developed services aimed at the life sciences industry to help store, analyse and manage large or complex datasets. Cloud computing promises to revolutionize how we manage and store data, yet challenges remain around privacy and security. Parallel processing frameworks (e.g. the open-source Apache Hadoop) have also been a significant breakthrough, allowing data to be analysed at much greater speeds than before by spreading the work across many machines.
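
Hadoop is built around the MapReduce programming model: split the data, process the pieces independently and in parallel (the "map" step), then merge the partial results (the "reduce" step). As a rough illustration only, here is a minimal Python sketch of that pattern applied to counting short DNA subsequences (k-mers) in sequencing reads. It does not use Hadoop's actual API; the function names, the toy reads and the choice of k = 3 are hypothetical, and a thread pool merely stands in for the machines of a real cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_kmers(read: str, k: int = 3) -> Counter:
    """Map step: count every overlapping k-mer in a single read."""
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


def reduce_counts(total: Counter, partial: Counter) -> Counter:
    """Reduce step: add one partial count table into the running total."""
    total.update(partial)  # Counter.update adds counts rather than replacing them
    return total


def count_kmers(reads: list[str], workers: int = 4) -> Counter:
    # Map phase: worker threads stand in for the nodes of a cluster, each
    # handling its share of the reads independently. This shows the data
    # flow only; it is not meant to deliver a real CPU speed-up.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(map_kmers, reads))
    # Reduce phase: merge the per-read tables into one combined count.
    return reduce(reduce_counts, partial_counts, Counter())


if __name__ == "__main__":
    toy_reads = ["ACGTAC", "GTACGT", "ACGACG"]  # hypothetical toy reads
    print(count_kmers(toy_reads).most_common(3))
```

On a real cluster, Hadoop would additionally take care of splitting the input across machines, shuffling intermediate results to the reducers and recovering from node failures; the sketch leaves all of that out.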

An obvious application of big data technology is making the health and medical records of all patients electronic and accessible across healthcare centers. This would not only increase the efficiency and speed of patient treatment but would also lead to better and safer administration of drugs, since clinicians would know in advance which drugs the patient is already taking. Creating an electronic record of patients' responses to drugs is also a step towards personalized medicine, something I believe we should all strive towards. Such systems are currently being deployed in many countries but have met with several challenges. In the UK, for example, the National Health Service (NHS) deployed an electronic health record system in 2005 that subsequently failed, owing to a lack of healthcare information exchange, and was dismantled in 2010, but not before costing taxpayers over US$24 billion. Estonia, Jordan and the Netherlands are perhaps the most successful countries, having reached, or almost reached, the goal of a properly functioning electronic health records system.

In conclusion, big data is here to stay. The key is not to be afraid of it: make good friends with bioinformaticians and get as much exposure to IT as possible. The advances so far already show that big data technology can bring considerable benefits to healthcare and research, and we should fully embrace it.


Fabricio F. Costa, Big data in biomedicine, Drug Discovery Today, Volume 19, Issue 4, April 2014, Pages 433-440, ISSN 1359-6446.

Aisling O'Driscoll, Jurate Daugelaite, Roy D. Sleator, 'Big data', Hadoop and cloud computing in genomics, Journal of Biomedical Informatics, Volume 46, Issue 5, October 2013, Pages 774-781, ISSN 1532-0464.
