December 15, 2016

Big Data Analysis Fuels Personalized Medicine

A distinctive feature of the Myeloma Center is its wealth of patient-derived clinical and research data. With more than 11,750 patients over a period of 27 years, the Myeloma Center has amassed a treasure trove of data elements, many at the molecular level, that have the potential to yield new understandings of disease biology and response to treatment. Data from current patients are continually added to the collection. Additionally, the Myeloma Center’s large tissue specimen archive presents the opportunity to mine even more data, using today’s sophisticated analytics tools.

A single patient can generate up to 100,000 meaningful pieces of data, based on information gleaned from the 20,000 to 30,000 genes in the human genome. Data are derived from patient samples that are subjected to DNA sequencing, gene expression profiling, and proteomics expression studies and are annotated with various patient information such as age, sex and disease state.

Multiplying so much data by thousands of patients results in “Big Data.” The term implies volume and complexity so great that high-performance computers, massive amounts of storage and advanced mathematics are needed to make sense of it.
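To put a rough number on that scale, the two figures quoted above can simply be multiplied. The short Python sketch below is a back-of-envelope upper bound, not a count of the Center's actual holdings: it assumes every patient contributes the full 100,000 data points, which the article gives only as a maximum. The constant and function names are illustrative.

```python
# Figures quoted in the article. The per-patient value is a stated maximum,
# so the product is an upper bound rather than a measured total.
DATA_POINTS_PER_PATIENT = 100_000
PATIENTS = 11_750


def total_data_points(per_patient: int, patients: int) -> int:
    """Return the number of data points if every patient had `per_patient` of them."""
    return per_patient * patients


total = total_data_points(DATA_POINTS_PER_PATIENT, PATIENTS)
print(f"{total:,} data points")  # 1,175,000,000 data points
```

Even as a ceiling, a figure above one billion helps explain why ordinary desktop tools are not enough for this kind of analysis.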

Computational biology, also known as bioinformatics, is the field of using computer-based analysis and statistics to understand biology. It covers both basic research (in the laboratory) and translational research (developing clinical applications from basic research), and spans the full spectrum from molecules to human population studies. Computational biology/bioinformatics is a subset of Biomedical Informatics (BMI).

BMI is focused on the management of large data sets in health care. It is a means of organizing and understanding data and turning it into knowledge, with the overarching goal of improving human health, and is an integral part of the search for disease-associated genes. An interdisciplinary field, BMI involves the development, study and application of theories, methods and processes for the generation, storage, retrieval, use and sharing of biomedical data. It encompasses the utilization of existing computational and statistical methods and algorithms, as well as the development of new methods to extract knowledge from the underlying data and advanced decision support systems to improve clinical practice. BMI is integral across the whole spectrum from molecules to populations, bridging basic and clinical research and practice.

At the Myeloma Center, we are striving to better understand the intricate network of molecular processes involved in myeloma. The vast amounts of molecular and clinical data that we have amassed via genome sequencing and other high-throughput techniques (large-scale methods to purify, identify and characterize DNA, RNA, proteins and other molecules) contain crucial information that could lead to the development of more effective, targeted therapies. We are mining and integrating these data and resolving the subtleties of the pathways and molecular relationships that support myeloma growth. By identifying the molecular patterns that characterize each individual genome and discerning which of these individual variations are related to a disease subset or response to treatment, we can further the development of tools for diagnosis, prognosis and personalized treatment.

We do this, in part, through the identification of disease-related SNPs (single nucleotide polymorphisms) derived from large-scale techniques. Mutations in the genomic code can produce changes in the protein sequence, which in turn can lead to disease. The key to approaches that identify disease mutations lies in distinguishing SNPs that are functionally relevant from those that are not.

Christopher Wardell, Ph.D., an experienced bioinformatician with particular expertise in next-generation sequencing, joined the Myeloma Center in July. Educated and trained in the UK, Wardell was a lead bioinformatician at The Institute of Cancer Research in the UK and a research scientist with the Laboratory for Genome Sequencing Analysis at the RIKEN Center for Integrative Medical Sciences in Japan.

“We are aiming to spot the differences — to see what makes a normal cell become cancerous,” Wardell said. “By comparing the normal genome of a patient to the genome of their tumor, we can identify the DNA changes that predispose and cause someone to develop cancer.”
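The comparison Wardell describes can be pictured as a set operation: changes seen in both the normal and the tumor sample were present before the cancer arose, while changes seen only in the tumor were acquired along the way. The Python sketch below illustrates just that idea. It is a toy under stated assumptions, not the Myeloma Center's pipeline: the `Variant` record, the `split_variants` helper and the example coordinates are all made up for illustration, and real analyses rely on dedicated variant-calling software and statistical filtering rather than exact set matching.

```python
from typing import NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """A hypothetical, simplified record of one DNA change."""
    chrom: str  # chromosome name
    pos: int    # position on the chromosome
    ref: str    # base in the reference genome
    alt: str    # base observed in the sample


def split_variants(
    normal: Set[Variant], tumor: Set[Variant]
) -> Tuple[Set[Variant], Set[Variant]]:
    """Separate variants shared by both samples from those found only in the tumor."""
    shared = normal & tumor      # present in both samples
    tumor_only = tumor - normal  # present in the tumor but not the normal sample
    return shared, tumor_only


# Made-up example data; these coordinates do not refer to real findings.
normal_calls = {
    Variant("chr1", 1000, "A", "G"),
    Variant("chr2", 2000, "C", "T"),
}
tumor_calls = {
    Variant("chr1", 1000, "A", "G"),
    Variant("chr2", 2000, "C", "T"),
    Variant("chr3", 3000, "G", "A"),
}

shared, tumor_only = split_variants(normal_calls, tumor_calls)
print(sorted(tumor_only))
```

In this example only the change on `chr3` is specific to the tumor, which makes it the candidate worth a closer look.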

The ultimate goal is personalized medicine. “We can sequence a person and their cancer and then target treatment to the mutation or signaling pathway that is out of kilter,” Wardell said. “We can get better answers to questions of diagnosis and treatment.”

The more complicated the question, the more samples are needed. Similarly, to determine how frequently a certain gene is mutated, high-resolution technology is essential.

In terms of sample quantity, the Myeloma Center is unsurpassed. “We have one of the largest repositories of myeloma specimen samples in the world. Using today’s modern tools, we can take current data, compare it with data in the repository, and use this information to direct future research and treatment strategies. This puts us in a distinctive position,” Wardell said.

Having so much data makes it possible to drill down to a very detailed level of information. Given the volume of data, the process is time-consuming.

“But, processes that have been slow in the past are speeding up. Computational speed and capacity doubles every 18 months,” Wardell said.
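Taken at face value, a doubling every 18 months compounds quickly. The Python sketch below works through the arithmetic, assuming the quoted rate holds steadily over the whole period, which is a rule of thumb rather than a guarantee. The function and constant names are illustrative.

```python
DOUBLING_PERIOD_MONTHS = 18  # rate quoted in the article


def growth_factor(months: float, doubling_period: float = DOUBLING_PERIOD_MONTHS) -> float:
    """Return how many times capacity multiplies after `months` of steady doubling."""
    return 2 ** (months / doubling_period)


print(growth_factor(36))       # 4.0 (two doublings in three years)
print(growth_factor(27 * 12))  # 262144.0 (eighteen doublings in 27 years)
```

Under that assumption, an analysis that would have taken a year of computing time when the Center's 27-year record began would now take a matter of minutes.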

Recognizing the importance of bioinformatics for developing curative therapies, the Myeloma Center has a dedicated team of five specialists, including Wardell, who is the team leader. They are part of the first generation of full-time bioinformaticians, and they are poised to help the Myeloma Center reach new heights in the development of curative therapies.

“What makes us tick is reaching the clinic, feeling like you are making a difference,” Wardell said.

While Wardell and two of his faculty colleagues are focused on the Myeloma Center, their academic appointments are in the Department of Biomedical Informatics, established at UAMS one year ago. The department, directed by Fred Prior, Ph.D., develops computational tools to assess and manage medical and public health information, and works to maximize the potential of data for improving health and health care.

Prior is the principal investigator for The Cancer Imaging Archive (TCIA). Supported by the National Cancer Institute, TCIA provides researchers, educators and the general public with a vast, freely accessible, open archive of cancer-specific medical images and metadata. TCIA is a service that de-identifies and hosts a large archive of medical images of cancer accessible for public download. The data are organized as “Collections,” typically patients related by a common disease (e.g., lung cancer), image modality (MRI, CT, etc.) or research focus. Prior’s group is in the process of hosting radiology images, including PET and CT scans, and gene expression data from the Myeloma Center on TCIA.

Both the Department of Biomedical Informatics and TCIA are valuable resources to the Myeloma Center that help ensure that Wardell and his team have access to sophisticated, state-of-the-art technology, information and processes. This, in turn, translates into expanded understanding of cancer biology that will speed the development of precision medicine approaches to curing myeloma and related diseases.


American Medical Informatics Association

Kann, Maricel G: Advances in translational bioinformatics: computational approaches for the hunting of disease genes. Brief Bioinform. 2010 Jan; 11(1): 96-110. PMID: 20007728