As technologies like single-cell genomic sequencing, enhanced biomedical imaging, and medical “internet of things” devices proliferate, key discoveries about human health are increasingly found within vast troves of complex life science and health data.
But drawing meaningful conclusions from that data is a difficult problem that can involve piecing together different data types and manipulating huge data sets in response to varying scientific inquiries. The problem is as much about computer science as it is about other areas of science. That’s where Paradigm4 comes in.
The company, founded by Marilyn Matz SM ’80 and Turing Award winner and MIT Professor Michael Stonebraker, helps pharmaceutical companies, research institutes, and biotech companies turn data into insights.
It accomplishes this with a computational database management system that’s built from the ground up to host the diverse, multifaceted data at the frontiers of life science research. That includes data from sources like national biobanks, clinical trials, the medical internet of things, human cell atlases, medical images, environmental factors, and multi-omics, a field that includes the study of genomes, microbiomes, metabolomes, and more.
On top of the system’s unique architecture, the company has also built data preparation, metadata management, and analytics tools to help users find the important patterns and correlations lurking within all those numbers.
In many instances, customers are exploring data sets the founders say are too large and complex to be represented effectively by traditional database management systems.
“We’re keen to enable scientists and data scientists to do things they couldn’t do before by making it easier for them to deal with large-scale computation and machine-learning on diverse data,” Matz says. “We’re helping scientists and bioinformaticists with collaborative, reproducible research to ask and answer hard questions faster.”
Stonebraker has been a pioneer in the field of database management systems for decades. He has started nine companies, and his innovations have set standards for the way modern systems allow people to organize and access large data sets.
Much of Stonebraker’s career has focused on relational databases, which organize data into columns and rows. But in the mid-2000s, Stonebraker realized that a lot of data being generated would be better stored not in rows or columns but in multidimensional arrays.
For example, satellites break the Earth’s surface into large squares, and GPS systems track a person’s movement through those squares over time. That operation involves vertical, horizontal, and time measurements that aren’t easily grouped or otherwise manipulated for analysis in relational database systems.
Stonebraker recalls his scientific colleagues complaining that available database management systems were too slow to work with complex scientific data sets in fields like genomics, where researchers study the relationships between population-scale multi-omics data, phenotypic data, and medical records.
“[Relational database systems] scan either horizontally or vertically, but not both,” Stonebraker explains. “So you need a system that does both, and that requires a storage manager down at the bottom of the system which is capable of moving both horizontally and vertically through a very big array. That’s what Paradigm4 does.”
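To make the contrast concrete, here is a minimal Python sketch of the general idea. It is not Paradigm4's storage manager. The grid size, the `reading` function, and every other name in it are hypothetical and exist only for illustration. It stores the same made-up measurements twice: once as relational-style rows of (x, y, t, value), and once as a flat array in which a cell's position encodes its coordinates.

```python
from itertools import product

# Hypothetical grid: 4 squares across (x), 3 squares down (y), 5 time steps (t).
NX, NY, NT = 4, 3, 5


def reading(x, y, t):
    """Made-up measurement for square (x, y) at time t (illustration only)."""
    return 100 * x + 10 * y + t


# Relational-style layout: one (x, y, t, value) row per measurement.
rows = [
    (x, y, t, reading(x, y, t))
    for x, y, t in product(range(NX), range(NY), range(NT))
]

# Array-style layout: values only; the position in the list encodes x, y, and t.
cells = [reading(x, y, t) for x, y, t in product(range(NX), range(NY), range(NT))]


def offset(x, y, t):
    """Position of cell (x, y, t) in the flat array, with t varying fastest."""
    return (x * NY + y) * NT + t


def time_series_from_rows(x, y):
    """Row layout: every row is inspected to find the matching coordinates."""
    return [value for (rx, ry, _rt, value) in rows if rx == x and ry == y]


def time_series_from_array(x, y):
    """Array layout: the time series is one contiguous slice, found by arithmetic."""
    start = offset(x, y, 0)
    return cells[start:start + NT]


def snapshot_from_array(t):
    """Array layout: the whole grid at one time step, indexed as [y][x]."""
    return [[cells[offset(x, y, t)] for x in range(NX)] for y in range(NY)]
```

Calling `time_series_from_array(2, 1)` and `time_series_from_rows(2, 1)` returns the same five values, but the array version computes where they live instead of filtering every row, and `snapshot_from_array` cuts across the same data along a different dimension. A production array database would add chunking, compression, and distribution on top of this kind of coordinate arithmetic, none of which is shown here.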
In 2008, Stonebraker began developing a database management system at MIT that stored data in multidimensional arrays. He confirmed the approach offered major efficiency advantages, allowing analytical tools based on linear algebra, including many forms of machine learning and statistical data processing, to be applied to huge data sets in new ways.