The collaboration is designed to connect NVIDIA’s healthcare computing platforms and AI expertise with the Broad Institute’s world-renowned researchers, scientists, and open platforms with a focus on three key areas:
- Make NVIDIA Clara™ Parabricks® available on the Terra platform: Parabricks, a GPU-accelerated software suite for secondary analysis of sequencing data, is now available in six new Terra workflows. Users can now analyze an entire genome in just one hour with Clara Parabricks, compared to 24 hours in a CPU-based environment, and can cut the cost of computing by more than half.
- Development of large language models (LLMs): Researchers will develop foundation models for DNA and RNA, the building blocks of life, to better understand human biology using NVIDIA BioNeMo, an AI application framework for large language models in biology that was announced today.
- Bring enhanced deep learning to the Genome Analysis Toolkit (GATK): NVIDIA is contributing a new deep learning model directly to the Broad Institute’s GATK, the industry standard used by more than 100,000 researchers, which helps identify genetic variants associated with disease. This will help drug discovery researchers develop new therapies.
“There is a need across the healthcare ecosystem for better computational tools that enable advances in the way we understand disease, develop diagnoses and deliver treatments,” said Kimberly Powell, vice president of healthcare at NVIDIA. “By expanding our collaboration with the Broad Institute, we can bring the power of large language models to deliver joint solutions and bridge the divide between researchers’ insights and real benefits to patients.”
The Broad Institute aims to enable the next generation of collaborative biomedical research by providing an open cloud platform that connects researchers to each other and to the data sets and tools they need to advance science.
“Life sciences is in the midst of a data revolution, and researchers need a new approach to bringing machine learning into biomedicine,” said Anthony Philippakis, chief data officer at the Broad Institute. “With this collaboration, we aim to expand our mission of data sharing and collaborative processes to scale genomics research.”
Large Language Models for Studying Diseases
The NVIDIA BioNeMo framework includes pre-trained LLMs for proteins and chemistry, and it simplifies training, inference, and scaling. BioNeMo is an extension of the NVIDIA NeMo Megatron framework and is specific to the domains of chemistry, proteins, and DNA/RNA sequences.
BioNeMo enables developers to efficiently train and deploy biology LLMs with billions of parameters.
Together, teams from both organizations will build on this work to create new models, add them to the BioNeMo collection, and make them available on the Terra platform.
NVIDIA Software for Domain-Specific AI
NVIDIA Parabricks GPU-accelerated workflows provide researchers with faster turnaround times and lower costs for a wide range of genomic data analyses. For Broad’s GATK best-practice germline workflow, running the analysis with Parabricks on GPUs can be up to 24x faster at less than half the cost.
Broad Institute researchers will also gain access to MONAI, an open-source deep learning framework for AI in medical imaging, as well as to NVIDIA RAPIDS™, a GPU-accelerated data science toolkit for faster data preparation, which can be used for single-cell genomic analysis.
Eddie is an Australian news reporter with over 9 years in the industry and has been published in Forbes and TechCrunch.