These updates, which include two innovative techniques and a hyperparameter tool to optimize and scale LLM training on any number of GPUs, provide new possibilities for training and deploying models on the NVIDIA AI platform.
With 176 billion parameters, BLOOM, the world’s largest open source multilingual language model, was recently trained on the NVIDIA AI platform, enabling it to generate text in 46 languages and 13 programming languages. The NVIDIA AI platform has also powered one of the most powerful transformer language models, the 530-billion-parameter Megatron-Turing NLG model (MT-NLG).
Recent advances in LLMs
LLMs, which can involve up to trillions of parameters learned from text, are one of today’s most important advanced technologies. Creating them, however, is a costly and lengthy process that requires deep technical knowledge, a distributed infrastructure, and a full-stack approach.
Yet their impact on real-time content generation, text summarization, customer service chatbots, and question answering for conversational AI interfaces is huge.
To advance LLMs, the AI community continues to innovate on tools like Microsoft DeepSpeed, Colossal-AI, Hugging Face BigScience, and Fairscale, which are built on the NVIDIA AI platform, including Megatron-LM, Apex, and other GPU-accelerated libraries.
These new NVIDIA AI platform optimizations address many existing issues across the stack. NVIDIA looks forward to continuing to collaborate with the AI community to put the power of LLMs within everyone’s reach.
More agile LLM creation
The latest updates to NeMo Megatron provide 30 percent faster training of GPT-3 models with parameters ranging from 22 billion to 1 trillion. A 175-billion-parameter model can now be trained in 24 days on 1,024 NVIDIA A100 GPUs, which is 10 days, or approximately 250,000 GPU compute hours, less than before these new releases.
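As a rough sanity check on those figures, the arithmetic can be sketched in a few lines of Python. The 34-day baseline is not stated directly; it is inferred here as the 24 days quoted plus the 10 days saved, and the function and variable names are purely illustrative.

```python
# Back-of-the-envelope check of the training figures quoted in the article.
# Assumption: the earlier training time is 24 + 10 = 34 days (inferred, not stated).

HOURS_PER_DAY = 24


def gpu_hours(num_gpus: int, days: float) -> float:
    """Total GPU compute hours for a job running on num_gpus for the given days."""
    return num_gpus * days * HOURS_PER_DAY


num_gpus = 1024      # NVIDIA A100 GPUs, from the article
days_after = 24      # training time after the updates, from the article
days_saved = 10      # time saved, from the article
days_before = days_after + days_saved  # inferred baseline

saved_gpu_hours = gpu_hours(num_gpus, days_saved)
time_reduction = days_saved / days_before

print(f"GPU hours saved: {saved_gpu_hours:,.0f}")
print(f"Wall-clock reduction: {time_reduction:.1%}")
```

Ten days on 1,024 GPUs comes to 245,760 GPU hours, which matches the "approximately 250,000" quoted, and the wall-clock reduction of about 29 percent is broadly in line with the 30 percent speedup claimed.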
NeMo Megatron is a comprehensive containerized framework for data collection, large-scale model training, model evaluation against industry-standard benchmarks, and inference with leading-edge throughput and latency performance.
It makes LLM training and inference reproducible across a wide range of GPU cluster configurations. These features are currently available to early access customers for use on NVIDIA DGX SuperPODs, NVIDIA DGX Foundry, and the Microsoft Azure cloud. Other cloud platforms will be supported in the near future.
The functionality described can be tried on NVIDIA LaunchPad, a free program that provides short-term access to a catalog of hands-on labs on NVIDIA-accelerated infrastructure.
Eddie is an Australian news reporter with over 9 years in the industry and has been published in Forbes and TechCrunch.