Best practice for training LLaMA models in Megatron-LM
Annotations of the interesting ML papers I read
Large-scale 4D parallelism pre-training for 🤗 Transformers with Mixture of Experts *(still a work in progress)*
A LLaMA1/LLaMA2 Megatron implementation.
Training an NVIDIA NeMo Megatron Large Language Model (LLM) using the NeMo Framework on Google Kubernetes Engine
Minimal yet high-performance code for pretraining LLMs. Attempts to implement some SOTA features. Implements training through DeepSpeed, Megatron-LM, and FSDP. WIP
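
Not from that repo itself, but as a rough sketch of what the FSDP route looks like in plain PyTorch (the toy model, learning rate, and data are illustrative assumptions; a real pretraining run would build a transformer and a data pipeline here):

```python
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Launch with: torchrun --nproc_per_node=<gpus> this_script.py
dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Toy stand-in for a transformer; hidden sizes are placeholders.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
).cuda()

# FSDP shards parameters, gradients, and optimizer state across ranks,
# all-gathering weights on demand during forward/backward.
model = FSDP(model)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# One illustrative training step on random data.
x = torch.randn(8, 1024, device="cuda")
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
```

DeepSpeed and Megatron-LM cover the same ground with their own engines and launchers; FSDP is the option that stays closest to stock PyTorch.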
Megatron-LM/GPT-NeoX compatible Text Encoder with 🤗Transformers AutoTokenizer.
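
For context on what AutoTokenizer compatibility means in practice, a minimal sketch (the checkpoint name `EleutherAI/gpt-neox-20b` is an illustrative assumption, not necessarily what this project targets):

```python
from transformers import AutoTokenizer

# Illustrative checkpoint; any Hub tokenizer loads through this same API.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

text = "Megatron-LM pretrains large language models."
ids = tokenizer(text)["input_ids"]
print(ids)                    # token IDs fed to the text encoder
print(tokenizer.decode(ids))  # round-trips back to the original text
```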