MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs | Haibin Lin

@Scale

12 days ago
