SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 252
Moonlight-A3B Collection Moonshot's Compute-efficient MoE LLM, first Scaling Up of Muon Optimizer • 3 items • Updated Nov 2 • 9
view article Article The Open Evaluation Standard: Benchmarking NVIDIA Nemotron 3 Nano with NeMo Evaluator 7 days ago • 32
Nemotron v3 Pre-Training Collection Large scale pre-training datasets used in the Nemotron family of models. • 11 items • Updated 1 day ago • 6
Common Pile v0.1 Collection All resources related to Common Pile v0.1, an 8TB dataset of public domain and openly licensed text • 4 items • Updated Jun 6 • 39
view article Article Nemotron 3 Nano \- A new Standard for Efficient, Open, and Intelligent Agentic Models 9 days ago • 97
ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding Paper • 2512.13586 • Published 9 days ago • 87
Nemotron-Post-Training-v3 Collection Collection of datasets used in the post-training phase of Nemotron Nano v3. • 7 items • Updated 1 day ago • 47
NVIDIA Nemotron v3 Collection Open, Production-ready Enterprise Models • 6 items • Updated 1 day ago • 101
Nemotron-Pre-Training-Datasets Collection Large scale pre-training datasets used in the Nemotron family of models. • 11 items • Updated 1 day ago • 81
NeMo Gym Collection Collection of RL verifiable data for NeMo Gym • 13 items • Updated 1 day ago • 30
Devstral 2 Collection A couple of agentic LLMs for software engineering tasks, excelling at using tools to explore codebases, edit multiple files, and power SWE Agents. • 3 items • Updated 15 days ago • 37
view article Article Transformers v5: Simple model definitions powering the AI ecosystem +2 24 days ago • 253
Mistral Large 3 Collection A state-of-the-art, open-weight, general-purpose multimodal model with a granular Mixture-of-Experts architecture. • 4 items • Updated 22 days ago • 80
Ministral 3 Collection A collection of edge models, with Base, Instruct and Reasoning variants, in 3 different sizes: 3B, 8B and 14B. All with vision capabilities. • 9 items • Updated 22 days ago • 133