SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published Apr 7, 2025 • 202
Neural Vocoder is All You Need for Speech Super-resolution Paper • 2203.14941 • Published Mar 28, 2022 • 1
MusicInfuser: Making Video Diffusion Listen and Dance Paper • 2503.14505 • Published Mar 18, 2025 • 11
view article Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM +2 Mar 12, 2025 • 480
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities Paper • 2503.03983 • Published Mar 6, 2025 • 26
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models of 7 sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 46 items • Updated 3 days ago • 672
CLAP: Contrastive Language-Audio Pretraining Collection CLAP is to audio what CLIP is to image. • 5 items • Updated Oct 20, 2025 • 15
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities Paper • 2402.01831 • Published Feb 2, 2024 • 16
view article Article SmolVLM Grows Smaller – Introducing the 256M & 500M Models! +1 Jan 23, 2025 • 189
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4, 2025 • 252
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25, 2024 • 121
Cosmos-Tokenizer Collection A suite of image and video tokenizers • 13 items • Updated 11 days ago • 43