DIFFA: Large Language Diffusion Models Can Listen and Understand

DIFFA is the first diffusion-based large audio-language model for spoken language understanding.
It combines a frozen diffusion LLM with dual adapters (semantic + acoustic) to enhance audio perception and reasoning.
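
The dual-adapter idea can be pictured with a small toy sketch. This is not the DIFFA implementation: it assumes, purely for illustration, that each adapter is a single linear projection from audio-encoder features into the LLM embedding space, and that the two adapter outputs are simply placed in front of the text embeddings. All names (`make_linear`, `apply_linear`, `build_llm_input`) and all dimensions are hypothetical, and the frozen diffusion LLM itself is not modeled.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Return an (out_dim x in_dim) weight matrix with small random values."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_linear(weights, frames):
    """Project every feature vector in `frames` through `weights`."""
    return [
        [sum(w * x for w, x in zip(row, frame)) for row in weights]
        for frame in frames
    ]


def build_llm_input(audio_features, text_embeddings, semantic_w, acoustic_w):
    """Assumed layout: [semantic tokens] + [acoustic tokens] + [text tokens]."""
    semantic_tokens = apply_linear(semantic_w, audio_features)
    acoustic_tokens = apply_linear(acoustic_w, audio_features)
    return semantic_tokens + acoustic_tokens + text_embeddings


rng = random.Random(0)
AUDIO_DIM, LLM_DIM = 4, 6  # toy sizes, chosen only for the example

# In this sketch only the two adapter matrices would be trainable;
# the LLM that consumes `sequence` would stay frozen.
semantic_w = make_linear(AUDIO_DIM, LLM_DIM, rng)
acoustic_w = make_linear(AUDIO_DIM, LLM_DIM, rng)

audio_features = [[rng.random() for _ in range(AUDIO_DIM)] for _ in range(3)]
text_embeddings = [[0.0] * LLM_DIM for _ in range(2)]

sequence = build_llm_input(audio_features, text_embeddings, semantic_w, acoustic_w)
print(len(sequence), len(sequence[0]))  # prints: 8 6
```

Keeping the language model frozen means only the small adapter weights need gradient updates, which is the usual motivation for adapter-style designs.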

Paper for zhoujiaming777/DIFFA

DIFFA: Large Language Diffusion Models Can Listen and Understand. Paper 2507.18452, published Jul 24, 2025.