Fun-Audio-Chat-8B

Fun-Audio-Chat is a Large Audio Language Model built for natural, low-latency voice interactions.

Model Description

Fun-Audio-Chat introduces Dual-Resolution Speech Representations (an efficient 5Hz shared backbone plus a 25Hz refined head) to cut compute while keeping speech quality high, and uses Core-Cocktail training to preserve strong text LLM capabilities. It delivers top-tier results on spoken QA, audio understanding, speech function calling, speech instruction-following, and voice empathy benchmarks.

Key Features

Dual-Resolution Speech Representations: The shared backbone runs at an efficient 5Hz frame rate (vs. 12.5Hz or 25Hz in other models), reducing GPU hours by nearly 50% while maintaining high speech quality.
State-of-the-Art Performance: Ranks top among models of similar size (around 8B parameters) on OpenAudioBench, VoiceBench, UltraEval-Audio, MMAU, MMAU-Pro, MMSU, Speech-ACEBench, Speech-BFCL, Speech-SmartInteract, and VStyle.
Comprehensive Capabilities: Supports spoken QA, audio understanding, speech function calling, speech instruction-following, and voice empathy.
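
To make the frame-rate comparison concrete, the sketch below counts how many speech frames a backbone would process for one clip at 5Hz, 12.5Hz, and 25Hz. It is illustrative arithmetic only: the function name and the 10-second clip length are assumptions, and sequence length is just one of the factors behind the reported GPU-hour savings.

```python
import math


def frames_for(duration_s: float, frame_rate_hz: float) -> int:
    """Number of frames needed to cover duration_s seconds at frame_rate_hz."""
    return math.ceil(duration_s * frame_rate_hz)


clip_seconds = 10.0  # assumed example clip length, not a value from the model card

for rate in (5.0, 12.5, 25.0):
    print(f"{rate} Hz -> {frames_for(clip_seconds, rate)} frames")
```

For the assumed 10-second clip this gives 50, 125, and 250 frames respectively, so the 5Hz backbone sees a sequence 2.5 to 5 times shorter than it would at the other rates.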

Model Details

Attribute	Value
Model Size	~8B parameters
Architecture	Dual-Resolution Speech Representations
Languages	English, Chinese
License	Apache 2.0

Requirements

Python == 3.12
PyTorch == 2.8.0
ffmpeg
GPU Memory: ~24GB for inference, 4×80GB for training
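
Before installing, it can help to check the local machine against the list above. The following is a minimal sketch, not part of the project: `check_environment` is a hypothetical helper that only tests the Python version, whether `ffmpeg` is on the PATH, and (if PyTorch is already installed) its version and CUDA availability. It does not measure GPU memory.

```python
import shutil
import sys

REQUIRED_PYTHON = (3, 12)  # taken from the requirements list


def check_environment() -> dict:
    """Report which of the listed requirements appear to be satisfied locally."""
    report = {
        "python": sys.version_info[:2] == REQUIRED_PYTHON,
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }
    try:
        import torch  # optional: may not be installed yet

        report["torch"] = torch.__version__.startswith("2.8.0")
        report["cuda"] = bool(torch.cuda.is_available())
    except ImportError:
        report["torch"] = False
        report["cuda"] = False
    return report


for name, ok in check_environment().items():
    print(f"{name}: {'ok' if ok else 'missing or mismatched'}")
```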

Installation

git clone --recurse-submodules https://github.com/FunAudioLLM/Fun-Audio-Chat
cd Fun-Audio-Chat

apt install ffmpeg
conda create -n FunAudioChat python=3.12 -y
conda activate FunAudioChat
pip install torch==2.8.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt

Quick Start

Download Models

Using HuggingFace:

pip install huggingface-hub
hf download FunAudioLLM/Fun-Audio-Chat-8B --local-dir ./pretrained_models/Fun-Audio-Chat-8B
hf download FunAudioLLM/Fun-CosyVoice3-0.5B-2512 --local-dir ./pretrained_models/Fun-CosyVoice3-0.5B-2512

Or using ModelScope:

pip install modelscope
modelscope download --model FunAudioLLM/Fun-Audio-Chat-8B --local_dir pretrained_models/Fun-Audio-Chat-8B
modelscope download --model FunAudioLLM/Fun-CosyVoice3-0.5B-2512 --local_dir pretrained_models/Fun-CosyVoice3-0.5B-2512
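
The same HuggingFace download can also be scripted from Python. This is a sketch under the assumption that the `huggingface_hub` package is installed; it uses that library's `snapshot_download` function, while `MODEL_REPOS`, `local_dir_for`, and `download_all` are hypothetical names introduced here for illustration.

```python
from pathlib import Path

# Repository IDs as given in the CLI commands above.
MODEL_REPOS = [
    "FunAudioLLM/Fun-Audio-Chat-8B",
    "FunAudioLLM/Fun-CosyVoice3-0.5B-2512",
]


def local_dir_for(repo_id: str, root: str = "pretrained_models") -> Path:
    """Map a repo ID to the local directory layout used by the CLI commands."""
    return Path(root) / repo_id.split("/")[-1]


def download_all(root: str = "pretrained_models") -> None:
    """Fetch every repository in MODEL_REPOS into its local directory."""
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    for repo_id in MODEL_REPOS:
        snapshot_download(repo_id=repo_id, local_dir=str(local_dir_for(repo_id, root)))


# download_all()  # uncomment to fetch both checkpoints (large download)
```

The call is left commented out so that running the snippet does not start a multi-gigabyte download by accident.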

Inference

# Run from the repository root so the example scripts can import the project packages
export PYTHONPATH="$(pwd)"
# Speech-to-Text
python examples/infer_s2t.py
# Speech-to-Speech
python examples/infer_s2s.py
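
If the examples need to be launched from another Python program, the shell steps above could be wrapped roughly as follows. This is a sketch, not project code: `build_env` and `run_example` are hypothetical helpers, and the script paths are the ones listed above, assumed to be relative to the repository root.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import Optional


def build_env(repo_root: Path) -> dict:
    """Copy the current environment and point PYTHONPATH at the repo root."""
    env = os.environ.copy()
    env["PYTHONPATH"] = str(repo_root)
    return env


def run_example(script: str, repo_root: Optional[Path] = None) -> int:
    """Run one example script from the repo root and return its exit code."""
    root = repo_root or Path.cwd()
    result = subprocess.run([sys.executable, script], cwd=root, env=build_env(root))
    return result.returncode


# Example usage (assumes the current directory is the cloned repository):
# run_example("examples/infer_s2t.py")  # Speech-to-Text
# run_example("examples/infer_s2s.py")  # Speech-to-Speech
```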

Evaluation

Benchmark	Category
OpenAudioBench	Spoken QA
VoiceBench	Spoken QA
UltraEval-Audio	Speech-to-Speech
MMAU, MMAU-Pro, MMSU	Audio Understanding
Speech-ACEBench, Speech-BFCL, Speech-SmartInteract	Speech Function Calling
VStyle	Speech Instruction-Following

For detailed evaluation instructions, please refer to the GitHub repository (https://github.com/FunAudioLLM/Fun-Audio-Chat).

Citation

If you find this model useful, please cite our papers:

@article{funaudiochat2025,
  title={Fun-Audio-Chat Technical Report},
  author={Qian Chen and Luyao Cheng and Chong Deng and Xiangang Li and Jiaqing Liu and Chao-Hong Tan and Wen Wang and Junhao Xu and Jieping Ye and Qinglin Zhang and Qiquan Zhang and Jingren Zhou},
  year={2025},
  eprint={2512.20156},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2512.20156},
}


@misc{tan2025drvoiceparallelspeechtextvoice,
  title={DrVoice: Parallel Speech-Text Voice Conversation Model via Dual-Resolution Speech Representations}, 
  author={Chao-Hong Tan and Qian Chen and Wen Wang and Chong Deng and Qinglin Zhang and Luyao Cheng and Hai Yu and Xin Zhang and Xiang Lv and Tianyu Zhao and Chong Zhang and Yukun Ma and Yafeng Chen and Hui Wang and Jiaqing Liu and Xiangang Li and Jieping Ye},
  year={2025},
  eprint={2506.09349},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2506.09349}, 
}

License

This model is licensed under the Apache 2.0 License.

Acknowledgments

This project is based on the following excellent open-source projects:

Contact

🐛 Submit an Issue
💡 Submit a Pull Request
