CosyVoice3-2512 ONNX Models (flow & hift)

This repository provides ONNX-format models for selected modules of CosyVoice3, including:

- flow_fp32.onnx (full precision, flow module)
- flow_fp16.onnx (half precision, flow module)
- hift.onnx (full precision, hift module)
- flow_hift_fp32.onnx (flow_fp32 and hift combined into one model)
- flow_hift_fp16.onnx (flow_fp16 and hift combined into one model)

For usage instructions, please refer to the GitHub repository.
Other modules of CosyVoice3 can be obtained from the official CosyVoice3.
I have open-sourced the ONNX versions of CosyVoice2 and CosyVoice3, including the modified modules and the conversion scripts needed for ONNX export. To learn how the conversion is done, please visit CosyVoiceForOnnx.

Model Inputs and Outputs

flow_fp32.onnx / flow_fp16.onnx

Inputs:
- token (int64)
- prompt_token (int32)
- prompt_feat (float32 / float16)
- embedding (float32 / float16)
  - For flow_fp32.onnx, must use float32
  - For flow_fp16.onnx, must use float16
Outputs:
- tts_mel (float32)
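
The dtype requirements above can be sketched as a small helper that casts every input before it is passed to the flow model. This is a minimal sketch, not code from this repository: the function name `make_flow_feed` is hypothetical, the tensor shapes are arbitrary placeholders, and it assumes the ONNX graph input names match the names listed above.

```python
import numpy as np

def make_flow_feed(token, prompt_token, prompt_feat, embedding, fp16=False):
    """Cast inputs to the dtypes listed for flow_fp32.onnx / flow_fp16.onnx."""
    # prompt_feat and embedding follow the model precision.
    float_dtype = np.float16 if fp16 else np.float32
    return {
        "token": np.asarray(token, dtype=np.int64),
        "prompt_token": np.asarray(prompt_token, dtype=np.int32),
        "prompt_feat": np.asarray(prompt_feat, dtype=float_dtype),
        "embedding": np.asarray(embedding, dtype=float_dtype),
    }

# Arbitrary placeholder shapes (assumptions); the real shapes are not given here.
feed = make_flow_feed(
    token=[[1, 2, 3]],
    prompt_token=[[4, 5]],
    prompt_feat=np.zeros((1, 4, 8)),
    embedding=np.zeros((1, 16)),
    fp16=True,
)
print({name: arr.dtype.name for name, arr in feed.items()})
```

The resulting dictionary has the form that an ONNX Runtime session accepts as its input feed.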

hift.onnx

Input:
- speech_feat (float32)
Output:
- generated_speech (float32)
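
When the separate models are used, the flow output has to be handed to hift. A minimal sketch of that chaining follows. It assumes that `tts_mel` can be passed to hift as `speech_feat` without further processing, which this page does not state explicitly, and that the graph input and output names match the names listed above. The function name `flow_then_hift` is hypothetical.

```python
import numpy as np

def flow_then_hift(flow_session, hift_session, flow_feed):
    """Run the flow model, then feed its mel output to hift.

    Both sessions only need an onnxruntime-style run(output_names, input_feed)
    method, for example:
        onnxruntime.InferenceSession("flow_fp16.onnx",
                                     providers=["CPUExecutionProvider"])
    """
    (tts_mel,) = flow_session.run(["tts_mel"], flow_feed)
    # hift expects float32; tts_mel is already float32 for both flow variants,
    # so this conversion is only a safeguard.
    speech_feat = np.asarray(tts_mel, dtype=np.float32)
    (speech,) = hift_session.run(["generated_speech"], {"speech_feat": speech_feat})
    return speech
```

Because the flow output is float32 for both precisions, the same hift.onnx can follow either flow_fp32.onnx or flow_fp16.onnx.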

flow_hift_fp32.onnx / flow_hift_fp16.onnx

Inputs:
- token (int32)
- prompt_token (int32)
- prompt_feat (float32 / float16)
- embedding (float32 / float16)
  - For flow_hift_fp32.onnx, must use float32
  - For flow_hift_fp16.onnx, must use float16
- speed (float32, scalar, controls speech rate)
Output:
- generated_speech (float32)
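
The combined models differ from the standalone flow models in two inputs: `token` is int32 and there is an extra `speed` scalar. The sketch below adapts a flow-style feed accordingly. It is illustrative only: `to_combined_feed` is a hypothetical name, the default of 1.0 for `speed` is a placeholder, and representing the scalar as a 0-dimensional array is an assumption about how the model was exported.

```python
import numpy as np

def to_combined_feed(flow_feed, speed=1.0):
    """Adapt a flow-style feed (int64 token) for flow_hift_fp32/fp16.onnx."""
    token = np.asarray(flow_feed["token"])
    limits = np.iinfo(np.int32)
    # Refuse to silently wrap values that do not fit in int32.
    if token.size and (token.min() < limits.min or token.max() > limits.max):
        raise ValueError("token values do not fit in int32")
    feed = dict(flow_feed)  # shallow copy; the caller's dict is left unchanged
    feed["token"] = token.astype(np.int32)
    # speed is listed as float32 regardless of the model precision.
    feed["speed"] = np.array(speed, dtype=np.float32)
    return feed
```

The other entries (`prompt_token`, `prompt_feat`, `embedding`) are passed through unchanged, so they must already have the dtypes the chosen model expects.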

Notes

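- All outputs are float32.
- Input precision must strictly match the model: prompt_feat and embedding are float32 for the fp32 models and float16 for the fp16 models.
- In the combined models, the token input is int32 (not int64 as in the standalone flow models), and speed is a float32 scalar that controls the speech rate.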
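
Since a dtype mismatch makes inference fail, it can help to compare a feed against the input types the model itself declares before running it. The sketch below does this using the `name` and `type` attributes that ONNX Runtime exposes on the objects returned by `InferenceSession.get_inputs()`. The helper name `find_dtype_mismatches` is hypothetical, and the demo uses stand-in objects so that it runs without the model files.

```python
from types import SimpleNamespace
import numpy as np

# ONNX Runtime reports input types as strings such as "tensor(float16)".
ORT_TYPE_TO_NUMPY = {
    "tensor(float)": np.float32,
    "tensor(float16)": np.float16,
    "tensor(int32)": np.int32,
    "tensor(int64)": np.int64,
}

def find_dtype_mismatches(declared_inputs, feed):
    """List missing inputs and inputs whose dtype differs from the declared one.

    declared_inputs: objects with .name and .type, e.g. session.get_inputs().
    """
    problems = []
    for arg in declared_inputs:
        expected = ORT_TYPE_TO_NUMPY.get(arg.type)
        if arg.name not in feed:
            problems.append(f"{arg.name}: missing")
        elif expected is not None and feed[arg.name].dtype != expected:
            got = feed[arg.name].dtype.name
            problems.append(f"{arg.name}: got {got}, expected {np.dtype(expected).name}")
    return problems

# Stand-ins for session.get_inputs() (assumption: a real session would be used).
declared = [
    SimpleNamespace(name="token", type="tensor(int32)"),
    SimpleNamespace(name="speed", type="tensor(float)"),
]
problems = find_dtype_mismatches(declared, {"token": np.zeros((1, 2), dtype=np.int64)})
print(problems)
```

With a real session, `declared` would be replaced by `session.get_inputs()`, and an empty result means every input has the declared dtype.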

Acknowledgments

The original models are from the official CosyVoice3. This repository only provides ONNX format conversion and adaptation.

Model tree for Lourdle/Fun-CosyVoice3-0.5B-2512_ONNX

Base model: FunAudioLLM/Fun-CosyVoice3-0.5B-2512 (this repository is listed as a quantized variant)