CosyVoice3-2512 ONNX Models (flow & hift)

This repository provides ONNX-format models for selected modules of CosyVoice3, including:

  • flow_fp32.onnx (full precision, flow module)
  • flow_fp16.onnx (half precision, flow module)
  • hift.onnx (full precision, hift module)
  • flow_hift_fp32.onnx (combined flow_fp32 and hift model)
  • flow_hift_fp16.onnx (combined flow_fp16 and hift model)

For usage instructions, please refer to the GitHub repository.
The other CosyVoice3 modules can be obtained from the official CosyVoice3 release.
I have open-sourced the ONNX versions of CosyVoice2 and CosyVoice3, including the modified modules and the conversion scripts required for ONNX export. To learn how the conversion is performed, see CosyVoiceForOnnx.


Model Inputs and Outputs

flow_fp32.onnx / flow_fp16.onnx

  • Inputs:
    • token (int64)
    • prompt_token (int32)
    • prompt_feat (float32 / float16)
    • embedding (float32 / float16)
      • For flow_fp32.onnx, must use float32
      • For flow_fp16.onnx, must use float16
  • Outputs:
    • tts_mel (float32)
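
The dtype requirements above can be applied with a small helper before calling the model. This is a minimal sketch, not part of this repository: it assumes the ONNX graph input names match the names listed above, and `make_flow_feed` is a hypothetical helper name. Tensor shapes are not documented here, so inspect them with `session.get_inputs()` before building real inputs.

```python
import numpy as np

# Integer inputs of the flow models, with the dtypes listed above.
FLOW_INT_INPUTS = {"token": np.int64, "prompt_token": np.int32}
# Float inputs whose dtype depends on the model variant (fp32 or fp16).
FLOW_FLOAT_INPUTS = ("prompt_feat", "embedding")


def make_flow_feed(inputs, fp16=False):
    """Return a feed dict with every array cast to the dtype the flow model expects.

    `inputs` maps input names to array-likes; `fp16` selects flow_fp16.onnx.
    """
    float_dtype = np.float16 if fp16 else np.float32
    feed = {}
    for name, dtype in FLOW_INT_INPUTS.items():
        feed[name] = np.asarray(inputs[name], dtype=dtype)
    for name in FLOW_FLOAT_INPUTS:
        feed[name] = np.asarray(inputs[name], dtype=float_dtype)
    return feed
```

The resulting dictionary can be passed as the input feed to `onnxruntime.InferenceSession.run(None, feed)`; the first returned array is then `tts_mel`.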

hift.onnx

  • Input:
    • speech_feat (float32)
  • Output:
    • generated_speech (float32)
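
When the flow and hift models are used separately, they can be chained by hand. The sketch below assumes that the `tts_mel` output of a flow model can be passed directly as `speech_feat` (which is what the combined models suggest, but it is not stated explicitly here) and that the graph input is named `speech_feat`. `mel_to_speech` and `synthesize` are hypothetical helper names; the session arguments are expected to be `onnxruntime.InferenceSession` objects.

```python
import numpy as np


def mel_to_speech(hift_session, tts_mel):
    """Run hift.onnx on a mel spectrogram and return the generated waveform."""
    # hift.onnx takes float32; the cast is a no-op if tts_mel is already float32.
    speech_feat = np.asarray(tts_mel, dtype=np.float32)
    outputs = hift_session.run(None, {"speech_feat": speech_feat})
    return outputs[0]  # generated_speech


def synthesize(flow_session, hift_session, flow_feed):
    """Run the flow model, then feed its mel output to hift."""
    tts_mel = flow_session.run(None, flow_feed)[0]
    return mel_to_speech(hift_session, tts_mel)


# Example (requires onnxruntime and the model files from this repository):
#   import onnxruntime as ort
#   flow = ort.InferenceSession("flow_fp32.onnx", providers=["CPUExecutionProvider"])
#   hift = ort.InferenceSession("hift.onnx", providers=["CPUExecutionProvider"])
#   speech = synthesize(flow, hift, flow_feed)
```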

flow_hift_fp32.onnx / flow_hift_fp16.onnx

  • Inputs:
    • token (int32)
    • prompt_token (int32)
    • prompt_feat (float32 / float16)
    • embedding (float32 / float16)
      • For flow_hift_fp32.onnx, must use float32
      • For flow_hift_fp16.onnx, must use float16
    • speed (float32, scalar, controls speech rate)
  • Output:
    • generated_speech (float32)
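
A feed for the combined models might be built as follows. This is a sketch under stated assumptions: the graph input names match the list above, `prompt_feat` and `embedding` follow the precision of the chosen model variant, and `speed` is passed as a 0-dimensional float32 array. `make_combined_feed` is a hypothetical helper name, and valid values for `speed` are not documented here.

```python
import numpy as np


def make_combined_feed(token, prompt_token, prompt_feat, embedding, speed, fp16=False):
    """Build the input feed for flow_hift_fp32.onnx (or flow_hift_fp16.onnx if fp16)."""
    float_dtype = np.float16 if fp16 else np.float32
    return {
        # Unlike the standalone flow models, token is int32 here.
        "token": np.asarray(token, dtype=np.int32),
        "prompt_token": np.asarray(prompt_token, dtype=np.int32),
        "prompt_feat": np.asarray(prompt_feat, dtype=float_dtype),
        "embedding": np.asarray(embedding, dtype=float_dtype),
        # speed stays float32 for both variants; a Python float becomes a 0-d array.
        "speed": np.asarray(speed, dtype=np.float32),
    }
```

If the model rejects the 0-dimensional `speed` array, check the declared shape with `session.get_inputs()` and reshape accordingly.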


Notes

  • All outputs are float32.
  • Input precision must strictly match the model requirements.
  • In the combined models, the token input is int32 (not int64 as in the standalone flow models), and the speed input is a float32 scalar that controls the speech rate.
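
Because a dtype mismatch makes inference fail, it can help to compare a feed against the types the loaded model declares before running it. The sketch below relies on `InferenceSession.get_inputs()`, which reports each input's `name` and a type string such as `tensor(float16)`. `find_dtype_mismatches` is a hypothetical helper name, and the mapping covers only the dtypes mentioned in this document.

```python
import numpy as np

# ONNX Runtime type strings for the dtypes used by these models.
ORT_TO_NUMPY = {
    "tensor(float)": np.float32,
    "tensor(float16)": np.float16,
    "tensor(int32)": np.int32,
    "tensor(int64)": np.int64,
}


def find_dtype_mismatches(session, feed):
    """Return a list of messages for missing inputs or inputs with the wrong dtype."""
    problems = []
    for arg in session.get_inputs():
        expected = ORT_TO_NUMPY.get(arg.type)
        if arg.name not in feed:
            problems.append(f"{arg.name}: missing")
        elif expected is not None and feed[arg.name].dtype != expected:
            problems.append(
                f"{arg.name}: expected {arg.type}, got {feed[arg.name].dtype}"
            )
    return problems
```

An empty list means every declared input is present with a matching dtype; shapes are not checked.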

Acknowledgments

The original models are from the official CosyVoice3. This repository only provides ONNX format conversion and adaptation.


