reazon-research/japanese-hubert-base-k2
This is a Japanese HuBERT Base model pre-trained on the ReazonSpeech v2.0 corpus using the k2 framework.
The weights were converted from the original k2 checkpoint to the Hugging Face Transformers format.
We also release two CTC models derived from this model: reazon-research/japanese-hubert-base-k2-rs35kh and reazon-research/japanese-hubert-base-k2-rs35kh-bpe.
```python
import librosa
import torch
from transformers import AutoFeatureExtractor, AutoModel

feature_extractor = AutoFeatureExtractor.from_pretrained("reazon-research/japanese-hubert-base-k2")
model = AutoModel.from_pretrained("reazon-research/japanese-hubert-base-k2")

# Load the audio and resample it to 16 kHz, the rate the model expects.
audio, sr = librosa.load(audio_file, sr=16_000)

inputs = feature_extractor(
    audio,
    return_tensors="pt",
    sampling_rate=sr,
)

with torch.inference_mode():
    outputs = model(**inputs)  # outputs.last_hidden_state holds the frame-level features
```
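The model produces one hidden-state vector per feature frame rather than per audio sample. As a rough sketch of the sample-to-frame mapping, assuming this checkpoint uses the standard wav2vec 2.0 / HuBERT Base convolutional front-end (kernel sizes 10, 3, 3, 3, 3, 2, 2 with strides 5, 2, 2, 2, 2, 2, 2 — an assumption not stated in this card), the frame count can be estimated from the sample count:

```python
# Estimate how many feature frames the encoder produces for a given number of
# 16 kHz audio samples. The (kernel, stride) pairs below are the usual
# wav2vec 2.0 / HuBERT Base front-end configuration, assumed here for
# illustration rather than taken from this model card.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]

def num_frames(num_samples: int) -> int:
    n = num_samples
    for kernel, stride in CONV_LAYERS:
        n = (n - kernel) // stride + 1  # valid convolution, no padding
    return n

# One second of 16 kHz audio maps to 49 frames, i.e. roughly one frame
# every 20 ms.
print(num_frames(16_000))  # → 49
```

Under this assumption, `outputs.last_hidden_state` for one second of audio has roughly 49 time steps, each a 768-dimensional vector for a Base-sized model.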
```bibtex
@misc{reazon-research-japanese-hubert-base-k2,
  title  = {japanese-hubert-base-k2},
  author = {Sasaki, Yuta},
  url    = {https://huggingface.co/reazon-research/japanese-hubert-base-k2},
  year   = {2025}
}

@article{yang2024k2ssl,
  title   = {k2SSL: A faster and better framework for self-supervised speech representation learning},
  author  = {Yang, Yifan and Zhuo, Jianheng and Jin, Zengrui and Ma, Ziyang and Yang, Xiaoyu and Yao, Zengwei and Guo, Liyong and Kang, Wei and Kuang, Fangjun and Lin, Long and others},
  journal = {arXiv preprint arXiv:2411.17100},
  year    = {2024}
}
```