reazon-research/japanese-hubert-base-k2
This is a Japanese HuBERT Base model pre-trained on the ReazonSpeech v2.0 corpus using the k2 framework.
The weights were converted from the original k2 checkpoint to the Hugging Face Transformers format.
We also release two CTC models derived from this model: reazon-research/japanese-hubert-base-k2-rs35kh and reazon-research/japanese-hubert-base-k2-rs35kh-bpe.
```python
import librosa
import torch
from transformers import AutoFeatureExtractor, AutoModel

feature_extractor = AutoFeatureExtractor.from_pretrained("reazon-research/japanese-hubert-base-k2")
model = AutoModel.from_pretrained("reazon-research/japanese-hubert-base-k2")

# Load the audio and resample it to 16 kHz, the rate the model expects.
audio, sr = librosa.load(audio_file, sr=16_000)

inputs = feature_extractor(
    audio,
    return_tensors="pt",
    sampling_rate=sr,
)

with torch.inference_mode():
    outputs = model(**inputs)  # outputs.last_hidden_state holds the frame-level features
```
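The model produces one hidden-state vector per feature frame rather than per audio sample. As a rough sketch of the sample-to-frame mapping, assuming this checkpoint uses the standard wav2vec 2.0 / HuBERT Base convolutional front-end (kernel sizes 10, 3, 3, 3, 3, 2, 2 with strides 5, 2, 2, 2, 2, 2, 2 — an assumption not stated in this card), the frame count can be estimated from the sample count:

```python
# Estimate how many feature frames the encoder produces for a given number of
# 16 kHz audio samples. The (kernel, stride) pairs below are the usual
# wav2vec 2.0 / HuBERT Base front-end configuration, assumed here for
# illustration rather than taken from this model card.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]

def num_frames(num_samples: int) -> int:
    n = num_samples
    for kernel, stride in CONV_LAYERS:
        n = (n - kernel) // stride + 1  # valid convolution, no padding
    return n

# One second of 16 kHz audio maps to 49 frames, i.e. roughly one frame
# every 20 ms.
print(num_frames(16_000))  # → 49
```

Under this assumption, `outputs.last_hidden_state` for one second of audio has roughly 49 time steps, each a 768-dimensional vector for a Base-sized model.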
```bibtex
@misc{reazon-research-japanese-hubert-base-k2,
  title  = {japanese-hubert-base-k2},
  author = {Sasaki, Yuta},
  url    = {https://huggingface.co/reazon-research/japanese-hubert-base-k2},
  year   = {2025}
}

@article{yang2024k2ssl,
  title   = {k2SSL: A faster and better framework for self-supervised speech representation learning},
  author  = {Yang, Yifan and Zhuo, Jianheng and Jin, Zengrui and Ma, Ziyang and Yang, Xiaoyu and Yao, Zengwei and Guo, Liyong and Kang, Wei and Kuang, Fangjun and Lin, Long and others},
  journal = {arXiv preprint arXiv:2411.17100},
  year    = {2024}
}
```