# NLLB-200 Fine-tuned for Uzbek ↔ English Translation
This model is a fine-tuned version of facebook/nllb-200-distilled-600M specifically optimized for bidirectional translation between Uzbek (uz) and English (en).
## Model Description
This translation model has been fine-tuned to provide high-quality translations for the Uzbek-English language pair, addressing the limited availability of quality translation models for the Uzbek language.
**Base Model:** facebook/nllb-200-distilled-600M

**Language Pairs:**
- English → Uzbek (en → uz)
- Uzbek → English (uz → en)
## Training Data
The model was fine-tuned on a diverse dataset drawn from four sources (shares of the total training data):
- 15% - Curated parallel corpus from web pages of the uza.uz information portal
- 30% - Uzbek texts translated to English using the Gemma-3-27b-it model
- 35% - English texts translated to Uzbek
- 20% - Self-improvement dataset: translations generated by the model itself, with low-quality outputs corrected using a Gemini model and then used for re-fine-tuning
This multi-source approach ensures robust performance across different domains and translation directions.
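Conceptually, the 20% self-improvement slice works as a filter-and-repair loop. The sketch below is only an illustration of that idea: `quality_score` and `correct_with_gemini` are hypothetical placeholders for the quality check and the Gemini correction step, neither of which is published with this model.

```python
# Illustrative sketch of the self-improvement loop described above.
# quality_score() and correct_with_gemini() are hypothetical placeholders;
# the actual metric, threshold, and correction prompt are not published.

def quality_score(src: str, hyp: str) -> float:
    """Placeholder for a quality-estimation score (e.g. a COMET-style QE metric)."""
    raise NotImplementedError

def correct_with_gemini(src: str, hyp: str) -> str:
    """Placeholder for the Gemini-based correction call."""
    raise NotImplementedError

def build_self_improvement_pairs(sources, translate_fn, threshold=0.8):
    """Translate with the current model, repair weak outputs, keep all pairs."""
    pairs = []
    for src in sources:
        hyp = translate_fn(src)                   # model's own translation
        if quality_score(src, hyp) < threshold:   # flag low-quality outputs
            hyp = correct_with_gemini(src, hyp)   # repair before reuse
        pairs.append({"src": src, "tgt": hyp})    # parallel pair for re-fine-tuning
    return pairs
```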
## Performance
The model was evaluated on 200 samples from the openlanguagedata/flores_plus dataset using multiple metrics: BLEU, CHRF, COMET, and BLEURT.
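The exact evaluation scripts are not published, but the BLEU and CHRF numbers can be reproduced along the following lines with sacrebleu. The flores_plus configuration and column names used below (`uzn_Latn`, `eng_Latn`, `devtest`, `text`) follow the dataset card and should be treated as assumptions; COMET and BLEURT require their own packages (e.g. unbabel-comet) and are omitted here.

```python
# Sketch: score uz → en on 200 FLORES+ devtest samples with sacrebleu.
# Dataset config/column names are assumptions based on the flores_plus card.
import sacrebleu
from datasets import load_dataset
from transformers import pipeline

src = load_dataset("openlanguagedata/flores_plus", "uzn_Latn", split="devtest")["text"][:200]
ref = load_dataset("openlanguagedata/flores_plus", "eng_Latn", split="devtest")["text"][:200]

translator = pipeline(
    "translation",
    model="OvozifyLabs/nllb-en-uz-v1",
    src_lang="uzn_Latn",
    tgt_lang="eng_Latn",
    max_length=512,
)
hyps = [out["translation_text"] for out in translator(src)]

print("BLEU:", round(sacrebleu.corpus_bleu(hyps, [ref]).score, 2))
print("CHRF:", round(sacrebleu.corpus_chrf(hyps, [ref]).score, 2))
```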
### Benchmark Results
#### English → Uzbek
| Model | BLEU | CHRF | COMET | BLEURT |
|---|---|---|---|---|
| NLLB-200-uz-en-v1 | 20.22 | 59.3 | 0.906 | 0.766 |
| Tahrirchi Tilmoch | 19.83 | 58.01 | 0.91 | 0.795 |
| NLLB-200 Baseline | 13.07 | 51.73 | 0.881 | 0.707 |
#### Uzbek → English
| Model | BLEU | CHRF | COMET | BLEURT |
|---|---|---|---|---|
| NLLB-200-uz-en-v1 | 34.47 | 62.19 | 0.874 | 0.747 |
| Tahrirchi Tilmoch | 33.76 | 61.97 | 0.876 | 0.754 |
| NLLB-200 Baseline | 30.28 | 58.28 | 0.856 | 0.718 |
## Usage
### Installation

```bash
# transformers pipelines also need a backend such as PyTorch
pip install transformers sentencepiece torch
```
### Python Example

```python
from transformers import pipeline

def translate(text, src_lang, tgt_lang):
    """
    Translate text between Uzbek and English using the transformers pipeline.
    """
    # Note: this rebuilds the pipeline (and reloads the model) on every call;
    # for repeated use, create the pipeline once and reuse it.
    translator = pipeline(
        "translation",
        model="OvozifyLabs/nllb-en-uz-v1",
        src_lang=src_lang,
        tgt_lang=tgt_lang,
        max_length=512
    )
    result = translator(text)
    return result[0]["translation_text"]

# English → Uzbek
en_text = "Hello, how are you today?"
uz_translation = translate(en_text, "eng_Latn", "uzn_Latn")
print("EN:", en_text)
print("UZ:", uz_translation)

# Uzbek → English
uz_text = "Salom, bugun qandaysiz?"
en_translation = translate(uz_text, "uzn_Latn", "eng_Latn")
print("UZ:", uz_text)
print("EN:", en_translation)
```
Batch Translation Example
def translate_batch(texts, src_lang, tgt_lang):
"""
Translate a list of texts using the transformers pipeline.
Pipeline automatically supports batch input.
"""
translator = pipeline(
"translation",
model="OvozifyLabs/nllb-en-uz-v1",
src_lang=src_lang,
tgt_lang=tgt_lang,
max_length=512
)
results = translator(texts)
return [item["translation_text"] for item in results]
# Example usage
texts = [
"Machine learning is fascinating.",
"I love learning new languages.",
"This is a great translation model."
]
translations = translate_batch(texts, src_lang="eng_Latn", tgt_lang="uzn_Latn")
for orig, trans in zip(texts, translations):
print(f"{orig} β {trans}")
## Intended Use
This model is intended for:
- General-purpose translation between Uzbek and English
- Content localization for web applications
- Educational purposes and language learning
- Research in machine translation for low-resource languages
## Acknowledgments
- Base model: facebook/nllb-200-distilled-600M
- Evaluation dataset: openlanguagedata/flores_plus
## License
This model inherits the license from the base NLLB-200 model. Please refer to the original model card for licensing information.
## Contact
For questions, issues, or feedback, please open an issue in the model repository.