NLLB-200 Fine-tuned for Uzbek ↔ English Translation

This model is a fine-tuned version of facebook/nllb-200-distilled-600M specifically optimized for bidirectional translation between Uzbek (uz) and English (en).

Model Description

This translation model has been fine-tuned to provide high-quality translations for the Uzbek-English language pair, addressing the limited availability of quality translation models for the Uzbek language.

Base Model: facebook/nllb-200-distilled-600M
Language Pairs:

  • English → Uzbek (en → uz)
  • Uzbek → English (uz → en)

Training Data

The model was fine-tuned on a diverse dataset composed of:

  • 15% - Curated parallel corpus from the uza.uz information portal
  • 30% - Uzbek texts translated to English with the Gemma-3-27b-it model
  • 35% - English texts translated to Uzbek
  • 20% - Self-improvement dataset: translations generated by the model itself, with low-quality outputs corrected by a Gemini model and used for re-fine-tuning

This multi-source approach ensures robust performance across different domains and translation directions.
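
The self-improvement step described above can be sketched roughly as follows. This is an illustrative reconstruction, not the actual training code: `score_translation`, `correct_with_llm`, and the quality threshold are hypothetical placeholders for the quality-scoring and Gemini-based correction steps, whose details are not published.

```python
# Illustrative sketch of the self-improvement data loop described above.
# `score_translation` and `correct_with_llm` are hypothetical stand-ins
# for the (unspecified) quality-scoring and Gemini-correction steps.

QUALITY_THRESHOLD = 0.8  # assumed cutoff; the actual value is not published


def score_translation(src: str, hyp: str) -> float:
    """Placeholder: a reference-free quality score in [0, 1] (e.g. COMET-QE)."""
    return 1.0 if hyp else 0.0


def correct_with_llm(src: str, hyp: str) -> str:
    """Placeholder: ask a stronger model (e.g. Gemini) to repair the translation."""
    return hyp


def build_self_improvement_set(pairs):
    """Keep good model outputs as-is; repair low-quality ones before reuse."""
    dataset = []
    for src, hyp in pairs:
        if score_translation(src, hyp) >= QUALITY_THRESHOLD:
            dataset.append((src, hyp))          # good enough: keep verbatim
        else:
            dataset.append((src, correct_with_llm(src, hyp)))  # repair first
    return dataset
```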

Performance

The model was evaluated on 200 samples from the openlanguagedata/flores_plus dataset using multiple metrics: BLEU, chrF, COMET, and BLEURT.
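
As a rough illustration of one of these metrics, a simplified character n-gram F-score in the spirit of chrF can be computed as below. This is a teaching sketch, not the official implementation; real evaluation should use a standard toolkit such as sacrebleu.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count the character n-grams of a string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF-style score in [0, 1]: average character n-gram
    F-beta over n = 1..max_n. Illustrative only; use sacrebleu for real
    evaluation."""
    scores = []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # strings shorter than n grams
        overlap = sum((hyp & ref).values())
        prec = overlap / sum(hyp.values())
        rec = overlap / sum(ref.values())
        if prec + rec == 0:
            scores.append(0.0)
        else:
            b2 = beta ** 2
            scores.append((1 + b2) * prec * rec / (b2 * prec + rec))
    return sum(scores) / len(scores) if scores else 0.0
```

Identical strings score 1.0 and fully disjoint strings score 0.0, with partial overlaps falling in between.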

Benchmark Results

English → Uzbek

Model               BLEU    chrF    COMET   BLEURT
NLLB-200-uz-en-v1   20.22   59.30   0.906   0.766
Tahrirchi Tilmoch   19.83   58.01   0.910   0.795
NLLB-200 Baseline   13.07   51.73   0.881   0.707

Uzbek → English

Model               BLEU    chrF    COMET   BLEURT
NLLB-200-uz-en-v1   34.47   62.19   0.874   0.747
Tahrirchi Tilmoch   33.76   61.97   0.876   0.754
NLLB-200 Baseline   30.28   58.28   0.856   0.718
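
For context, the relative BLEU gains over the NLLB-200 baseline work out as follows (simple arithmetic on the table values above):

```python
def relative_gain(fine_tuned: float, baseline: float) -> float:
    """Relative improvement of the fine-tuned model over the baseline, in %."""
    return (fine_tuned - baseline) / baseline * 100


# BLEU values taken from the benchmark tables above
en_uz = relative_gain(20.22, 13.07)  # English → Uzbek, ≈ +54.7%
uz_en = relative_gain(34.47, 30.28)  # Uzbek → English, ≈ +13.8%

print(f"en→uz: +{en_uz:.1f}% BLEU over baseline")
print(f"uz→en: +{uz_en:.1f}% BLEU over baseline")
```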

Usage

Installation

pip install transformers sentencepiece

Python Example

from transformers import pipeline

def translate(text, src_lang, tgt_lang):
    """
    Translate text between Uzbek and English using the transformers pipeline.

    Language codes follow the FLORES-200 convention used by NLLB:
    "eng_Latn" for English, "uzn_Latn" for Uzbek (Latin script).
    """
    # Note: for repeated calls, construct the pipeline once and reuse it;
    # building it inside the function reloads the model on every call.
    translator = pipeline(
        "translation",
        model="OvozifyLabs/nllb-en-uz-v1",
        src_lang=src_lang,
        tgt_lang=tgt_lang,
        max_length=512
    )
    
    result = translator(text)
    return result[0]["translation_text"]

# English → Uzbek
en_text = "Hello, how are you today?"
uz_translation = translate(en_text, "eng_Latn", "uzn_Latn")
print("EN:", en_text)
print("UZ:", uz_translation)

# Uzbek → English
uz_text = "Salom, bugun qandaysiz?"
en_translation = translate(uz_text, "uzn_Latn", "eng_Latn")
print("UZ:", uz_text)
print("EN:", en_translation)

Batch Translation Example

def translate_batch(texts, src_lang, tgt_lang):
    """
    Translate a list of texts using the transformers pipeline.
    Pipeline automatically supports batch input.
    """

    translator = pipeline(
        "translation",
        model="OvozifyLabs/nllb-en-uz-v1",
        src_lang=src_lang,
        tgt_lang=tgt_lang,
        max_length=512
    )

    results = translator(texts)

    return [item["translation_text"] for item in results]


# Example usage
texts = [
    "Machine learning is fascinating.",
    "I love learning new languages.",
    "This is a great translation model."
]

translations = translate_batch(texts, src_lang="eng_Latn", tgt_lang="uzn_Latn")

for orig, trans in zip(texts, translations):
    print(f"{orig} → {trans}")
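
Because the pipeline truncates inputs at max_length, very long documents are best split into pieces before translation. Below is a minimal chunking helper; splitting on ". " is a naive assumption for illustration, and real text may need a proper sentence segmenter.

```python
def chunk_text(text: str, max_chars: int = 400) -> list[str]:
    """Naively split text into chunks of at most max_chars characters,
    breaking on sentence boundaries (". ") where possible. Illustrative
    only; a production pipeline should use a real sentence segmenter."""
    sentences = text.split(". ")
    chunks, current = [], ""
    for i, sent in enumerate(sentences):
        # Restore the separator removed by split() on all but the last piece
        piece = sent + ". " if i < len(sentences) - 1 else sent
        if current and len(current) + len(piece) > max_chars:
            chunks.append(current.strip())
            current = piece
        else:
            current += piece
    if current.strip():
        chunks.append(current.strip())
    return chunks


# Translate each chunk separately, then rejoin, e.g.:
# translations = translate_batch(chunk_text(long_text), "eng_Latn", "uzn_Latn")
# full_translation = " ".join(translations)
```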

Intended Use

This model is intended for:

  • General-purpose translation between Uzbek and English
  • Content localization for web applications
  • Educational purposes and language learning
  • Research in machine translation for low-resource languages

License

This model inherits the license from the base NLLB-200 model. Please refer to the original model card for licensing information.

Contact

For questions, issues, or feedback, please open an issue in the model repository.
