Llama-3.1-Nanda-87B-Chat

Llama-3.1-Nanda-87B-Chat is an 87 billion parameter pre-trained and instruction-tuned bilingual large language model for Hindi and English, trained on a dataset containing 65 billion Hindi tokens. The model is based on transformer-based decoder-only (LLaMA-3.1) architecture. It implements Rotary Position Embeddings (RoPE), enabling the model to extrapolate to long sequence lengths, providing improved context handling and model precision.

The model achieves state-of-the-art performance on Hindi generative tasks, such as summarization, translation, and transliteration, producing safer responses and demonstrating impressive results on English benchmarks. We provide extensive evaluation outcomes and make an instruction-tuned version of the model publicly available.

How to Get Started with the Model:

Below is sample code to use the model. The code below is tested in a conda environment having the following packages: torch==2.6.0+cu124, transformers==4.55.2, accelerate==1.6.0, and vllm==0.8.5.

transformers

# -*- coding: utf-8 -*-
import transformers
from transformers import GenerationConfig, AutoTokenizer
import torch
import os
model_id = "MBZUAI-IFM/Llama-3.1-Nanda-87B-Chat"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto"
)
chat_prompt = [
    {"role": "system", "content": "Your name is Nanda, and you are named after Nanda Devi, one of the highest mountains in India. You are built by MBZUAI, Inception and Cerebras. You are a helpful AI assistant that is proficient in both Hindi (i.e., Devanagari Hindi and Romanized Hindi) and English. Respond in the same language and script as the instruction, unless a different language and script is explicitly requested."},
    {"role": "user", "content": "मुझे यूएई के बारे में कुछ रोचक तथ्य बताएं?"},
]
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
# Apply chat template
formatted_prompt = tokenizer.apply_chat_template(chat_prompt, tokenize=False, add_generation_prompt=True)
gen_config = GenerationConfig(
    max_new_tokens=8192,
    temperature=0.6,
    top_p=0.9,
    do_sample=True
)
outputs = pipeline(
    formatted_prompt,
    generation_config=gen_config,
    return_full_text=False
)
print(outputs[0]["generated_text"])

Response:

जी हां, यहां संयुक्त अरब अमीरात (UAE) के बारे में कुछ रोचक तथ्य दिए गए हैं:

1. आर्थिक विविधता: हालांकि UAE अपनी तेल संपदा के लिए प्रसिद्ध है, लेकिन इसकी अर्थव्यवस्था विविध है, जिसमें पर्यटन, विमानन और रियल एस्टेट जैसे क्षेत्र महत्वपूर्ण भूमिका निभाते हैं।

2. कृत्रिम द्वीप: दुबई में पाम जुमेराह और द वर्ल्ड द्वीप समूह जैसे प्रसिद्ध कृत्रिम द्वीप हैं। ये मानव निर्मित संरचनाएं हैं जो आवास, होटल और रिसॉर्ट्स सहित विभिन्न सुविधाओं की मेजबानी करती हैं।

3. ऊंची इमारतें: दुबई में दुनिया की सबसे ऊंची इमारत, बुर्ज खलीफा है, जिसकी ऊंचाई 828 मीटर है। यह इमारत केवल UAE में ही नहीं, बल्कि पूरी दुनिया में स्थापत्य और इंजीनियरिंग का आश्चर्य है।

4. शून्य आयकर: UAE में कोई संघीय आयकर नहीं है। यह विदेशी पेशेवरों और निवेशकों को आकर्षित करने वाला एक महत्वपूर्ण कारक है।

5. लग्जरी शॉपिंग और त्यौहार: दुबई शॉपिंग फेस्टिवल और अबू धाबी इंटरनेशनल बुक फेयर जैसे आयोजन देश को वैश्विक स्तर पर लग्जरी शॉपिंग और सांस्कृतिक हब के रूप में स्थापित करते हैं।

6. आधुनिक आधारभूत संरचना: UAE में आधुनिक आधारभूत संरचना है, जिसमें उन्नत राजमार्ग, मेट्रो प्रणालियां और दुनिया के कुछ सबसे व्यस्त हवाई अड्डे शामिल हैं।

7. सांस्कृतिक विरासत: आधुनिक विकास के बावजूद, UAE ने अपनी सांस्कृतिक विरासत को संरक्षित किया है। अल-अइन में ऐतिहासिक स्थल और दुबई में पारंपरिक सूक (जैसे स्वर्ण सूक) देश के समृद्ध इतिहास की झलक प्रस्तुत करते हैं।

8. वन्यजीव संरक्षण: सर बाज़ान और दुबई के प्राकृतिक रिजर्व जैसे संरक्षण क्षेत्रों के साथ UAE वन्यजीव संरक्षण के प्रति सक्रिय रूप से समर्पित है, जो अरब ओरिक्स और हॉग हिरण जैसी प्रजातियों की रक्षा करते हैं।

9. जल सुरक्षा: जल की कमी को दूर करने के लिए, UAE उन्नत विलवणीकरण प्रौद्योगिकी में निवेश करता है और जल संरक्षण को प्राथमिकता देता है।

10. खेल में भागीदारी: UAE विभिन्न अंतरराष्ट्रीय खेल आयोजनों की मेजबानी करता है, जिसमें अबू धाबी ग्रैंड प्रिक्स और दुबई वर्ल्ड कप जैसी प्रमुख घुड़दौड़ शामिल हैं। गोल्फ, टेनिस और फुटबॉल यहां लोकप्रिय हैं।

ये तथ्य UAE के बारे में बहुत कुछ बताते हैं, जिसमें उसकी आर्थिक प्रगति, स्थापत्य उपलब्धियां, सांस्कृतिक संरक्षण और वैश्विक मंच पर इसकी उपस्थिति शामिल है।

vLLM:

# -*- coding: utf-8 -*-
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer
import os
model_id = "MBZUAI-IFM/Llama-3.1-Nanda-87B-Chat"
llm = LLM(model=model_id, 
        tokenizer=model_id,
        dtype="bfloat16",
        tensor_parallel_size=4,                       # Set according to GPU availability
        max_num_seqs=16, 
        gpu_memory_utilization=0.85)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
# Sampling config
sampling_params = SamplingParams(
    temperature=0.6,
    top_p=0.9,
    max_tokens=8192,
)
chat_prompt = [
    {"role": "system", "content": "Your name is Nanda, and you are named after Nanda Devi, one of the highest mountains in India. You are built by MBZUAI, Inception and Cerebras. You are a helpful AI assistant that is proficient in both Hindi (i.e., Devanagari Hindi and Romanized Hindi) and English. Respond in the same language and script as the instruction, unless a different language and script is explicitly requested."},
    {"role": "user", "content": "मुझे यूएई के बारे में कुछ रोचक तथ्य बताएं?"},
]
# Apply chat template
formatted_prompt = tokenizer.apply_chat_template(chat_prompt, tokenize=False, add_generation_prompt=True)
# Generate model response(s)
outputs = llm.generate([formatted_prompt], sampling_params)
# Extract generated response(s)
generated_completions = [o.outputs[0].text.strip() for o in outputs]
print(generated_completions[0])

Response:

संयुक्त अरब अमीरात (UAE) एक समृद्ध और विविध देश है जिसमें कई रोचक पहलु हैं। यहाँ कुछ रोचक तथ्य हैं:

1. सात अमीरात: UAE सात अलग-अलग अमीरातों से मिलकर बना है: अबू धाबी, दुबई, शारजाह, अजमान, उम्म अल क्वैन, रास अल खैमाह, और फुजैराह। प्रत्येक अमीरात की अपनी सरकार और शासक होता है, लेकिन अबू धाबी के शासक देश के राष्ट्रपति के रूप में कार्य करते हैं।

2. तेल की संपत्ति: UAE के पास दुनिया का सातवां सबसे बड़ा तेल भंडार है। तेल की खोज ने देश के तेजी से विकास और आधुनिकीकरण में महत्वपूर्ण भूमिका निभाई है।

3. आधुनिक वास्तुकला: दुबई, UAE का सबसे प्रसिद्ध शहर, आधुनिक वास्तुकला का आश्चर्य है। इसमें दुनिया की सबसे ऊंची इमारत, बुर्ज खलीफा, और कृत्रिम द्वीपों की श्रृंखला, पाम जुमेराह, शामिल है।

4. शून्य आयकर: UAE में कोई आयकर नहीं है। यह देश अपनी सार्वजनिक सेवाओं को वित्त पोषित करने के लिए मुख्य रूप से तेल की बिक्री और कॉर्पोरेट करों पर निर्भर करता है।

5. सांस्कृतिक विविधता: UAE की जनसंख्या का एक बड़ा हिस्सा विदेशी नागरिकों से बना है, जो 200 से अधिक देशों से आते हैं। इससे देश में एक विविध और बहुसांस्कृतिक समाज का निर्माण हुआ है।

6. मरुभूमि: UAE का अधिकांश हिस्सा मरुभूमि है, लेकिन सरकार ने हरियाली बढ़ाने के लिए व्यापक कार्यक्रम शुरू किए हैं। देश में दुनिया के कुछ सबसे उन्नत सिंचाई प्रणालियां हैं।

7. अत्याधुनिक प्रौद्योगिकी: UAE प्रौद्योगिकी के मामले में अग्रणी है, जिसमें रोबोट पुलिस अधिकारी, ड्राइवरलेस मेट्रो, और मंगल ग्रह के लिए एक अंतरिक्ष कार्यक्रम शामिल है।

8. आधुनिक और पारंपरिक का मिश्रण: UAE में पारंपरिक अरबी संस्कृति और आधुनिक जीवन शैली का अद्वितीय मिश्रण है। जबकि देश में उच्च तकनीक की सुविधाएं और पश्चिमी प्रभाव है, यह अपनी पारंपरिक रीति-रिवाजों और इस्लामी मूल्यों को बनाए रखता है।

9. शॉपिंग स्वर्ग: दुबई विशेष रूप से अपने शॉपिंग मॉल्स के लिए प्रसिद्ध है, जिसमें दुनिया का सबसे बड़ा शॉपिंग और मनोरंजन केंद्र, दुबई मॉल, शामिल है।

10. पर्यटन: UAE ने एक प्रमुख पर्यटन स्थल के रूप में खुद को स्थापित किया है, जिसमें हर साल लाखों आगंतुक आते हैं। इसके आकर्षण में आधुनिक वास्तुकला, रेगिस्तान सफारी, समुद्र तट, और सांस्कृतिक अनुभव शामिल हैं।

ये तथ्य UAE की समृद्धता, विविधता, और तेजी से विकास को प्रदर्शित करते हैं।

Model Details:

Developed by: Institute of Foundation Models at the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Inception, and Cerebras Systems
Language(s) (NLP): Hindi (and English)
License: Llama 3.1
Input: Text-only data
Output: Model generates text
Paper :

Training Details:

Training Data:

For pre-training of Llama-3.1-Nanda-87B-Chat, we used a diverse bilingual corpus sourced from the Web and other sources. We also used publicly available English and code datasets. To collect Hindi data, we used multiple sources, including web pages, Wikipedia articles, news articles, Hindi books, etc.

Training Procedure:

We performed continuous pre-training followed by supervised fine tuning, both on the Cerebras supercomputer.

Evaluation:

In general, LLMs are often evaluated using multiple-choice question (MCQ) benchmarks. However, this approach provides a limited view of their true capabilities, as MCQs mainly test factual recall or pattern recognition. Tasks such as summarization, translation, and transliteration offer a richer assessment, evaluating contextual understanding, reasoning, creativity, and adaptability. Relying solely on MCQs risks underestimating LLMs’ potential, whereas task-based evaluations give a more meaningful measure of their real-world performance.

We conducted a comprehensive evaluation of Llama-3.1-Nanda-87B-Chat and benchmarked it against several other leading language models, focusing on both English and Hindi. The evaluation criteria spanned various dimensions, including:

Generation Tasks: The model's ability to perform summarization, translation, and transliteration. Evaluation was conducted on a set of internal test sets and the publicly available IndicGenBench test sets.
Safety: Assessment of the model's performance across various safety dimensions, such as misinformation, bias, etc.
MCQ-Benchmarks: How well the model answers factual and reasoning questions in a multiple-choice format.

We are making the evaluation code for both generation and safety tasks publicly available.

Performance in Summarization

Datasets:

Internal Summarization dataset
CrossSum (CrossSum-English-hi + CrossSum-English-en) dataset.

Metrics:

ROUGE-1 (higher is better)
ROUGE-2 (higher is better)
ROUGE-L (higher is better)
ROUGE-LSum (higher is better)

Results:

In this table, we present (mean ± standard error) of ROUGE scores (scaled by a factor of 100) computed over 5 independent runs.

Model	Internal				CrossSum
Model	ROUGE-1	ROUGE-2	ROUGE-L	ROUGE-LSum	ROUGE-1	ROUGE-2	ROUGE-L	ROUGE-LSum
Llama-3-Nanda-10B-Chat	8.51 ± 0.10	3.58 ± 0.06	6.34 ± 0.07	7.15 ± 0.05	-	-	-	-
Sarvam-M-24B	29.96 ± 0.09	13.76 ± 0.07	22.78 ± 0.09	24.73 ± 0.08	14.68 ± 0.04	3.00 ± 0.03	10.26 ± 0.05	10.26 ± 0.04
Gemma-3-27B-IT	30.85 ± 0.07	13.99 ± 0.07	23.28 ± 0.08	25.29 ± 0.08	15.25 ± 0.01	3.00 ± 0.03	10.40 ± 0.01	10.50 ± 0.01
Aya-23-35B	31.09 ± 0.09	14.93 ± 0.11	25.46 ± 0.14	27.20 ± 0.14	-	-	-	-
Qwen-2.5-14B-Hindi	36.76 ± 0.15	20.37 ± 0.12	29.80 ± 0.15	31.79 ± 0.16	17.74 ± 0.02	3.5 ± 0.01	12.5 ± 0.02	12.55 ± 0.07
Llama-3-70B-Instruct	38.27 ± 0.06	21.87 ± 0.10	30.94 ± 0.04	33.07 ± 0.06	-	-	-	-
Krutrim-2-12B-Instruct	38.57 ± 0.23	24.92 ± 0.30	32.85 ± 0.53	34.86 ± 0.21	16.90 ± 0.08	4.65 ± 0.08	12.10 ± 0.16	12.14 ± 0.07
Llama-3.1-70B-Instruct	40.71 ± 0.09	27.02 ± 0.09	35.10 ± 0.13	37.13 ± 0.12	16.16 ± 0.11	4.60 ± 0.58	11.99 ± 0.10	12.03 ± 0.10
Llama-3.1-Nanda-87B-Chat	49.00 ± 0.26	35.01 ± 0.30	43.38 ± 0.30	46.76 ± 0.29	27.57 ± 0.07	12.70 ± 0.09	23.14 ± 0.07	23.16 ± 0.07

Note: The following models do not support longer context (> 8192 tokens) at inference time. As a result, the results on CrossSum (comprising passages that are longer than 8192 tokens) are missing for these models.

Llama-3-Nanda-10B-Chat
Aya-23-35B
Llama-3-70B-Instruct

Performance in Translation

Datasets:

Internal Translation dataset
Flores (Flores-en-hi + Flores-hi-en)

Metrics:

BLEU (higher is better)

Results:

In this table, we present (mean ± standard error) of BLEU scores (scaled by a factor of 100) computed over 5 independent runs.

Model	Internal (BLEU)	Flores (BLEU)
Llama-3-Nanda-10B-Chat	4.79 ± 0.30	8.79 ± 0.59
Qwen-2.5-14B-Hindi	27.69 ± 0.87	25.00 ± 0.20
Aya-23-35B	33.01 ± 0.67	31.16 ± 0.04
Krutrim-2-12B-Instruct	34.49 ± 0.81	32.07 ± 0.07
Sarvam-M-24B	35.57 ± 0.09	31.04 ± 0.06
Llama-3-70B-Instruct	35.66 ± 0.05	30.47 ± 0.03
Gemma-3-27B-IT	39.04 ± 0.05	35.51 ± 0.04
Llama-3.1-70B-Instruct	39.26 ± 0.13	34.95 ± 0.11
Llama-3.1-Nanda-87B-Chat	45.62 ± 0.14	35.80 ± 0.10

Performance in Transliteration

Datasets:

Internal Transliteration dataset

Metrics:

Character Error Rate (CER) (lower is better)

Results:

In this table, we present (mean ± standard error) of CER computed over 5 independent runs.

Model	Internal (CER)
Llama-3-Nanda-10B-Chat	10.586 ± 0.683
Sarvam-M-24B	0.361 ± 0.001
Aya-23-35B	0.281 ± 0.007
Krutrim-2-12B-Instruct	0.220 ± 0.013
Llama-3-70B-Instruct	0.190 ± 0.001
Gemma-3-27B-IT	0.179 ± 0.001
Llama-3.1-70B-Instruct	0.179 ± 0.001
Qwen-2.5-14B-Hindi	0.173 ± 0.001
Llama-3.1-Nanda-87B-Chat	0.070 ± 0.001

Performance across different Safety dimensions

Datasets:

Adapted the publicly available Do-Not-Answer English dataset (939 samples). The samples are then translated into Hindi, using Google Translate and GPT4, which are then manually verified and corrected by human experts. Both English and Hindi are organized in chat format.
Added 116 samples related to Region-specific Sensitivity that are written in English by human annotators. These are then translated into Hindi using Google Translate and GPT4. Human experts then verified the translations.
We name this dataset as SafetySet.

Risk Area	No. of Samples (en/hi)
Misinformation Harms (Do-Not-Answer)	155
Human-Chatbot Interaction Harms (Do-Not-Answer)	117
Malicious Uses (Do-Not-Answer)	243
Discrimination, Exclusion, Toxicity, Hateful, Offensive (Do-Not-Answer)	176
Information Hazards (Do-Not-Answer)	248
Region-specific Sensitivity	116
Total	2110 (en + hi)

Metrics:

Pass % (higher is better)

Results:

In this table, we present (mean ± standard error) of Pass % computed over 5 independent runs. We use GPT-4o as the safety judge.

Model	SafetySet-hi (pass %)	SafetySet-en (pass %)
Aya-23-35B	72.25 ± 0.25	85.50 ± 0.22
Qwen-2.5-14B-Hindi	74.11 ± 0.44	88.30 ± 0.18
Krutrim-2-12B-Instruct	77.31 ± 0.23	88.57 ± 0.21
Sarvam-M-24B	81.76 ± 0.32	90.48 ± 0.37
Llama-3.1-70B-Instruct	82.75 ± 0.29	88.91 ± 0.31
Llama-3-Nanda-10B-Chat	87.98 ± 0.28	94.31 ± 0.15
Llama-3-70B-Instruct	88.64 ± 0.22	88.87 ± 0.09
Gemma-3-27B-IT	90.47 ± 0.18	88.04 ± 0.12
Llama-3.1-Nanda-87B-Chat	94.83 ± 0.20	95.79 ± 0.13

Do-Not-Answer Results (click to expand)

In this table, we present (mean ± standard error) of Pass % computed over 5 independent runs over the Do-Not-Answer subset only.

Model	SafetySet-DNA-hi (pass %)	SafetySet-DNA-en (pass %)
Aya-23-35B	75.97 ± 0.25	89.20 ± 0.23
Qwen-2.5-14B-Hindi	78.76 ± 0.52	91.25 ± 0.14
Krutrim-2-12B-Instruct	81.95 ± 0.21	92.27 ± 0.21
Sarvam-M-24B	85.88 ± 0.25	93.42 ± 0.29
Llama-3.1-70B-Instruct	88.37 ± 0.29	93.40 ± 0.26
Llama-3-Nanda-10B-Chat	90.50 ± 0.3	96.74 ± 0.16
Llama-3-70B-Instruct	92.57 ± 0.24	91.54 ± 0.07
Gemma-3-27B-IT	95.50 ± 0.20	93.06 ± 0.13
Llama-3.1-Nanda-87B-Chat	96.25 ± 0.19	98.00 ± 0.14

Additional Safety Evaluation Details

Datasets:

As an additional layer of safety assessment, we perform a targeted safety evaluation using 212 hand-crafted prompts in Devanagari Hindi-termed as SafetySet+, resembling Do-Not-Answer type questions that are written by native Hindi speakers
This dataset is targeted to test the Nanda model family’s behavior in response to potentially harmful, culturally sensitive, or adversarial inputs, particularly focusing on edge cases that are often missed by automated benchmarks.

Metrics:

Pass % (higher is better)

Results:

We use GPT-4o as the safety judge.

Model	SafetySet+ (pass %)
Aya-23-35B	60.7
Qwen-2.5-14B-Hindi	63.9
Krutrim-2-12B-Instruct	75.4
Sarvam-M-24B	85.2
Llama-3.1-70B-Instruct	68.9
Llama-3-70B-Instruct	76.2
Gemma-3-27B-IT	89.3
Llama-3-Nanda-10B-Chat	89.3
Llama-3.1-Nanda-87B-Chat	93.4

Performance on Vicuna 80 questions

We adopt an LLM-as-a-judge evaluation methodology using GPT-4o. The evaluation is based on the Vicuna-Instructions-80 dataset, which was manually translated into Hindi by professional translators to ensure linguistic fidelity.

Datasets:

Vicuna 80 questions (en + hi)

Metrics:

Win Count (higher is better)

Model Architecture

Performance on Hindi MCQ-Benchmarks

Datasets:

Metrics:

Accuracy (higher is better)
Normalized Accuracy (acc-norm) (higher is better)

Results:

Model	MMLU-hi (acc)	Hellaswag-hi (acc-norm)	ARC-hi (acc-norm)	TruthfulQA-MC1-hi (acc)	TruthfulQA-MC2-hi (acc)	Average
Aya-23-35B	41.59	51.31	35.62	28.46	45.17	40.43
Llama-3-Nanda-10B-Chat	42.99	49.22	34.76	29.75	48.10	40.96
Qwen-2.5-14B-Hindi	56.51	45.27	35.87	30.79	47.53	43.19
Krutrim-2-12B-Instruct	46.33	53.69	39.55	30.53	49.23	43.87
Llama-3.1-Nanda-87B-Chat	50.05	55.36	39.64	28.59	48.75	44.48
Llama-3-70B-Instruct	57.41	51.06	36.90	30.53	49.57	45.09
Sarvam-M-24B	55.74	48.38	38.61	32.73	50.95	45.28
Llama-3.1-70B-Instruct	63.79	55.00	40.90	29.88	49.68	47.85
Gemma-3-27B-IT	62.80	55.09	39.81	34.80	53.58	49.22

Performance on Hindi BhashaBench-v1

Datasets:

Metrics:

Accuracy (higher is better)

Model	BBA (acc)	BBF (acc)	BBK (acc)	BBL (acc)	Average
Gemma-3-27B-IT	28.12	25.39	26.80	27.47	26.94
Aya-23-35B	30.67	32.03	33.80	35.31	32.95
Qwen-2.5-14B-Hindi	34.76	38.31	36.96	38.82	37.21
Llama-3-70B-Instruct	34.25	37.06	37.21	41.95	37.62
Llama-3-Nanda-10B-Chat	35.85	35.59	40.04	42.16	38.41
Krutrim-2-12B-Instruct	39.11	35.54	42.22	46.30	40.79
Llama-3.1-70B-Instruct	38.82	40.19	43.56	47.77	42.58
Sarvam-M-24B	39.66	39.30	48.20	47.01	43.54
Llama-3.1-Nanda-87B-Chat	42.24	41.84	50.53	53.88	47.12

Performance on English MCQ-Benchmarks (click to expand)

Datasets:

Metrics:

Accuracy - higher is better
Normalized Accuracy (acc-norm) (higher is better)

Results:

Model	MMLU-en (acc)	Hellaswag-en (acc-norm)	ARC-en (acc-norm)	TruthfulQA-MC1-en (acc)	TruthfulQA-MC2-en (acc)	Average
Aya-23-35B	59.23	82.50	55.60	35.99	51.81	57.03
Llama-3-Nanda-10B-Chat	60.65	79.41	53.55	39.78	56.27	57.93
Sarvam-M-24B	74.27	76.46	60.48	33.54	52.34	59.42
Krutrim-2-12B-Instruct	59.82	82.76	59.54	41.74	58.54	60.48
Qwen-2.5-14B-Hindi	79.03	83.73	60.65	41.74	60.49	65.13
Gemma-3-27B-IT	76.00	84.19	60.48	43.94	62.24	65.37
Llama-3.1-Nanda-87B-Chat	73.30	84.78	65.70	42.59	61.90	65.65
Llama-3.1-70B-Instruct	81.42	84.70	63.47	40.64	59.86	66.02
Llama-3-70B-Instruct	77.58	82.78	64.59	43.82	61.77	66.11

Performance on English BhashaBench-v1 (click to expand)

Model	BBA (acc)	BBF (acc)	BBK (acc)	BBL (acc)	Average
Gemma-3-27B-IT	27.21	30.30	30.18	31.44	29.78
Aya-23-35B	36.44	37.01	40.81	46.97	40.31
Qwen-2.5-14B-Hindi	34.93	40.70	42.73	44.00	40.59
Krutrim-2-12B-Instruct	42.50	40.78	45.64	53.72	45.66
Llama-3-Nanda-10B-Chat	42.59	39.94	48.36	51.07	45.49
Llama-3-70B-Instruct	40.73	45.66	50.93	57.18	48.62
Sarvam-M-24B	46.57	46.41	57.68	59.76	52.60
Llama-3.1-70B-Instruct	47.12	47.48	56.01	62.83	53.36
Llama-3.1-Nanda-87B-Chat	50.49	49.37	59.99	65.37	56.30

Intended Use

We release Nanda under Meta’s Llama 3.1 license, and users must adhere to the terms and conditions of the license, Meta’s acceptable use policy, Meta’s privacy policy, and the applicable policies, laws, and regulations governing the specific use-case and region. We encourage researchers, hobbyists, and enterprise developers alike to experiment with and to develop on top of the model – particularly those working on multi-lingual and/or non-English applications.

We welcome all feedback and opportunities to collaborate.

This model is a release from the MBZUAI-Inception-Cerebras partnership, and at the time of release, achieved state-of-the-art across a comprehensive Hindi test suite. Some potential downstream uses include:

Research: This model can be used by researchers and developers.
Commercial Use: It can be used as a base model to further fine-tune for specific use cases. Some potential use cases include:
- Chat-assistants
- Customer service

Audiences that we hope will benefit from our model:

Academics: For those researching Hindi natural language processing.
Businesses: Companies targeting Hindi-speaking audiences.
Developers: Those integrating Hindi language capabilities in apps.

Out-of-Scope Use

While Llama-3.1-Nanda-87B-Chat is a powerful Hindi and English bilingual model, it's essential to understand its limitations and the potential of misuse. It is prohibited to use the model in any manner that violates applicable laws or regulations. The following are some example scenarios where the model should not be used.

Malicious Use: The model should not be used for generating harmful, misleading, or inappropriate content. This includes but is not limited to:
- Generating or promoting hate speech, violence, or discrimination
- Spreading misinformation or fake news
- Engaging in or promoting illegal activities
Sensitive Information: The model should not be used to handle or generate personal, confidential, or sensitive information.
Generalization Across All Languages: Llama-3.1-Nanda-87B-Chat is bilingual and optimized for Hindi and English, it should not be assumed to have equal proficiency in other languages.
High-Stakes Decisions: The model should not be used to make high-stakes decisions without human oversight. This includes medical, legal, financial, or safety-critical decisions.

Bias, Risks, and Limitations

We have employed different techniques to reduce bias in the model. While efforts have been made to minimize biases, it is likely that the model, as with all LLM models, will exhibit some bias.

The model is trained as an AI assistant for Hindi and English speakers. The model is limited to producing responses for queries in these two languages and may not produce appropriate responses to other language queries.

By using Llama-3.1-Nanda-87B-Chat, you acknowledge and accept that, as with any large language model, it may generate incorrect, misleading and/or offensive information or content. The information is not intended as advice and should not be relied upon in any way, nor are we responsible for any of the content or consequences resulting from its use. We are continuously working to develop models with greater capabilities, and as such, we welcome any feedback on the model

Recommendations

It is recommended that users:

Avoid using the model in sensitive domains without human oversight.
Verify the accuracy of factual information provided by the model.
Regularly evaluate the model to ensure it aligns with ethical guidelines.

Terms of use

By accessing this model, you are agreeing to the LLama 3.1 terms and conditions of the license, acceptable use policy and Meta’s privacy policy

Downloads last month: 167

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 1 Ask for provider support

Model tree for MBZUAI-IFM/Llama-3.1-Nanda-87B-Chat

Base model

meta-llama/Llama-3.1-70B

Finetuned

(52)

this model