Llama-2-7b-Chat-GPTQ fine-tuned on PYTHON-CODES-25K

Generate Python code that accomplishes the task instructed.

LoRA Adapter Head

Description

Parameter-efficient fine-tuning (PEFT) of a 4-bit quantized Llama-2-7b-Chat on the flytech/python-codes-25k dataset.

Language(s) (NLP): English
License: openrail
Quantization: GPTQ 4-bit
PEFT: LoRA
Fine-tuned from model: TheBloke/Llama-2-7b-Chat-GPTQ
Dataset: flytech/python-codes-25k

Intended uses & limitations

Explores the efficacy of quantization and PEFT. Implemented as a personal project.

How to use

The quantized model was fine-tuned with PEFT, so what is published is the trained LoRA adapter.
Merging a LoRA adapter into a GPTQ-quantized model is not yet supported.
So instead of loading a single fine-tuned model, we load the base
model and attach the fine-tuned adapter on top of it.

from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

instruction = """Help me set up my daily to-do list!"""

# PEFT config of the trained adapter (r, lora_alpha, target modules, ...)
config = PeftConfig.from_pretrained("SwastikM/Llama-2-7B-Chat-text2code")
# Load the 4-bit GPTQ base model (transformers needs optimum plus a GPTQ backend such as auto-gptq)
model = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7b-Chat-GPTQ", device_map="auto")
# Attach the trained LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, "SwastikM/Llama-2-7B-Chat-text2code")
tokenizer = AutoTokenizer.from_pretrained("SwastikM/Llama-2-7B-Chat-text2code")

inputs = tokenizer(instruction, return_tensors="pt").input_ids.to("cuda")
# Greedy decoding: no sampling, single beam
outputs = model.generate(input_ids=inputs, max_new_tokens=500, do_sample=False, num_beams=1)
code = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(code)
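The decoded string contains the prompt followed by the model's reply, and, as the test example shows, the reply tends to wrap the code in Markdown code fences surrounded by conversational text. A minimal post-processing sketch, assuming the reply uses such fences; `extract_python_blocks` is a hypothetical helper, not part of transformers or PEFT:

```python
import re

FENCE = "`" * 3  # three backticks, written this way to keep literal fences out of this listing
# Opening fence (optionally tagged "python"), the code body, then the closing fence
FENCED_PYTHON = re.compile(FENCE + r"(?:python)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_python_blocks(generated: str) -> list[str]:
    """Hypothetical helper: return the bodies of all closed code fences in the generated text."""
    return [body.strip() for body in FENCED_PYTHON.findall(generated)]


reply = f"Sure! {FENCE}python\nx = 1\n{FENCE}\nThis assigns a variable."
print(extract_python_blocks(reply))  # ['x = 1']
```

Calling `extract_python_blocks(code)` on the decoded output returns the fenced snippets as a list; a fence left open because generation stopped at `max_new_tokens` is skipped.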

A Test Example

User_Prompt = """Write a Python program to implement K-Means clustering. The program should take two mandatory arguments, k and data, where k is the number of clusters and data is a 2D array containing the data points k = 3
data = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]"""

inputs = tokenizer(User_Prompt, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(input_ids=inputs, max_new_tokens=500, do_sample=False, num_beams=1)
python_code = tokenizer.decode(outputs[0], skip_special_tokens=True)

print("Generated Output:", python_code)
>>>

Generated Output:Write a Python program to implement K-Means clustering. The program should take two mandatory arguments, k and data, where k is the number of clusters and data is a 2D array containing the data points k = 3
data = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]] Ready for action! Let's do this! ```python
import numpy as np

def kmeans(data, k):
    # Initialize the centroids
    centroids = np.random.rand(k, 2)
    
    # Initialize the cluster assignments
    cluster_assignments = np.zeros(data.shape[0], dtype=int)
    
    # Iterate through the data points
    for i in range(data.shape[0]):
        # Calculate the distance between the current data point and each of the centroids
        distances = np.linalg.norm(data[i] - centroids)
        
        # Assign the data point to the closest centroid
        cluster_assignments[i] = np.argmin(distances)
        
    return cluster_assignments
```
This program takes two mandatory arguments, `k` and `data`, where `k` is the number of clusters and `data` is a 2D array containing the data points. The program initializes the centroids randomly and then iterates through the data points to calculate the distance between each data point and each of the centroids. The program then assigns each data point to the closest centroid based on the calculated distance. Finally, the program returns the cluster assignments for each data point.

Note that this program uses the Euclidean distance to calculate the distance between the data points and the centroids. You can change the distance metric if needed.

Also, this program assumes that the data points are 2D. If the data points are 3D or higher, you will need to modify the program accordingly.

I hope this helps! Let me know if you have any questions.

```python
# Example usage
data = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
k = 3
cluster_assignments = kmeans(data, k)
print(cluster_assignments)
```
This will output the cluster assignments for each data point. The output will be a list of integers, where each integer represents the cluster assignment for that data point. For example, if the data points are
---------------------------------------------------------------------
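The generated snippet above is not a complete K-means: it calls `data.shape` on a plain list (an AttributeError in its own usage example), computes `np.linalg.norm(data[i] - centroids)` without `axis=1` so `distances` is a single scalar and `np.argmin` always returns 0, and it never updates the centroids. For comparison, a minimal NumPy sketch of Lloyd's algorithm, written for this card rather than produced by the model; initializing from randomly chosen data points is an assumption:

```python
import numpy as np


def _assign(points, centroids):
    # Euclidean distance from every point to every centroid -> shape (n_points, k)
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)


def kmeans(data, k, max_iters=100, seed=0):
    """Illustrative Lloyd's-algorithm K-means; returns (labels, centroids)."""
    points = np.asarray(data, dtype=float)  # accepts nested lists such as the prompt's data
    rng = np.random.default_rng(seed)
    # Assumed initialization: k distinct data points chosen at random
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iters):
        labels = _assign(points, centroids)
        # Move each centroid to the mean of its points; keep it unchanged if its cluster is empty
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment so the labels always match the returned centroids
    return _assign(points, centroids), centroids


labels, centroids = kmeans([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]], k=3)
print(labels)
print(centroids)
```

Unlike the generated version, this alternates assignment and centroid updates until the centroids stop moving, and the distance step keeps one distance per centroid, so it works for data of any dimensionality.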

Size Comparison

The table compares the VRAM required to load and to train the FP16 base model versus the 4-bit GPTQ-quantized model with PEFT. The base-model values are taken from the Hugging Face Model Memory Calculator.

Model	Total Size	Training Using Adam
Base Model (FP16)	12.37 GB	49.48 GB
4-bit GPTQ + PEFT	3.90 GB	11 GB
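The base model's training figure is exactly four times its size (4 × 12.37 GB = 49.48 GB), consistent with the rule of thumb that Adam training keeps the weights, the gradients, and two optimizer moment buffers, each the size of the model. A minimal sketch of that estimate, assuming this 4× rule; `adam_training_estimate_gb` is a hypothetical helper that ignores activations and framework overhead:

```python
# Rule-of-thumb estimate (assumed): Adam training keeps weights, gradients,
# and two optimizer moment buffers, i.e. four model-sized copies in memory.
ADAM_COPIES = 4


def adam_training_estimate_gb(model_size_gb: float) -> float:
    """Hypothetical helper: ignores activations, batch size, and framework overhead."""
    return ADAM_COPIES * model_size_gb


base_model_gb = 12.37  # FP16 base model size from the table
print(f"{adam_training_estimate_gb(base_model_gb):.2f} GB")  # 49.48 GB
```

The 11 GB figure for the 4-bit quantized + PEFT setup is not expected to follow this rule: the quantized base weights stay frozen, so only the small LoRA matrices receive gradients and Adam states.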

Training Details

Training Data

Dataset: flytech/python-codes-25k

Trained on the instruction column of 20,000 randomly shuffled samples.

Training Procedure

Hugging Face Accelerate with a custom training loop.

Training Hyperparameters

Optimizer: AdamW
lr: 2e-5
decay: linear
batch_size: 4
gradient_accumulation_steps: 8
global_step: 625
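With batch_size 4 and gradient_accumulation_steps 8, each optimizer step sees 32 samples, so 625 steps cover 625 × 32 = 20,000 samples, one pass over the 20,000 training samples, assuming global_step counts optimizer updates on the single GPU. The sketch below checks that arithmetic and writes out a linear-decay schedule. Warmup is not reported on this card, so zero warmup steps is an assumption; the helper names are hypothetical, and the formula follows the same shape as transformers' `get_linear_schedule_with_warmup`:

```python
BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 8
GLOBAL_STEPS = 625
BASE_LR = 2e-5


def effective_batch_size(batch_size: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    # Samples contributing to one optimizer update
    return batch_size * grad_accum_steps * num_devices


def linear_decay_lr(step: int, total_steps: int, base_lr: float, warmup_steps: int = 0) -> float:
    # Linear warmup (none assumed here), then linear decay to zero at total_steps
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0, total_steps - step) / max(1, total_steps - warmup_steps)


samples_seen = GLOBAL_STEPS * effective_batch_size(BATCH_SIZE, GRAD_ACCUM_STEPS)
print(samples_seen)  # 20000
print(linear_decay_lr(0, GLOBAL_STEPS, BASE_LR), linear_decay_lr(GLOBAL_STEPS, GLOBAL_STEPS, BASE_LR))
```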

LoraConfig

r: 8
lora_alpha: 32
target_modules: ["k_proj","o_proj","q_proj","v_proj"]
lora_dropout: 0.05
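In PEFT this corresponds to `LoraConfig(r=8, lora_alpha=32, target_modules=["k_proj", "o_proj", "q_proj", "v_proj"], lora_dropout=0.05)`. Each adapted projection gains two low-rank matrices, A (r × d_in) and B (d_out × r), whose product is scaled by lora_alpha / r = 4. A back-of-the-envelope count of the trainable parameters, assuming Llama-2-7B's public architecture (32 decoder layers, hidden size 4096, so each attention projection is 4096 × 4096); `lora_params_per_module` is a hypothetical helper:

```python
# Llama-2-7B attention dimensions (assumed from the public model config)
NUM_LAYERS = 32
HIDDEN_SIZE = 4096
LORA_R = 8
LORA_ALPHA = 32
TARGET_MODULES = ["k_proj", "o_proj", "q_proj", "v_proj"]


def lora_params_per_module(d_in: int, d_out: int, r: int) -> int:
    # A has shape (r, d_in) and B has shape (d_out, r)
    return r * d_in + d_out * r


trainable = NUM_LAYERS * len(TARGET_MODULES) * lora_params_per_module(HIDDEN_SIZE, HIDDEN_SIZE, LORA_R)
scaling = LORA_ALPHA / LORA_R  # factor applied to the low-rank update B @ A
print(trainable, scaling)  # 8388608 4.0
```

That is about 8.4M trainable parameters, roughly 0.1% of the ~6.7B base parameters, which is why the adapter is small and cheap to train on a single P100.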

Hardware

GPU: P100

Additional Information

Github: Repository
Intro to quantization: Blog
Emergent Feature: Academic
GPTQ Paper: GPTQ
BITSANDBYTES and further LLM.int8()

Acknowledgment

Thanks to Merve Noyan for the precise introduction. Thanks to the @HuggingFace team for the notebook on GPTQ.

Model Card Authors

Swastik Maiti


Model tree for SwastikM/Llama-2-7B-Chat-text2code

Base model: meta-llama/Llama-2-7b-chat-hf
Quantized: TheBloke/Llama-2-7B-Chat-GPTQ
Adapter: SwastikM/Llama-2-7B-Chat-text2code (this model)


Papers for SwastikM/Llama-2-7B-Chat-text2code

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers. arXiv:2210.17323, published Oct 31, 2022.

LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale. arXiv:2208.07339, published Aug 15, 2022.