OOMs on 8 GB GPU, is it normal?
It gives an OOM error even though `use_fp16=True` is set. Is this normal? I am running on an 8 GB RTX 3070 graphics card.
same happens to me :/
I use the corpus of BeIR/nq to generate sentences. Here are my test results (use_fp16=True, Linux, A800 GPU):
- `model.encode(sentences, batch_size=128, max_length=512)`: 5.9 GB / GPU
- `model.encode(sentences, batch_size=200, max_length=512)`: 7.6 GB / GPU
- `model.encode(sentences, batch_size=256, max_length=512)`: 9.0 GB / GPU
- `model.encode(sentences, batch_size=256, max_length=256)`: 5.7 GB / GPU
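For anyone comparing against these numbers, here is a minimal sketch of how the peak can be measured from inside the process, assuming the model runs on CUDA through PyTorch (`nvidia-smi` readings will be somewhat higher, since they also include the CUDA context and the allocator's cache):

```python
import torch

# Reset the allocator's peak counter, run one encode call, then read
# the peak. `model` and `sentences` are assumed to exist already.
torch.cuda.reset_peak_memory_stats()
_ = model.encode(sentences, batch_size=128, max_length=512)
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak allocated: {peak_gb:.1f} GB")
```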
The default parameters are `batch_size=256` and `max_length=512`, so running out of memory is normal if you run the examples directly on 8 GB. To solve the problem, you have two choices (both shown in the sketch after this list):
- set a shorter `max_length`, if the `sentences` consist mostly of short sequences
- set a smaller `batch_size`
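A minimal sketch of both options, assuming the model here is BAAI/bge-m3 loaded through FlagEmbedding's `BGEM3FlagModel` (the thread doesn't name the model, but the `use_fp16=True` flag and the `encode` defaults above match that API):

```python
from FlagEmbedding import BGEM3FlagModel  # assumes FlagEmbedding is installed

# use_fp16=True roughly halves activation memory at a small accuracy cost
model = BGEM3FlagModel('BAAI/bge-m3', use_fp16=True)

sentences = ["a short query", "a short passage to embed"]

# Override the defaults (batch_size=256, max_length=512) so the peak
# fits in 8 GB; scale both down further if you still hit OOM.
output = model.encode(sentences, batch_size=32, max_length=256)
dense_vecs = output['dense_vecs']  # dense embeddings, one row per sentence
```

If a smaller batch still isn't enough, shortening `max_length` usually helps more, since attention memory grows superlinearly with sequence length.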
In my case, `model.encode(sentences, batch_size=1, max_length=5000)` takes 10.5 GB of VRAM.
I'm testing the model for multilingual retrieve-and-rerank and it works pretty well, but it demands a lot of VRAM. I don't know if quants are possible with this model's architecture, but loading it in 8-bit would be a big win.
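For the dense-retrieval part at least, 8-bit loading of the underlying encoder should be possible through transformers + bitsandbytes. This is only a sketch under assumptions: that the model is BAAI/bge-m3 (an XLM-RoBERTa encoder whose dense vector is the normalized [CLS] hidden state) and that you only need dense embeddings; the sparse and ColBERT heads from FlagEmbedding are not reproduced here.

```python
import torch
from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig

# Load the encoder weights in 8-bit (requires bitsandbytes and a CUDA GPU).
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-m3")
model = AutoModel.from_pretrained(
    "BAAI/bge-m3",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

texts = ["un texto corto", "another short passage"]
inputs = tokenizer(texts, padding=True, truncation=True,
                   max_length=512, return_tensors="pt").to(model.device)

with torch.no_grad():
    hidden = model(**inputs).last_hidden_state

# Dense embedding = L2-normalized [CLS] token state (an assumption here,
# matching how BGE-style models pool their dense vector).
dense = torch.nn.functional.normalize(hidden[:, 0], dim=-1)
```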
I've been trying to evaluate it, and it took 20 GB of GPU memory. Is there any way to prevent that?