OOMs on 8 GB GPU, is it normal?
It gives an OOM error even though `use_fp16=True` is set. Is this normal? I am running on an 8 GB RTX 3070 graphics card.
same happens to me :/
I use the corpus of BeIR/nq to generate sentences. Here are my test results (use_fp16=True, Linux, A800 GPU):
- `model.encode(sentences, batch_size=128, max_length=512)`: 5.9 GB / GPU
- `model.encode(sentences, batch_size=200, max_length=512)`: 7.6 GB / GPU
- `model.encode(sentences, batch_size=256, max_length=512)`: 9.0 GB / GPU
- `model.encode(sentences, batch_size=256, max_length=256)`: 5.7 GB / GPU
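For anyone comparing against these numbers, here is a minimal sketch of how the peak can be measured from inside the process, assuming the model runs on CUDA through PyTorch (`nvidia-smi` readings will be somewhat higher, since they also include the CUDA context and the allocator's cache):

```python
import torch

# Reset the allocator's peak counter, run one encode call, then read
# the peak. `model` and `sentences` are assumed to exist already.
torch.cuda.reset_peak_memory_stats()
_ = model.encode(sentences, batch_size=128, max_length=512)
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak allocated: {peak_gb:.1f} GB")
```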
The default parameters are `batch_size=256` and `max_length=512`, so running out of memory is normal if you run the examples directly on 8 GB. To solve the problem, you have two choices (both shown in the sketch after this list):
- set a shorter `max_length`, if the `sentences` consist mostly of short sequences
- set a smaller `batch_size`
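A minimal sketch of both options, assuming the model here is BAAI/bge-m3 loaded through FlagEmbedding's `BGEM3FlagModel` (the thread doesn't name the model, but the `use_fp16=True` flag and the `encode` defaults above match that API):

```python
from FlagEmbedding import BGEM3FlagModel  # assumes FlagEmbedding is installed

# use_fp16=True roughly halves activation memory at a small accuracy cost
model = BGEM3FlagModel('BAAI/bge-m3', use_fp16=True)

sentences = ["a short query", "a short passage to embed"]

# Override the defaults (batch_size=256, max_length=512) so the peak
# fits in 8 GB; scale both down further if you still hit OOM.
output = model.encode(sentences, batch_size=32, max_length=256)
dense_vecs = output['dense_vecs']  # dense embeddings, one row per sentence
```

If a smaller batch still isn't enough, shortening `max_length` usually helps more, since attention memory grows superlinearly with sequence length.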
In my case, `model.encode(sentences, batch_size=1, max_length=5000)` takes 10.5 GB of VRAM.
I'm testing the model for multilingual retrieve-and-rerank and it works pretty well, but it demands a lot of VRAM. I don't know if quants are possible with this model's architecture, but loading it in 8-bit would be a big win.
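For the dense-retrieval part at least, 8-bit loading of the underlying encoder should be possible through transformers + bitsandbytes. This is only a sketch under assumptions: that the model is BAAI/bge-m3 (an XLM-RoBERTa encoder whose dense vector is the normalized [CLS] hidden state) and that you only need dense embeddings; the sparse and ColBERT heads from FlagEmbedding are not reproduced here.

```python
import torch
from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig

# Load the encoder weights in 8-bit (requires bitsandbytes and a CUDA GPU).
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-m3")
model = AutoModel.from_pretrained(
    "BAAI/bge-m3",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

texts = ["un texto corto", "another short passage"]
inputs = tokenizer(texts, padding=True, truncation=True,
                   max_length=512, return_tensors="pt").to(model.device)

with torch.no_grad():
    hidden = model(**inputs).last_hidden_state

# Dense embedding = L2-normalized [CLS] token state (an assumption here,
# matching how BGE-style models pool their dense vector).
dense = torch.nn.functional.normalize(hidden[:, 0], dim=-1)
```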
I've been trying to evaluate it, and it took 20 GB of GPU memory. Is there any way to prevent that?