SentenceTransformer wrapper for Rostlab/prot_t5_xl_uniref50
This repository repackages Rostlab/prot_t5_xl_uniref50 as a Sentence-Transformers model: a Transformer module followed by mean pooling, which produces one fixed-size embedding per protein sequence.
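The pooling step can be illustrated with a minimal, framework-free sketch. It assumes masked mean pooling, where only non-padding token vectors (attention-mask entry 1) are averaged. This is how we understand Sentence-Transformers' mean mode to behave. The function name `masked_mean_pool` and the toy 2-dimensional vectors are hypothetical and for illustration only; the real model's token vectors are far larger.

```python
from typing import List


def masked_mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Hypothetical helper: average the token vectors whose mask entry is 1 (padding is ignored)."""
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, x in enumerate(vec):
                totals[i] += x
            count += 1
    # Avoid division by zero if every position is padding
    count = max(count, 1)
    return [t / count for t in totals]


# Toy input: 3 real tokens followed by 1 padding token, each a 2-dim vector
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [100.0, 100.0]]
mask = [1, 1, 1, 0]
pooled = masked_mean_pool(tokens, mask)
print(pooled)  # [3.0, 4.0]; the padding vector does not affect the result
```

Whatever the sequence length, the output has the same dimensionality as a single token vector, which is what makes the embedding fixed-size.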
Preprocessing (IMPORTANT)
ProtT5 expects:
- Replace rare/ambiguous amino acids U, Z, O, B with X
- Insert whitespace between all amino acids
Example: PRTEINO -> "P R T E I N X" (O is mapped to X, then the residues are space-separated)
Usage
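```python
import re

from sentence_transformers import SentenceTransformer


def prott5_prepare_sequences(seqs):
    """Format raw amino-acid strings the way ProtT5 expects."""
    out = []
    for s in seqs:
        # Map rare/ambiguous residues (U, Z, O, B) to X
        s = re.sub(r"[UZOB]", "X", s.upper())
        # Separate every residue with a single space
        out.append(" ".join(s))
    return out


model = SentenceTransformer("wrice/prot_t5_xl_uniref50-st")

seqs = ["PRTEINO", "SEQWENCE"]
seqs = prott5_prepare_sequences(seqs)  # ["P R T E I N X", "S E Q W E N C E"]

emb = model.encode(seqs, normalize_embeddings=False, batch_size=4, show_progress_bar=True)
print(emb.shape)  # (2, 1024) expected: one mean-pooled vector per sequence
```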
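Because the usage example passes `normalize_embeddings=False`, the returned vectors keep their raw magnitudes. If you compare proteins by cosine similarity, normalize first. With `normalize_embeddings=True`, Sentence-Transformers L2-normalizes each row, so a plain dot product then equals cosine similarity. The following is a minimal sketch under that assumption; the helper names `l2_normalize` and `cosine_similarity` are hypothetical, and the toy 3-dimensional vectors stand in for rows of `emb`.

```python
import math
from typing import List


def l2_normalize(vec: List[float]) -> List[float]:
    """Hypothetical helper: scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Hypothetical helper: dot product of the two L2-normalized vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


# Toy vectors standing in for rows of `emb`
a = [3.0, 4.0, 0.0]
b = [6.0, 8.0, 0.0]  # same direction as a, twice the length
c = [0.0, 0.0, 5.0]  # orthogonal to a
print(round(cosine_similarity(a, b), 6))  # 1.0
print(round(cosine_similarity(a, c), 6))  # 0.0
```

Magnitude drops out after normalization, which is why `a` and `b` score as identical even though their raw dot product differs from that of `a` with itself.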