SentenceTransformer based on BAAI/bge-base-en-v1.5

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5 on the reason_ccnews, reason_reddit and reason_s2orc datasets. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: BAAI/bge-base-en-v1.5
Maximum Sequence Length: 256 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity
Training Datasets: reason_ccnews, reason_reddit, reason_s2orc

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
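
Read top to bottom, the three modules above run BERT, keep the hidden state of the first ([CLS]) token, and L2-normalize it. The following is a minimal NumPy sketch of the last two steps only. It uses random numbers in place of real BertModel outputs, and the names `cls_pool_and_normalize` and `fake_hidden_states` are illustrative, not part of the library.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Keep the first-token ([CLS]) vector of each sequence and L2-normalize it.

    token_embeddings has shape (batch, seq_len, hidden_dim).
    """
    cls_vectors = token_embeddings[:, 0, :]             # pooling_mode_cls_token=True
    norms = np.linalg.norm(cls_vectors, axis=1, keepdims=True)
    return cls_vectors / np.clip(norms, 1e-12, None)    # Normalize()

# Random stand-in for the Transformer output: 3 texts, 256 tokens, 768 dimensions.
rng = np.random.default_rng(0)
fake_hidden_states = rng.normal(size=(3, 256, 768))

sentence_embeddings = cls_pool_and_normalize(fake_hidden_states)
print(sentence_embeddings.shape)                        # one 768-dim vector per text
print(np.linalg.norm(sentence_embeddings, axis=1))      # each norm is approximately 1
```

Because every output vector has unit length, the cosine similarity between two embeddings equals their dot product.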

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("bwang0911/reasoning-bge")
# Run inference
sentences = [
    'Crossover and multicriticality due to the Dzyaloshinsky-Moriya interaction',
    'We show that the addition of a Dzyaloshinsky-Moriya interaction to a Heisenberg ferromagnet introduces only one crossover exponent, which is the same as for the usual uniaxial anisotropy. This result is in contrast to a previous report by Liu.',
    'The second text elaborates on the first by specifying the impact of the Dzyaloshinsky-Moriya interaction on a Heisenberg ferromagnet. It highlights a key finding: the introduction of only one crossover exponent, contrasting with a prior study. This directly addresses the topic introduced in the title.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])

Evaluation

Metrics

Information Retrieval

Datasets: mteb/nfcorpus, mteb/trec-covid, mteb/fiqa and mteb/quora
Evaluated with InformationRetrievalEvaluator

Metric	mteb/nfcorpus	mteb/trec-covid	mteb/fiqa	mteb/quora
cosine_accuracy@1	0.5046	0.86	0.358	0.8112
cosine_accuracy@3	0.6347	1.0	0.5231	0.9258
cosine_accuracy@5	0.6966	1.0	0.5849	0.9553
cosine_accuracy@10	0.7678	1.0	0.6744	0.9773
cosine_precision@1	0.5046	0.86	0.358	0.8112
cosine_precision@3	0.3994	0.88	0.2325	0.3724
cosine_precision@5	0.3573	0.856	0.1694	0.2455
cosine_precision@10	0.2867	0.832	0.1065	0.1341
cosine_recall@1	0.0652	0.0007	0.1851	0.7047
cosine_recall@3	0.1139	0.0022	0.318	0.8691
cosine_recall@5	0.1396	0.0036	0.372	0.9145
cosine_recall@10	0.1869	0.0069	0.4559	0.9525
cosine_ndcg@10	0.3825	0.8435	0.3827	0.8812
cosine_mrr@10	0.5875	0.9233	0.4577	0.873
cosine_map@100	0.196	0.5214	0.3237	0.8502
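
For readers unfamiliar with the @k metrics in the table, here is a small pure-Python sketch of how accuracy, precision, recall and reciprocal rank are computed for a single query (the table reports averages over all queries). The function name and the toy document IDs are hypothetical. The example also illustrates why recall@k can be very small while precision@k is high: this happens when a query has far more relevant documents than k, which is presumably the situation on mteb/trec-covid (the card does not state per-query counts).

```python
def retrieval_metrics(ranked_ids, relevant_ids, k):
    """Compute accuracy@k, precision@k, recall@k and reciprocal rank@k for one query."""
    top_k = ranked_ids[:k]
    hits = [doc_id in relevant_ids for doc_id in top_k]
    num_hits = sum(hits)
    # Rank (1-based) of the first relevant document in the top k, if any.
    first_hit_rank = next((rank for rank, hit in enumerate(hits, start=1) if hit), None)
    return {
        "accuracy": 1.0 if num_hits > 0 else 0.0,
        "precision": num_hits / k,
        "recall": num_hits / len(relevant_ids),
        "rr": 1.0 / first_hit_rank if first_hit_rank else 0.0,
    }

# Toy query: 3 of the top 5 results are relevant, out of 300 relevant documents overall.
ranked = ["d7", "d1", "d9", "d2", "d3"]
relevant = {"d1", "d2", "d3"} | {f"r{i}" for i in range(297)}

metrics = retrieval_metrics(ranked, relevant, k=5)
print(metrics)
# accuracy 1.0, precision 0.6, recall 0.01, rr 0.5
```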

Training Details

Training Datasets

reason_ccnews

Dataset: reason_ccnews at 2e4fb05
Size: 44,978 training samples
Columns: title, body, and reason

Approximate statistics based on the first 1000 samples:

	title	body	reason
type	string	string	string
details	min: 6 tokens, mean: 15.34 tokens, max: 42 tokens	min: 21 tokens, mean: 221.75 tokens, max: 256 tokens	min: 28 tokens, mean: 59.19 tokens, max: 88 tokens

Samples:

title	body	reason
`Fight Leaves Wayne Simmonds Shirtless`	Reed Saxon/AP Images Kevin Bieksa and Wayne Simmonds dropped the gloves just 95 seconds into last night’s 4-3 Ducks shootout win over the Flyers, and Bieksa immediately yanked his opponent’s jersey over his head, to the delight of the crowd and to grins from Simmonds and the officials. That’s not supposed to happen. NHL players wear something called a fight strap, which binds the back of the jersey to the pants, preventing the jersey from being pulled off. (Losing a jersey is an advantage in a fight, as it gives the shirtless player’s opponent nothing to grab on to. Sabres enforcer Rob Ray was notorious for losing his gear in a fight, occasionally taking it off himself before clinching.) Any player who engaged in a fight without wearing a fight strap is subject to an automatic game misconduct. Advertisement Simmonds wasn’t ejected, though; at the one-minute mark of the video above, you can see he did have his fight strap properly attached. It just broke, which happens on occasion.	`The article describes a hockey fight involving Wayne Simmonds, confirming the title's claim. It details the fight, including Simmonds' jersey being pulled off, and explains the rules and context around the incident, directly elaborating on the event suggested by the title.`
`Merck CEO Kenneth Frazier ditches Trump over Charlottesville silence`	Merck CEO Kenneth C. Frazier resigned from the president’s council on manufacturing Monday in direct protest of President Donald Trump’s lack of condemnation of white nationalist actions in Charlottesville, Va. over the weekend. In a statement, Frazier, who is African-American, said he believes the country’s strength comes from the diversity of its citizens and that he feels personally compelled to stand up for that diversity and against intolerance. “America’s leaders must honor our fundamental values by clearly rejecting expressions of hatred, bigotry and group supremacy, which run counter to the American ideal that all people are created equal,” he wrote. “As CEO of Merck, and as a matter of personal conscience, I feel a responsibility to take a stand against intolerance and extremism.” RELATED: At least one death has been confirmed after a car plowed into a crowd of protesters in Charlottesville Trump immediately fired back at Frazier on Twitter, saying the Merck CEO now “will have...	`The second text provides a detailed elaboration of the first. It explains the context of Kenneth Frazier's resignation, the reasons behind it (Trump's silence on Charlottesville), and includes Frazier's statement. It also provides additional background information about Frazier and the President's Manufacturing Council.`
`Lightning's Braydon Coburn: Joining road trip`	Coburn (lower body) will travel with the team on its upcoming four-game road trip and is hoping to play at some point in the second half of the trip, Bryan Burns of the Lightning's official site reports. The veteran blueliner is yet to play in the month of December, having already missed four games. However, the fact that Coburn is traveling with the team and has been given a chance to play at some point within the next week will be music to the ears of fantasy owners who benefited from Coburn's surprising production -- seven points in 25 games -- earlier in the season. Keep an eye out for updates as the trip progresses.	`The second text elaborates on the first by providing details about Braydon Coburn's situation. It specifies that he will join the team on a road trip and offers context about his injury, recovery timeline, and potential for playing, directly expanding on the initial announcement.`

Loss: ReasoningGuidedRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
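
This card lists only the parameters of ReasoningGuidedRankingLoss and does not describe how the loss uses the reason column. As a point of reference only, the sketch below shows how a `scale` of 20.0 and cosine similarity enter a standard in-batch-negatives ranking loss (the form used by MultipleNegativesRankingLoss in Sentence Transformers). Whether ReasoningGuidedRankingLoss follows this form is an assumption. The function name and toy vectors are hypothetical.

```python
import numpy as np

def in_batch_ranking_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """Cross-entropy over scaled cosine similarities; row i's positive is column i."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                       # cos_sim multiplied by scale
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

# Toy stand-ins for title and body embeddings (not real model outputs).
rng = np.random.default_rng(0)
titles = rng.normal(size=(4, 8))
matching_bodies = titles + 0.1 * rng.normal(size=(4, 8))
shuffled_bodies = matching_bodies[::-1]              # every title paired with the wrong body

loss_matched = in_batch_ranking_loss(titles, matching_bodies)
loss_shuffled = in_batch_ranking_loss(titles, shuffled_bodies)
print(loss_matched, loss_shuffled)                   # matched pairs give the lower loss
```

A larger `scale` sharpens the softmax, so small differences in cosine similarity produce larger differences in loss.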

reason_reddit

Dataset: reason_reddit at 2fd69ee
Size: 41,703 training samples
Columns: title, body, and reason

Approximate statistics based on the first 1000 samples:

	title	body	reason
type	string	string	string
details	min: 6 tokens, mean: 18.82 tokens, max: 69 tokens	min: 16 tokens, mean: 126.63 tokens, max: 256 tokens	min: 42 tokens, mean: 59.32 tokens, max: 84 tokens

Samples:

title	body	reason
`The one feature the iPad is really missing.`	I don't care about the lack of camera. I never use the one on my MacBook, and even if I did the angle would be terrible on the iPad. I don't care if third party apps can't run in the background. I don't listen to streaming music. I don't care that the App Store is a closed system. I can jailbreak for myself and I think the closed system works better for most users. The one feature I want is User Accounts and a Guest Account. If this device is meant to be a coffee table computer, it needs to be able to accomadate multiple users.	`The second text identifies the missing feature from the iPad as user accounts and a guest account. The first sentence in the second text sets up a contrast by stating what the author doesn't care about. The final sentence directly addresses the prompt by stating the feature the author does want.`
`Dear Sydney Reddit'ers, Would you like any changes made to the style of this subreddit?`	`I was going to subtly edit the style of the Sydney subreddit but then I found this post and realised that people have very strong opinions about how their reddit should look. So before I make any changes do you have any opinions or suggestions?`	`The second text directly responds to the question in the first text. It acknowledges the query about subreddit style changes and seeks further input from the community before making any modifications. It demonstrates an understanding of the original post's intent and a willingness to engage with user preferences.`
`I skipped bail, ran away, and never got caught. AM(A)A.`	Long/short story, I went to work in the United States in the last 90s and was busted in a major drug raid. I risked up to lifetime in jail if caught since I was associated with so many crimes; at the bare minimum, said my attorney, I was looking at 7 years in jail, and much more likely more than this. My attorney said I was in a lot of trouble. He was the first to bring it up. I did not want to lose 10, 15 or 25 years of my life in jail, especially at my age. Since I was not a United States citizen, I should simply skip bail and run away. And never come back. My bail was initially supposed to be $300,000 but my attorney managed to get the judge to set a final bail of $100,000. He explained I was a trustworthy person, lawfully employed, who never did anything wrong and never committed any crime. He portrayed me as someone trustworthy and intelligent who could take care of his responsibilities. The judge agreed and decided on a very low bail, especially for the crimes I was accused of....	`The second text provides a detailed account of the events summarized in the first text. It elaborates on the circumstances of skipping bail, running away, and avoiding capture, offering specific details about the legal situation, the escape plan, and the aftermath. The AMAA at the end indicates the user is open to questions about the story.`

Loss: ReasoningGuidedRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}

reason_s2orc

Dataset: reason_s2orc at 4d04170
Size: 96,205 training samples
Columns: title, body, and reason

Approximate statistics based on the first 1000 samples:

	title	body	reason
type	string	string	string
details	min: 6 tokens, mean: 19.26 tokens, max: 75 tokens	min: 17 tokens, mean: 138.29 tokens, max: 256 tokens	min: 47 tokens, mean: 67.13 tokens, max: 107 tokens

Samples:

title	body	reason
`Syntheses, Structures and Properties of Two Transition Metal-Flexible Ligand Coordination Polymers`	Two coordination polymers based on 3,5-bis(4-carboxyphenylmethyloxy) benzoic acid (H3L), [M(HL)]·2H2O M = Mn(1), Co(2), have been synthesized under hydrothermal conditions. Their structures have been determined by single-crystal X-ray diffraction and further characterized by elemental analysis, IR spectra and TGA. The two complexes possess 3D framework with diamond channels resulting from the trans-configuration of the flexible ligand and three coordination modes, 3(η2, η1), 2(η1, η1), η1, of carboxyl groups in the ligand. The framework can be represented with Schlafli symbol of (48·66)(47·66). The wall of the channel consists of left- or right-handed helical polymeric chains. UV–visible–NIR and photoluminescence spectra, magnetic properties of 1 and 2 have also been discussed.	`The second text elaborates on the title by detailing the synthesis, structure, and properties of two specific transition metal coordination polymers. It provides the chemical formula, synthesis method, structural characteristics (3D framework, channels), and characterization techniques (X-ray diffraction, IR spectra, etc.) mentioned in the title.`
`Discussion on the Influence and Development of Technical Aesthetics in Modern Landscape Design`	`The source of technical aesthetics was introduced and its meaning was explained.The relations between technical aesthetics and modern landscpae design were discussed.The embodiment of technical aesthetics in landscpae design was discussed in the aspects of new material,new technology,new structureand new apparatus.It was put forward that the the development direction of technical aesthetics were tending to sensibility, native land and zoology.`	`The second text directly addresses the topic introduced in the first text. It explores the meaning, application, and future directions of technical aesthetics within modern landscape design, elaborating on the influence and development mentioned in the title.`
`GRIN optics for dual-band IR sensors (Conference Presentation)`	Graded index (GRIN) optics offer potential for both weight savings and increased performance but have until recently been limited to visible and NIR bands (wavelengths shorter than about 0.9 µm). NRL has developed glass-based IR-GRIN lenses compatible with SWIR-LWIR wavebands. Recent designs show the potential for significant SWaP reduction benefits and improved performance using IR-GRIN lens elements in dual-band, MWIR-LWIR sensors. The SWaP and performance advantages of IR-GRIN lenses in platform-relevant dual-band imagers will be presented.	`The second text elaborates on the first by providing a detailed description of GRIN optics, specifically for dual-band IR sensors. It explains the potential benefits (weight savings, increased performance) and highlights the development of IR-GRIN lenses compatible with SWIR-LWIR wavebands, aligning directly with the conference presentation topic.`

Loss: ReasoningGuidedRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 128
learning_rate: 5e-06
num_train_epochs: 1
warmup_ratio: 0.2
fp16: True
batch_sampler: no_duplicates
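
The non-default values above could be passed to the Sentence Transformers trainer roughly as follows. This is a sketch, not the author's training script: the helper `build_training_args` and its `output_dir` argument are hypothetical, and the model, datasets and loss are left out.

```python
# Non-default values copied from the list above.
NON_DEFAULT_ARGS = {
    "eval_strategy": "steps",
    "per_device_train_batch_size": 128,
    "learning_rate": 5e-6,
    "num_train_epochs": 1,
    "warmup_ratio": 0.2,
    "fp16": True,  # mixed precision; typically requires a CUDA GPU
    "batch_sampler": "no_duplicates",
}

def build_training_args(output_dir: str):
    """Build training arguments; output_dir is a hypothetical checkpoint path."""
    # Imported lazily so the dict above can be inspected without the library installed.
    from sentence_transformers import SentenceTransformerTrainingArguments
    from sentence_transformers.training_args import BatchSamplers

    kwargs = dict(NON_DEFAULT_ARGS)
    kwargs["batch_sampler"] = BatchSamplers(kwargs["batch_sampler"])  # -> BatchSamplers.NO_DUPLICATES
    return SentenceTransformerTrainingArguments(output_dir=output_dir, **kwargs)
```

The resulting object would then be handed to `SentenceTransformerTrainer` through its `args` parameter.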

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 128
per_device_eval_batch_size: 8
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 5e-06
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 1
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.2
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: True
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
tp_size: 0
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional

Training Logs

Epoch	Step	Training Loss	mteb/nfcorpus_cosine_ndcg@10	mteb/trec-covid_cosine_ndcg@10	mteb/fiqa_cosine_ndcg@10	mteb/quora_cosine_ndcg@10
-1	-1	-	0.3714	0.8385	0.3831	0.8889
0.0070	10	0.9492	-	-	-	-
0.0140	20	0.9799	-	-	-	-
0.0210	30	0.84	-	-	-	-
0.0280	40	0.9555	-	-	-	-
0.0350	50	0.9292	0.3695	0.8401	0.3840	0.8892
0.0420	60	1.1549	-	-	-	-
0.0490	70	0.8573	-	-	-	-
0.0559	80	0.5784	-	-	-	-
0.0629	90	0.7275	-	-	-	-
0.0699	100	0.4792	0.3766	0.8457	0.3886	0.8887
0.0769	110	0.6293	-	-	-	-
0.0839	120	0.5167	-	-	-	-
0.0909	130	0.3838	-	-	-	-
0.0979	140	0.3458	-	-	-	-
0.1049	150	0.4897	0.3739	0.8494	0.3866	0.8876
0.1119	160	0.3124	-	-	-	-
0.1189	170	0.4367	-	-	-	-
0.1259	180	0.3565	-	-	-	-
0.1329	190	0.2646	-	-	-	-
0.1399	200	0.2	0.3757	0.8508	0.3852	0.8860
0.1469	210	0.2051	-	-	-	-
0.1538	220	0.1248	-	-	-	-
0.1608	230	0.2398	-	-	-	-
0.1678	240	0.1599	-	-	-	-
0.1748	250	0.3251	0.3743	0.8527	0.3840	0.8840
0.1818	260	0.263	-	-	-	-
0.1888	270	0.2523	-	-	-	-
0.1958	280	0.2156	-	-	-	-
0.2028	290	0.1587	-	-	-	-
0.2098	300	0.1977	0.3777	0.8557	0.3859	0.8830
0.2168	310	0.1544	-	-	-	-
0.2238	320	0.1301	-	-	-	-
0.2308	330	0.1178	-	-	-	-
0.2378	340	0.1084	-	-	-	-
0.2448	350	0.1784	0.3800	0.8540	0.3860	0.8821
0.2517	360	0.1541	-	-	-	-
0.2587	370	0.0982	-	-	-	-
0.2657	380	0.1897	-	-	-	-
0.2727	390	0.117	-	-	-	-
0.2797	400	0.1806	0.3785	0.8458	0.3861	0.8818
0.2867	410	0.1258	-	-	-	-
0.2937	420	0.1249	-	-	-	-
0.3007	430	0.1987	-	-	-	-
0.3077	440	0.1512	-	-	-	-
0.3147	450	0.1646	0.3817	0.8422	0.3829	0.8814
0.3217	460	0.1322	-	-	-	-
0.3287	470	0.1464	-	-	-	-
0.3357	480	0.1488	-	-	-	-
0.3427	490	0.1033	-	-	-	-
0.3497	500	0.1209	0.3825	0.8435	0.3827	0.8812
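
The Epoch column can be cross-checked against the dataset sizes and batch size reported above. The sketch assumes that each batch is drawn from a single dataset and that the last partial batch of each dataset is kept (dataloader_drop_last is False). That assumption reproduces the logged epoch fractions, but it is an inference, not something the card states. Under it, the log shown here ends roughly 35% of the way through the single epoch, and the step-500 row matches the ndcg@10 values in the evaluation table.

```python
import math

dataset_sizes = {"reason_ccnews": 44_978, "reason_reddit": 41_703, "reason_s2orc": 96_205}
batch_size = 128       # per_device_train_batch_size
warmup_ratio = 0.2

# Assumption: batches never mix datasets, and partial final batches are kept.
steps_per_epoch = sum(math.ceil(n / batch_size) for n in dataset_sizes.values())
warmup_steps = math.ceil(steps_per_epoch * warmup_ratio)
epoch_at_step_500 = round(500 / steps_per_epoch, 4)

print(steps_per_epoch)     # 1430
print(warmup_steps)        # 286
print(epoch_at_step_500)   # 0.3497, matching the last logged row
```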

Framework Versions

Python: 3.10.12
Sentence Transformers: 3.5.0.dev0
Transformers: 4.50.0
PyTorch: 2.6.0+cu124
Accelerate: 1.5.2
Datasets: 2.21.0
Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
