Gabriel Mongaras PRO

gmongaras

https://gmongaras.me/

AI & ML interests

None yet

Recent Activity

updated a collection about 15 hours ago

Stuff I'm going to read

liked a Space 16 days ago

microsoft/TRELLIS.2

upvoted a paper 28 days ago

InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models

gmongaras's collections (8)

Stuff I'm going to read

LTX-2: Efficient Joint Audio-Visual Foundation Model

Paper • 2601.03233 • Published 1 day ago • 55

datasets

gmongaras/CC12M_and_Imagenet21K_Recap_Highqual_512

Viewer • Updated Apr 24, 2025 • 19.8M • 3.58k • 1
gmongaras/CC12M_and_Imagenet21K_Recap_Highqual_256

Viewer • Updated Apr 21, 2025 • 19.8M • 4.48k
gmongaras/CC12M_and_Imagenet21K_Recap_Highqual

Viewer • Updated Feb 21, 2025 • 19.8M • 5.83k • 5
gmongaras/CC12M_and_Imagenet21K_Recap

Viewer • Updated Sep 17, 2025 • 22.7M • 5.23k • 7

Reddit Models

Some terrible Reddit models I am training just to see what happens. Never again will I hear "As an AI language model"

gmongaras/Wizard_7B_Reddit_Political_2019_13B

Text Generation • Updated Sep 15, 2023 • 7
gmongaras/Wizard_7B_Reddit_Political_2019

Text Generation • Updated Sep 11, 2023 • 6
gmongaras/Wizard_7B_Reddit_Political_2019_8bit

Text Generation • 7B • Updated Sep 11, 2023 • 9
gmongaras/reddit_negative_v1_8B

Text Generation • Updated Sep 15, 2023 • 6

BERT_512

gmongaras/BERT_Base_Cased_512_Dataset

Viewer • Updated Nov 28, 2023 • 136M • 128
gmongaras/BERT_Base_Cased_512_Dataset_Mapped

Viewer • Updated Nov 29, 2023 • 136M • 185
gmongaras/BERT_Base_Cased_512_GLUE

Viewer • Updated Dec 11, 2023 • 1.44M • 22
gmongaras/BERT_Base_Cased_512_GLUE_Mapped

Viewer • Updated Dec 11, 2023 • 1.44M • 25

Stable Diffusion 3 Checkpoints

Collection of checkpoints from the Stable Diffusion 3 model I am training (https://github.com/gmongaras/Stable-Diffusion-3-From-Scratch)

gmongaras/datav3_attempt5_8GPU_SoftFlash_RoPE2d_2AccSteps_13batchsize_stage3

Updated May 14, 2025
gmongaras/datav3_attempt5_8GPU_SoftFlash_RoPE2d_2AccSteps_40batchsize_stage2

Updated Apr 28, 2025
gmongaras/datav3_attempt5_8GPU_SoftFlash_RoPE2d_2AccSteps_140batchsize_stage1

Updated Apr 19, 2025
gmongaras/datav3_attempt4_8GPU_SoftFlash_RoPE2dV2_2AccSteps_stage2

Updated Apr 11, 2025

Cosine Attention (Cottention)

Models for the paper "Cottention: Linear Transformers With Cosine Attention" (https://arxiv.org/abs/2409.18747)

gmongaras/Cosine_Attention_GPT_300M

Feature Extraction • Updated Oct 7, 2024 • 8
gmongaras/Softmax_Attention_GPT_1.2B

Feature Extraction • Updated Oct 7, 2024 • 6
gmongaras/Softmax_Attention_GPT_300M

Feature Extraction • Updated Oct 7, 2024 • 3
gmongaras/Cosine_Attention_GPT_1.2B

Feature Extraction • Updated Oct 7, 2024 • 5

Squad Models

Models trained on SQuAD data

gmongaras/Wizard_7B_Squad

Text Generation • Updated Sep 11, 2023 • 10
gmongaras/Wizard_7B_Squad_8bit

Text Generation • Updated Sep 11, 2023 • 8
gmongaras/Wizard_7B_Squad_v2

Text Generation • Updated Sep 15, 2023 • 5

Subtitle Data

gmongaras/Anime_Subtitle_data

Viewer • Updated Mar 31, 2024 • 14.6M • 23 • 1
gmongaras/Anime_Subtitle_data2

Viewer • Updated Mar 31, 2024 • 1.91M • 20
