SLM pretrained from scratch
A suite of high-quality Chinese datasets for pretraining, fine-tuning, or preference alignment, along with the models trained on these datasets.
- opencsg/Fineweb-Edu-Chinese-V2.1 (958M • 29.5k • 56)
- OpenCSG Chinese Corpus: A Series of High-quality Chinese Datasets for LLM Training (Paper • 2501.08197 • 9)
- opencsg/chinese-fineweb-edu-v2 (188M • 15.3k • 72)
- opencsg/chinese-fineweb-edu (84.6M • 17.2k • 109)
CodeLlama fine-tuned by OpenCSG
synthetic datasets
StarCoder fine-tuned by OpenCSG