---
license: mit
datasets:
- tiiuae/falcon-refinedweb
language:
- en
metrics:
- perplexity
- accuracy
---

## Description

Models trained on 300B tokens of the Falcon RefinedWeb dataset, including variants with dense FFN layers and variants with low-rank FFN layers.

## Citation

If you find this work useful, please consider citing the papers:

```
@article{wei2024building,
  title={Building on efficient foundations: Effective training of LLMs with structured feedforward layers},
  author={Wei, Xiuying and Moalla, Skander and Pascanu, Razvan and Gulcehre, Caglar},
  journal={Advances in Neural Information Processing Systems},
  volume={37},
  pages={4689--4717},
  year={2024}
}

@article{wei2024investigating,
  title={Investigating low-rank training in transformer language models: Efficiency and scaling analysis},
  author={Wei, Xiuying and Moalla, Skander and Pascanu, Razvan and Gulcehre, Caglar},
  journal={arXiv preprint arXiv:2407.09835},
  year={2024}
}
```
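
## Low-rank FFN (illustrative sketch)

As a rough illustration of what "low-rank FFN" refers to above, the sketch below replaces each dense feedforward projection with a product of two thin matrices. This is a minimal sketch under assumed dimensions and names (`d_model`, `d_ff`, `rank`, `LowRankLinear`); it is not the parameterization or training code used for these checkpoints, for which see the papers cited above.

```python
# Minimal sketch (assumption, not the released training code): an FFN block where
# each dense projection W (d_in x d_out) is factored into two thin matrices
# U (d_in x r) and V (r x d_out), cutting parameters and FLOPs when r << min(d_in, d_out).
import torch
import torch.nn as nn


class LowRankLinear(nn.Module):
    def __init__(self, d_in: int, d_out: int, rank: int):
        super().__init__()
        self.down = nn.Linear(d_in, rank, bias=False)  # x -> xU
        self.up = nn.Linear(rank, d_out, bias=True)    # xU -> xUV + b

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x))


class LowRankFFN(nn.Module):
    """FFN block with both projections replaced by low-rank factorizations."""

    def __init__(self, d_model: int = 1024, d_ff: int = 4096, rank: int = 256):
        super().__init__()
        self.fc1 = LowRankLinear(d_model, d_ff, rank)
        self.fc2 = LowRankLinear(d_ff, d_model, rank)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc2(self.act(self.fc1(x)))


if __name__ == "__main__":
    ffn = LowRankFFN()
    x = torch.randn(2, 16, 1024)  # (batch, sequence, d_model)
    print(ffn(x).shape)           # torch.Size([2, 16, 1024])
```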