A Review of the Challenges with Massive Web-Mined Corpora Used in Large Language Models Pre-training

Michał Perełkiewicz, Rafał Poświata

2025 In: Artificial Intelligence and Soft Computing : 23rd International Conference, ICAISC 2024, Zakopane, Poland, June 16–20, 2024, Proceedings, Part III / Leszek Rutkowski, Rafał Scherer, Marcin Korytkowski, Witold Pedrycz, Ryszard Tadeusiewicz, Jacek M. Zurada. - Cham : Springer. - pp. 153-156

International Conference on Artificial Intelligence and Soft Computing [ICAISC], Zakopane, June 16–20, 2024

https://link.springer.com/chapter/10.1007/978-3-031-81596-6_14