PLLuM: a model co-created by OPI PIB ready for action

The Polish Large Language Model (PLLuM) is a family of artificial intelligence (AI) models that process and generate text in Polish. Created by Polish experts in IT and linguistics, the models will support the development of digital competencies and innovation in public administration and business.

‘PLLuM demonstrates that we can advance modern technologies on our own terms, in our own language, and for the benefit of our citizens. We are laying the foundation for intelligent public services and innovation that will effectively support the administration and business sectors,’ said Krzysztof Gawkowski, Deputy Prime Minister and Minister of Digital Affairs.

‘Hi, I’m PLLuM. Here is what I have to offer’

• The Polish language model is flexible and scalable – the models in the family range from 8 to 70 billion parameters and can generate accurate content in Polish. Smaller versions are ideal for rapid tasks, while larger models offer greater precision and contextual consistency in understanding Polish. The PLLuM family includes models based on the Mixture of Experts (MoE) architecture with balanced expert selection, as well as specialised Retrieval Augmented Generation (RAG) models.

• The model is based on ethically sourced data – versions intended for commercial use rely on textual content that its owners licensed to the consortium, as well as on resources that may be used to develop a fully open model in accordance with the Polish Act on Copyright and Neighbouring Rights and EU regulations. Scientific models, which are licensed for non-commercial use, also draw on publicly available datasets, such as Common Crawl.

• The model is fine-tuned on proprietary datasets of two kinds: tens of thousands of prompts paired with the model’s expected responses, and preferences, in which each prompt comes with several model responses that are assessed for quality. All of these are created by a team of over fifty experts.

• The model contributes to the development of an ecosystem for Polish language models. The PLLuM and Bielik models can foster Polish-developed AI and support one another in improved training and further data acquisition and sharing, leading to the improvement of #AIMadeInPoland for public administration, business, and society.
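
The two kinds of fine-tuning data described above (prompts with expected responses, and preferences) can be pictured as simple records. The sketch below is illustrative only: the class names, field names, and the numeric rating scale are assumptions, since the article does not describe the actual PLLuM data format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class InstructionExample:
    """A prompt paired with the response the model is expected to give."""
    prompt: str
    expected_response: str


@dataclass
class PreferenceExample:
    """A prompt with several candidate responses, each rated for quality.

    The integer ratings (higher is better) are a hypothetical scale.
    """
    prompt: str
    responses: List[str]
    ratings: List[int]

    def best_and_worst(self) -> Tuple[str, str]:
        """Return the highest-rated and lowest-rated responses."""
        ranked = sorted(zip(self.ratings, self.responses), key=lambda pair: pair[0])
        return ranked[-1][1], ranked[0][1]


# Made-up example, not taken from the PLLuM datasets.
pref = PreferenceExample(
    prompt="Jak złożyć wniosek o dowód osobisty?",
    responses=["response A", "response B", "response C"],
    ratings=[2, 5, 1],
)
chosen, rejected = pref.best_and_worst()
```

Pairs of a preferred and a less preferred response like `chosen` and `rejected` are a common input to preference-based fine-tuning; whether PLLuM uses its preference data this way is not stated in the article.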

Building an ecosystem for the development of large language models in Poland
‘The development of PLLuM is a commitment to a digital future of our country. So far, we have invested PLN 14.5 million in this project, and we are now ready to take the next step. An additional PLN 19 million will help implement the model in public administration and expand collaboration with new partners, such as COI and Cyfronet AGH. This strategy will ensure that PLLuM becomes a key element of public service digitalisation and the national AI ecosystem,’ said Dariusz Standerski, Deputy Minister of Digital Affairs.

The project is conducted on behalf of the Polish Ministry of Digital Affairs, which supervises the development of PLLuM and is the owner of all PLLuM results. The project is implemented by a consortium of six organisations:
• Wrocław University of Science and Technology (project leader)
• The Institute of Computer Science of the Polish Academy of Sciences
• The Institute of Slavic Studies of the Polish Academy of Sciences
• Research and Academic Computer Network (NASK-PIB)
• The National Information Processing Institute (OPI PIB)
• The University of Lodz

Wide-ranging applications of PLLuM
PLLuM stands apart from other language models because it is tailored to the nuances of the Polish language and to public administration terminology. When generating Polish content, PLLuM relies on comprehensive procedures for data collection and quality evaluation. It is worth noting that PLLuM primarily uses organic data, which is created manually rather than generated by other language models. Trained on Polish datasets, PLLuM handles inflection and complex syntax efficiently and delivers accurate content.

PLLuM offers advanced solutions for public administration, contributing to Poland’s continued digitalisation:
• A virtual assistant in the mObywatel app which helps citizens access public information.
• An intelligent clerical assistant which automates document processing, content analysis, and information retrieval, and assists in responding to citizens’ inquiries.
• Support for education: developing educational applications, providing translations, and helping teachers deliver engaging lessons using cutting-edge technology.

The OPI PIB benchmark to test PLLuM
The AI Lab team at OPI PIB, composed of Sławomir Dadas, Małgorzata Grębowiec, Michał Perełkiewicz, and Rafał Poświata, developed the Polish Linguistic and Cultural Competency Benchmark. Its set of 600 manually prepared questions evaluates how language models handle the Polish language and culture in six categories: history, geography, culture and tradition, art and entertainment, grammar, and vocabulary. While text processing deals with surface-level language, interpreting context, from literature to pop culture and nuanced speech, demands deeper insight. This benchmark helps reveal whether a model truly ‘understands’ the Polish language and culture. Results from the OPI PIB benchmark confirm the PLLuM model’s excellent performance.
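
A benchmark organised into categories is typically summarised as accuracy per category plus an overall score. The sketch below shows one way to compute that, assuming each question is graded simply as correct or incorrect; the article does not describe the benchmark's actual scoring procedure, and the function name and sample data are hypothetical.

```python
from collections import defaultdict

# The six categories named in the article.
CATEGORIES = (
    "history",
    "geography",
    "culture and tradition",
    "art and entertainment",
    "grammar",
    "vocabulary",
)


def score_by_category(results):
    """Compute accuracy per category and overall.

    `results` is an iterable of (category, is_correct) pairs, one per question.
    Categories with no graded questions are left out of the returned dict.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, is_correct in results:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        total[category] += 1
        correct[category] += int(is_correct)

    scores = {c: correct[c] / total[c] for c in CATEGORIES if total[c]}
    n = sum(total.values())
    scores["overall"] = sum(correct.values()) / n if n else 0.0
    return scores


# Made-up graded answers for illustration; not real benchmark results.
sample = [
    ("history", True),
    ("history", False),
    ("grammar", True),
    ("vocabulary", True),
]
scores = score_by_category(sample)
```

Reporting the categories separately makes it visible when a model is strong on grammar and vocabulary but weaker on cultural or historical knowledge, a gap that a single overall figure would hide.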

PLLuM is available at http://pllum.clarin-pl.eu.
The models can be downloaded from Hugging Face.