{"id":32621,"date":"2025-02-24T15:09:16","date_gmt":"2025-02-24T14:09:16","guid":{"rendered":"https:\/\/opi.org.pl\/pllum-model-wspoltworzony-przez-opi-pib-gotowy-do-dzialania\/"},"modified":"2025-04-08T01:46:03","modified_gmt":"2025-04-07T23:46:03","slug":"pllum-model-wspoltworzony-przez-opi-pib-gotowy-do-dzialania","status":"publish","type":"post","link":"https:\/\/opi.org.pl\/en\/pllum-model-wspoltworzony-przez-opi-pib-gotowy-do-dzialania\/","title":{"rendered":"PLLuM: a model co-created by OPI PIB ready for action"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">The Polish Large Language Model (PLLuM) is a family of artificial intelligence (AI) models that process and generate text in the Polish language. The models, created by Polish experts in IT and linguistics, will effectively support the development of digital competencies and innovation in public administration and business.<br><br><em>\u2018PLLuM demonstrates that we can advance modern technologies on our own terms, in our own language, and for the benefit of our citizens.<\/em> <em>We are laying the foundation for intelligent public services and innovation that will effectively support the administration and business sectors<\/em>,\u2019 said Krzysztof Gawkowski, Deputy Prime Minister and Minister of Digital Affairs.<br><br><strong><strong>\u2018Hi, I\u2019m PLLuM. Here is what I have to offer\u2019<\/strong><br><\/strong><br>\u2022 The Polish language model is flexible and scalable \u2013 its versions range from 8 to 70 billion parameters. It can generate accurate content in the Polish language. Smaller versions are ideal for rapid tasks, while larger models ensure greater precision and contextual consistency in understanding the Polish language. 
The PLLuM family includes models that rely on the Mixture of Experts architecture (MoE) with balanced expert selections as well as specialised Retrieval Augmented Generation (RAG) models.<br><br>\u2022 The model is based on ethically sourced data \u2013 versions that are intended for commercial use rely on textual content provided by owners who licensed it to the consortium, as well as on resources that may be used to develop a fully open model, according to the Polish Act on copyright and neighbouring rights and according to EU regulations. Scientific models, which are licensed for non-commercial use, also utilise publicly available datasets, such as Common Crawl.<br><br>\u2022 The model is fine-tuned with the use of proprietary datasets, consisting of tens of thousands of so-called prompts and the model\u2019s expected responses, as well as preferences \u2013 prompts and various model responses that are assessed for their quality. All of these are created by a team of over fifty experts.<br><br>\u2022 The model contributes to the development of an ecosystem for Polish language models. The PLLuM and Bielik models can foster Polish-developed AI and support one another in improved training and further data acquisition and sharing, leading to the improvement of #AIMadeInPoland for public administration, business, and society.<br><br><strong><strong>Building an ecosystem for the development of large language models in Poland<\/strong><\/strong><br><em>\u2018The development of PLLuM is a commitment to a digital future of our country.<\/em> <em>So far, we have invested PLN 14.5 million in this project, and we are now ready to take the next step. 
An additional PLN 19 million will help implement the model in public administration and expand collaboration with new partners, such as COI and Cyfronet AGH.<\/em> <em>This strategy will ensure that PLLuM becomes a key element of public service digitalisation and the national AI ecosystem,\u2019<\/em> said Dariusz Standerski, Deputy Minister of Digital Affairs.<br><br>The project is conducted on behalf of the Polish Ministry of Digital Affairs, which supervises the development of PLLuM and is the owner of all PLLuM results. The project is implemented by a consortium of six organisations:<br><strong>Wroc\u0142aw University of Science and Technology<\/strong>\u00a0(project leader)<br><strong>The Institute of Computer Science of the Polish Academy of Sciences<\/strong><br><strong>The Institute of Slavic Studies of the Polish Academy of Sciences<\/strong><br><strong>Research and Academic Computer Network (NASK-PIB)<\/strong><br><strong>The National Information Processing Institute (OPI PIB)<\/strong><br><strong>The University of Lodz<\/strong><br><br><strong><strong>Wide-ranging applications of PLLuM<\/strong><\/strong><br>PLLuM is unique compared to other language models. It is customised to the nuances of the Polish language and public administration terminology. When generating Polish content, PLLuM relies on comprehensive procedures for data collection and quality evaluation. It is worth noting that PLLuM primarily uses organic data that is created manually rather than generated by other language models. 
Trained on Polish datasets, PLLuM efficiently tackles inflection and complex syntax, delivering accurate content.<br><br>PLLuM offers advanced solutions for public administration, contributing to Poland\u2019s continued digitalisation:<br>\u2022 A virtual assistant in the mObywatel app which helps citizens access public information.<br>\u2022 An intelligent clerical assistant which automates document processing, content analysis, information retrieval, and assists in responding to citizens\u2019 inquiries.<br>\u2022 Developing educational applications, providing translations, and helping teachers deliver engaging lessons using cutting-edge technology.<br><br><strong><strong>The OPI PIB benchmark to test PLLuM<\/strong><\/strong><br>The AI Lab team at OPI PIB, composed of S\u0142awomir Dadas, Ma\u0142gorzata Gr\u0119bowiec, Micha\u0142 Pere\u0142kiewicz, and Rafa\u0142 Po\u015bwiata, developed the <strong><a href=\"https:\/\/huggingface.co\/spaces\/sdadas\/plcc\" class=\"external\" rel=\"nofollow\" target=\"blank\">Polish Linguistic and Cultural Competency Benchmark<\/a><\/strong>. The set of 600 manually prepared questions evaluates how language models manage the Polish language and culture in six categories \u2013 history, geography, culture and tradition, art and entertainment, grammar, and vocabulary. While text processing deals with surface-level language, interpreting context from literature to pop culture and nuanced speech demands deeper insight. This benchmark helps reveal whether a model truly \u2018understands\u2019 the Polish language and culture. Results from the OPI PIB benchmark confirm the PLLuM model&#8217;s excellent performance.<br><br>PLLuM is available at <a href=\"http:\/\/pllum.clarin-pl.eu\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"external\">http:\/\/pllum.clarin-pl.eu<\/a>. 
<br>The models can be downloaded from <a href=\"https:\/\/router.huggingface.co\/CYFRAGOVPL\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"external\">Hugging Face<\/a>.<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"Konferencja prasowa | Odkryj przysz\u0142o\u015b\u0107 us\u0142ug publicznych z PLLuM\" width=\"640\" height=\"360\" src=\"https:\/\/www.youtube.com\/embed\/m9gyLQTX820?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>On 24 February 2025, the Polish Ministry of Digital Affairs presented the Polish language model PLLuM and the plan for its development. The project was conducted by a consortium of organisations, with the National Information Processing Institute (OPI PIB) as one of its members. The model\u2019s performance was tested using the  <a href=\"https:\/\/huggingface.co\/spaces\/sdadas\/plcc\" class=\"external\" rel=\"nofollow\" target=\"blank\">Polish Linguistic and Cultural Competency Benchmark<\/a>, which was also developed by experts at OPI PIB. 
The conference at which PLLuM was showcased was attended by Dr Marek Koz\u0142owski, Head of AI Lab at OPI PIB.<\/p>\n","protected":false},"author":30,"featured_media":32434,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","_links_to":"","_links_to_target":""},"categories":[411],"tags":[492,838,876,840,874],"class_list":["post-32621","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-news-en","tag-innovation","tag-it-en","tag-languagemodels-2","tag-opipib-2-en","tag-pllum-en"],"_links":{"self":[{"href":"https:\/\/opi.org.pl\/en\/wp-json\/wp\/v2\/posts\/32621","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/opi.org.pl\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/opi.org.pl\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/opi.org.pl\/en\/wp-json\/wp\/v2\/users\/30"}],"replies":[{"embeddable":true,"href":"https:\/\/opi.org.pl\/en\/wp-json\/wp\/v2\/comments?post=32621"}],"version-history":[{"count":1,"href":"https:\/\/opi.org.pl\/en\/wp-json\/wp\/v2\/posts\/32621\/revisions"}],"predecessor-version":[{"id":32623,"href":"https:\/\/opi.org.pl\/en\/wp-json\/wp\/v2\/posts\/32621\/revisions\/32623"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/opi.org.pl\/en\/wp-json\/wp\/v2\/media\/32434"}],"wp:attachment":[{"href":"https:\/\/opi.org.pl\/en\/wp-json\/wp\/v2\/media?parent=32621"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/opi.org.pl\/en\/wp-json\/wp\/v2\/categories?post=32621"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/opi.org.pl\/en\/wp-json\/wp\/v2\/tags?post=32621"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}