Our laboratory offers state-of-the-art technologies and excellent organisation to those who are interested in programming. Those who are interested in scientific activity can both earn money and do research. Paweł Nowek, the Head of the Laboratory of Intelligent Information Systems, National Information Processing Institute, in conversation with Maciej Chojnowski.
Paweł Nowek, head of the Laboratory of Intelligent Information Systems at the National Information Processing Institute, obtained a master's degree in Agricultural Technology at the Agricultural University of Lublin and completed postgraduate studies in software engineering at the Jagiellonian University in Kraków.
His professional interests include project management, big data, optimisation of manufacturing processes and the implementation of new software development methods and technologies.
In 2018 the team of the Laboratory of Intelligent Information Systems (LIIS) responsible for the Integrated System of Information on Science and Higher Education (POL-on) received the EUNIS Elite Award. What is that award given for?
Paweł Nowek: The award recognises the best information systems created for the education sector. It was presented at a scientific conference. The award committee takes into account both the technical solution, that is, the case study presented, and the quality of the accompanying article.
What did the committee like best in your project?
The systems we described. Such solutions have already been implemented on a large scale in other countries, but their structure is different: they are distributed web systems. We have created one huge system that centralises many processes and that is maintained by a single institution. On top of that, the system was created in a short period of time. Those were the key criteria.
What exactly is POL-on? What purpose does it serve?
It’s an extensive, multi-layered system. It contains a repository part with information on employees, students, doctoral students, broadly understood scientific and administrative staff, as well as theses and publications. There is a multitude of repositories. But that is only a small piece of a much bigger solution.
There are also value-added operational components. One of them is the System for Evaluation of Scientific Achievements (SEDN), which has been designed to assess the quality of science and higher education.
In other words, POL-on is both a set of data in repositories and a set of various services which dynamically transform such data. To make things easier, we often use the terms ‘small POL-on’ and ‘big POL-on’. The latter refers to the whole POL-on ecosystem.
However, in view of the Science 2.0 Act, we’ve recently been trying to break that monolith up, for purely technical reasons. Currently, we are transforming POL-on into a microservice architecture. Each module that used to be locked inside the old system can now operate autonomously.
That means you are not only making the whole architecture more flexible, but you are improving the functionalities of the system as well.
Precisely. It’s a specific type of feedback. Although we appreciate the fact that we have been recognised by the EUNIS community, we are aware that there are other interesting solutions. We are trying to find a golden mean. On the one hand, we have decided to keep extensive and detailed knowledge. On the other hand, we are creating a flexible ecosystem of services instead of a single, monolithic database.
Your approach catches the spirit of contemporary programming development in the agile model. Was it your own idea to modernise this systemic thinking?
Definitely so. Currently, we are implementing the most popular Java architecture. It is not about a specific programming language, but about an architecture based on micro-services, with a flexible approach to data saving.
The solution we are developing is based on the ‘domain-driven design’ concept and on ‘event sourcing’. Event sourcing is a departure from classical relational databases: instead of storing only the current state, it focuses on the user’s actions and on what is happening in the real world.
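To illustrate the idea, here is a minimal sketch of event sourcing in Python. It is not POL-on code: the event names, the fields and the `EventStore` class are hypothetical and chosen only for illustration. The point it shows is that the system stores an append-only log of what happened and derives the current state by replaying that log.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    """An immutable record of something that happened (hypothetical schema)."""
    kind: str        # e.g. "employee_hired", "degree_awarded"
    person_id: str
    payload: dict


@dataclass
class EventStore:
    """Append-only log: events are added, never updated or deleted."""
    events: List[Event] = field(default_factory=list)

    def append(self, event: Event) -> None:
        self.events.append(event)


def current_state(store: EventStore) -> Dict[str, dict]:
    """Rebuild the present view of each person by replaying all events in order."""
    state: Dict[str, dict] = {}
    for e in store.events:
        record = state.setdefault(e.person_id, {"employed": False, "degrees": []})
        if e.kind == "employee_hired":
            record["employed"] = True
            record["institution"] = e.payload["institution"]
        elif e.kind == "employee_left":
            record["employed"] = False
        elif e.kind == "degree_awarded":
            record["degrees"].append(e.payload["degree"])
    return state


# Illustrative data only
store = EventStore()
store.append(Event("employee_hired", "p1", {"institution": "University A"}))
store.append(Event("degree_awarded", "p1", {"degree": "PhD"}))
store.append(Event("employee_left", "p1", {}))
state = current_state(store)
```

Because the log keeps every action, the same events can later be replayed into a different view (for example, a report of employment history) without changing what was stored.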
What is the purpose of that?
We gain a better insight into the needs of the environment and we are able to help administrative staff employed at universities and in research institutes to better understand their work. We want to build flexible applications to ensure services are best tailored to users’ needs. Now, the data is being updated on an ongoing basis. With that solution, the system is alive and can convey the information from the world almost instantly.
The information system exists to provide services. We try to react very fast and not to wait too long to release software. We work in iterations. We are committed to improving the user experience by constantly adapting the scope of work and new releases to feedback and recommendations from the environment. You can also say we convey information on what the ministry wants and on what the universities expect. In the National Information Processing Institute, many solutions are invented to satisfy the needs and requirements of those two worlds.
POL-on was implemented several years ago. You are undergoing a thorough transformation but, generally speaking, systems of that sort are evolving all the time. How does this look in your case?
Since 2011 we have undergone several such fundamental changes. They have been driven mainly by amendments to legislation and by expanding the areas the system covers. Such processes are dynamic. All of us – the National Information Processing Institute, the Ministry of Science and Higher Education, the universities and the users – are learning all the time. The system embodies that knowledge. And that translates into modernisation and constant changes.
The rate at which implementations are introduced is one of the key indicators to assess the quality or performance of IT companies on the market. If you are able to implement, react and provide software on a continuous delivery basis fast enough, you win. This is the pressure of the world. This is what people expect.
Have you thought about any modules that you would wish to add to the system in the future?
Obviously, further exploration of given areas will partially depend on strategic decisions. One of the crucial areas in the current POL-on system is the procedure of awarding academic degrees, for example, linking theses with review elements and other detailed information on that procedure. Another is integration with the Uniform Anti-plagiarism System. Dynamic reports prepared with the use of artificial intelligence are another area which will have to be thoroughly explored. With all that data potential, you simply can’t opt out of using it.
Importantly, all those elements are getting more and more integrated, which results in synergy.
What is the current scope of work of the LIIS?
About 80 percent of our activity is creating IT services, with the remaining 20 percent consisting of research and development. Our main goal is to implement two Acts: the Act on Science and Higher Education and the Act on the Principles of Financing Science.
To whom are those services offered?
We provide our services to what is broadly understood as the science sector, i.e. the Ministry of Science and Higher Education and other ministries that supervise universities, as well as the National Science Centre and the National Centre for Research and Development. Our customers also include the Polish National Agency for Academic Exchange, the Council of Scientific Excellence, all universities and research institutes, and the Łukasiewicz Research Network. However, there are also many other institutions, e.g. Statistics Poland and the Border Guard.
What other projects, apart from POL-on, are conducted by the LIIS?
The National Repository of Theses, where we currently store over 10 TB of data, corresponding to over 3 million theses. We make them available to the Uniform Anti-plagiarism System.
We have also been managing the Polish Graduate Tracking System project, for which we use data provided by the Social Insurance Institution.
And then there is RAD-on, our rising star. RAD-on is based on business intelligence. Its heart is the data warehouse where data from all systems administered by the National Information Processing Institute are integrated. The goal of the system is to make public data available to other entities to allow them to offer new services.
It is also the area we use to develop analytical tools. It is about supporting the decision-making processes by showing what is happening in the science sector. In the future, we intend to build analytical and predictive models regarding science and higher education.
We also have several smaller projects, e.g. Inventorum, which was created to connect business and science. There is the Pstryk project, which is a tool to support innovation. It is about stimulating university technology transfer centres and establishing a single point of contact: entrepreneurs report their problems, we analyse the inquiry and then send it to technology transfer centres, which then have 24 hours to respond.
Finally, there is Navoica, the Polish MOOC platform, and the Aparatura system, which is the database of available test equipment. This system is currently used by the Łukasiewicz Centre.
Additionally, we have created a system for the National Centre for Research and Development which is not publicly available. The system has been designed to detect grant duplicates. It is a sort of anti-plagiarism tool for grants, able to determine whether an idea has already been financed.
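The interview does not describe how that system works internally. Purely as an illustration of what duplicate detection between texts can look like, the sketch below compares two proposal descriptions using Jaccard similarity over three-word shingles. The function names, the shingle size and the sample texts are all assumptions, and a measure this simple only captures shared wording, not shared ideas.

```python
def shingles(text: str, n: int = 3) -> set:
    """Return the set of overlapping n-word sequences in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: str, b: str, n: int = 3) -> float:
    """Share of n-word shingles that two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa and not sb:
        return 0.0  # both texts are too short to compare
    return len(sa & sb) / len(sa | sb)


# Illustrative texts only, not real grant applications
proposal_a = "a sensor network for monitoring river pollution in real time"
proposal_b = "a sensor network for monitoring air pollution in real time"
unrelated = "completely unrelated proposal about medieval poetry and archives"

score_similar = jaccard(proposal_a, proposal_b)
score_unrelated = jaccard(proposal_a, unrelated)
```

In a setup like this, pairs scoring above some chosen threshold would be passed to a human reviewer rather than rejected automatically.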
Who are the employees of the LIIS?
People are our greatest asset. Having employees who demonstrate affective commitment to the institute and to what they do gives us market and competence advantages. Why is that the case? Because our systems offer a measurable social value. They are also complicated, which makes them interesting at the same time. Our employees want to stay in our team for a long time, which is uncommon in the IT sector.
What is their professional background?
First of all, they are aficionados who are not afraid of experimenting. Before changing the POL-on structure, we had to decide which path to follow. We could have adopted a conservative approach and built a system that would have become an upgraded version of the old system. But people in the LIIS are keen on self-development, knowledge sharing, attending seminars and doing experiments. And they have decided to take up that challenge. Although not all of them are scientists, their way of thinking reflects the spirit of the organisation.
More specifically, we have the research and development team led by Sławomir Dadas, which concentrates on machine learning, particularly on natural language processing. Their studies do not pertain only to the analysis of science-related data, but also to the ways of detecting emotions in texts and to the ‘domain adaptation’ method. It is about using models trained on text data from a single domain (e.g. film reviews) in other contexts (e.g. in opinions on healthcare services). The team has also conducted research on image recognition using handwritten Japanese kanji characters.
The team is now training a very large Polish language model based on the BERT architecture. The model is a statistical representation of natural language. Today, such models are laying the groundwork for building more complex tools for text classification, semantic comparisons, data extraction, etc. The bigger the dataset used to train such a model, the more adequately the semantic and syntactic nuances of a given language are represented. To train it at the LIIS, we have used a plain text corpus of 130 GB, which is equivalent to 400,000 books. Our plan is to make the trained model available to the public.
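As a quick sanity check on that comparison, the two quoted figures imply an average book size, which can be worked out as follows. The only assumption is the use of decimal units (1 GB = 1,000,000 KB); the interview does not say which convention was meant.

```python
# Figures quoted in the interview
corpus_gb = 130
books = 400_000

# Assumption: decimal units, 1 GB = 1,000,000 KB
kb_per_book = corpus_gb * 1_000_000 / books
print(kb_per_book)  # 325.0
```

That is roughly 325 KB of plain text per book.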
Another research team in our laboratory is led by Professor Witold Pedrycz of the University of Alberta, Canada, who is also an academic researcher at the National Information Processing Institute. He’s a world-class expert in computational intelligence, fuzzy modelling, granular computing, image recognition and neural networks. His team combines programming with research and development.
The rest of our employees are IT specialists. Seventy percent of them are programmers, administrators, developers in test, designers and testers. We are no different from any other software house.
What technologies do you use?
The main technology we use is the Java ecosystem. We also work with Python; we have been using it for the Navoica project and to create machine learning models.
Oracle is used for databases.
For the purposes of machine learning, the LIIS also utilises a large Nvidia computing machine, which we make available to other entities. As far as programming tools are concerned, it’s usually Enterprise Architect for UML modelling. Interfaces are designed with Axure. We all use IntelliJ. There is also PyCharm for Python. Code repositories are obviously kept in Git.
There is JIRA for group work and Confluence for documenting. Deployment is based on Kubernetes. The Docker platform helps us with containerisation. We also use Apache Kafka to provide services for the entire organisation. It integrates all systems by exchanging messages between them.
We must also mention automated tests, which would not be possible without JUnit. There is also COMIT and everything is built automatically thanks to Jenkins. Sonar takes care of the analysis of the whole code.
How can LIIS employees improve their skills?
In order to meet the expectations of people involved in research and development, we organise annual doctoral seminars. In the past, they were delivered by Prof. Witold Pedrycz from Canada, whom I have already mentioned, and by the sadly missed Prof. Krzysztof Marasek, our colleague and illustrious specialist in natural language processing and experimental phonetics, who was also the Head of the Multimedia Department at the Polish-Japanese Academy of Information Technology.
This year the seminars are held online. We have signed an agreement on guiding our R&D work with the Systems Research Institute of the Polish Academy of Sciences. It mainly covers machine learning.
Our employees attend the best global conferences, invest in their development and produce publications on a regular basis.
What can a person who wants to join the LIIS hope for?
Our laboratory offers state-of-the-art technologies and excellent organisation to those who are interested in programming.
Those who are interested in scientific activity related to data science or machine learning can both earn money and do research. They can also contribute to solutions that will be used by other people in practice.