Laboratory of Databases and Business Analytics

It never gets boring

We define the path of our own progress. We need to achieve our aims, but how we achieve them is up to us. Maciej Chojnowski talks to Emil Podwysocki, Head of the Laboratory of Databases and Business Analytics at the National Information Processing Institute.

Emil Podwysocki
Head of the Laboratory of Databases and Business Analytics

Emil Podwysocki is the head of the Laboratory of Databases and Business Analytics at the National Information Processing Institute.

He has more than 10 years’ experience in designing and implementing data processing pipelines (ETL/ELT), data warehouses and business intelligence systems for the telecommunications and media sectors.

What does a regular day at OPI PIB’s newest laboratory look like?

Our regular day revolves around ensuring the continuity of the database systems for which we are responsible. These include the biggest database at OPI PIB, the Integrated System of Information on Science and Higher Education (POL-on). Another routine task is the development of data warehouses, which integrate the data from most OPI PIB systems and provide a source of data for the reports, rankings and analyses we generate. Finally, we can’t forget one of the key aspects of the laboratory’s operations: the preparation of reports and analyses. These range from answers to relatively simple questions, such as the number of students at a given university or in a particular department, to complex analyses and rankings that require work over a longer period. This is a simplification, of course, and I could go on and on about what lies behind these seemingly simple tasks.

Always ready?

You could say that. Sometimes we receive questions about the data, mainly from our primary client, the Ministry of Science and Higher Education, which we need to answer within twenty minutes. For example, in connection with COVID-19, we recently answered questions that Deputy Prime Minister Jarosław Gowin was responding to in television interviews. Generally, we have three main tasks.

The first is to provide clear, understandable reports to government ministries, various public institutions and the media. This involves preparing compilations and analyses every day.

What kind of issues do these reports pertain to?

Everything covered by the OPI PIB systems: from students and the employees of institutions to employees’ achievements and publications. It also covers theses and diploma projects, as well as doctoral and postgraduate students. We compile reports for the Polish National Agency for Academic Exchange (NAWA) on foreign students who study in Poland. Apart from that, together with the National Science Centre (NCN), we prepare reports on the projects it funds.

We try to automate our reporting. We have implemented a data warehouse, which we continue to develop, and we are creating interfaces that make automatically generated reports available without human involvement.

What else do you do?

The second branch of our operations is the integration of IT systems, both within OPI PIB and with external entities. We are currently integrating the Integrated System of Information on Science and Higher Education (POL-on), the Polish Scientific Bibliography, the Uniform Anti-plagiarism System and the ZSUN system, an electronic proposal submission service used by the Polish Ministry of Science and Higher Education (MNiSW), the National Science Centre (NCN), the National Centre for Research and Development (NCBiR) and the OSF (a funding platform). We are trying to standardise the interfaces so that there is a single integration standard across the systems. This also applies to data exchange, so that it is transparent for everyone. We want the data warehouse to work as an integrator.

We are also integrating OPI PIB with the data systems of other entities, for example the Institute for Educational Research and the Polish Border Guard. We use REST API services for integration with external entities (see https://radon.nauka.gov.pl/api/katalog-udostepniania-danych). In special cases, we can adapt the integration method to the needs of our client.

You also do research in the field of data science and science policy.

Yes, research and development is the third operational area at the laboratory.

We create recommendations, publications and analyses, which allow us to uncover intriguing correlations between Polish and international science. We want to anticipate the needs of our clients by using machine learning algorithms to analyse data on Polish science. We have the data, as well as the skills to analyse it comprehensively. We want our research to have practical applications and to contribute to beneficial changes in the science, research and development, and technology sectors. Our scientists collaborate with experts from various government ministries, agencies and foundations, helping them to make decisions based on hard data. In this way, our research fits into the concept of data-driven policy, that is, policy based on data, which is standard practice in the European Union.

I heard about a data visualisation tool…

Yes, we are working on it. We are developing the RAD-on portal: reports, analyses, data (radon.nauka.gov.pl). It is a place where every user will find easily accessible data on the science and higher education sector and will be able to analyse it on their own. For example, a user will be able to create a graph showing changes in the number of foreign students coming to Poland. They will then be able to adjust various parameters, for example showing only public or only private universities, depending on the information they need. It will be a unique solution in Poland. I hope it will be useful for journalists, political decision-makers, researchers and many others. The portal already features thorough analyses of research on artificial intelligence conducted in Poland and abroad. In the near future, RAD-on will be filled with new interactive reports covering topics such as the state of universities in Poland, academic faculties and students.

What is the current stage of your work?

Quite advanced – we intend to make more interactive reports available this year. We are working in collaboration with the Laboratory of Intelligent Information Systems (LIIS), where I used to be in charge of the Database and Business Intelligence Team. My laboratory originates from there, after all.

What kind of projects did you implement earlier at LIIS?

I was involved in the ZSUN2 project from the very beginning, including the implementation of data warehouses and the Business Intelligence System at OPI PIB. The Open Data services were based on a data warehouse (https://radon.nauka.gov.pl/api/katalog-udostepniania-danych). In the next phase of the ZSUN2 project, we took part as the Database and Business Systems Team, implementing components such as systems integration, the API for the database and the Portal for Citizens (https://radon.nauka.gov.pl/dane-obywatela/unauthorized).

Currently, as a laboratory, we are collaborating with LIIS on a huge project: national reporting. It involves transferring data from every university to Statistics Poland (GUS), covering around 1,000 different indicators for analysis.

We are also responsible for maintaining the database systems POL-on, the Polish Science Database (BWNP) and Inventorum. This means administering these databases and ensuring their continuous operation, implementing system changes and updates, and dealing with hardware and software malfunctions.

What kind of skills can one develop in your team?

We currently have four employee profiles.

The first are administrators: people who administer databases and applications such as Oracle Business Intelligence, Oracle Data Integrator, Oracle Data Guard and Oracle GoldenGate. What matters is that they are experts in these technologies.

The second are business intelligence programmers, who mainly program in PL/SQL and design and implement data processing (ETL/ELT) processes in Oracle Data Integrator. They need to know their way around scripting languages such as Bash, Python and Perl.

The third are the people responsible for reporting. They use SQL to access the data and compile numerous reports.

The fourth are data science specialists. They mainly work with R and Python, using SQL to access the data. They include scientists who compile data on the higher education sector.

So, you can develop a broad range of competencies in understanding and analysing data, and in programming, both in scripting languages such as Bash and Python and in database languages such as SQL and PL/SQL.

How do you picture a good employee, someone who would be a good fit?

That depends on the team they would like to join.

Someone who wants to be a data science specialist should be curious and incisive in their approach to research. We currently have two people with doctorates, one who will soon defend a doctoral thesis, and two more who share a passion for complex data analysis. These people have a scientific profile and have set themselves ambitious research aims. Whether such a person already knows how to program is less important, because they will learn R, Python or whatever else they need in time.

Someone who would like to work in reporting needs to know the business domain: how the POL-on system works and what the employee, student and discipline structures are. They need to be well-versed in SQL, at least in writing basic queries to retrieve the data. A bit of aesthetic sense is useful too, because reports need to look good: the font can’t differ from column to column, important information needs to be in bold, and tables need to be properly formatted. Not everyone pays attention to this, but how our data is presented when it leaves the institute matters.
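To illustrate what “basic queries to retrieve the data” can look like, here is a minimal sketch of a simple reporting question, such as the number of students at each university or in a given department. It assumes a hypothetical, heavily simplified schema (the `university` and `student` tables and the helper functions are invented for illustration and are not the POL-on data model), and it uses Python’s built-in SQLite in place of the Oracle databases the laboratory actually works with.

```python
import sqlite3

# Hypothetical, heavily simplified schema for illustration only.
# This is NOT the POL-on data model; SQLite stands in for Oracle.
SCHEMA = """
CREATE TABLE university (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE student (
    id            INTEGER PRIMARY KEY,
    university_id INTEGER NOT NULL REFERENCES university(id),
    department    TEXT NOT NULL
);
"""

# Basic reporting query: number of students per university.
# LEFT JOIN keeps universities with no students (count 0).
STUDENTS_PER_UNIVERSITY = """
SELECT u.name, COUNT(s.id) AS student_count
FROM university AS u
LEFT JOIN student AS s ON s.university_id = u.id
GROUP BY u.id, u.name
ORDER BY u.name
"""


def students_per_university(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Return (university name, student count) rows sorted by name."""
    return conn.execute(STUDENTS_PER_UNIVERSITY).fetchall()


def students_in_department(conn: sqlite3.Connection, university: str, department: str) -> int:
    """Count students in one department of one university (parameterised query)."""
    row = conn.execute(
        """
        SELECT COUNT(*)
        FROM student AS s
        JOIN university AS u ON u.id = s.university_id
        WHERE u.name = ? AND s.department = ?
        """,
        (university, department),
    ).fetchone()
    return row[0]


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database filled with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO university (id, name) VALUES (?, ?)",
        [(1, "University A"), (2, "University B")],
    )
    conn.executemany(
        "INSERT INTO student (university_id, department) VALUES (?, ?)",
        [(1, "Physics"), (1, "History"), (2, "Law")],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    for name, count in students_per_university(conn):
        print(f"{name}: {count}")
    print(students_in_department(conn, "University A", "Physics"))
```

The parameterised `?` placeholders keep user-supplied filter values out of the SQL text itself; the same pattern applies to filtering by other attributes in a real reporting query.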

If someone wants to be a business intelligence programmer, our requirements are demanding. You need to be well-versed in Oracle databases, from administration through SQL, PL/SQL and scripting languages to Oracle Data Integrator, which is our main tool. The highest technological requirements are definitely in this category.

What do you value in your work at OPI PIB?

First of all, the freedom. The OPI PIB management has given us a great deal of autonomy, and I hope they have not been, and will not be, disappointed with our work. We define the direction of our own progress. We need to achieve the aims, of course, but the path is up to us. We are the architects of our own solutions. Nobody tells us that something has to be done in a particular technology. We have the knowledge and experience, and we can decide how to achieve our goals. This is the main advantage.

And the diversity of our projects, of course. We have a huge number of interesting tasks and many clients, almost every one of them with different requirements and needs. It never gets boring. We have new, interesting challenges every day.