Laboratory of Statistical Analysis

We will be needed for a long time

Among the professions of the future, there is the big data model designer, for example. I would describe the individuals working at our laboratory today as data scientists rather than traditional statisticians. Michał Rolecki talks to Marzena Feldy, PhD, the Head of the Laboratory of Statistical Analysis, National Information Processing Institute.

Marzena Feldy, PhD, Head of the Laboratory of Statistical Analysis

Marzena Feldy has a Ph.D. in economic sciences, in the discipline of management science, and is an adjunct at the National Information Processing Institute – OPI PIB. She is the head of the Laboratory of Statistical Analysis where she focuses on research on science and innovation policy. She is the author of publications and analyses of the higher education sector, as well as consumer behaviour and marketing communication.

She is a graduate of the Warsaw School of Economics and the Faculty of Psychology at the University of Warsaw. She uses an interdisciplinary approach in her research, applying knowledge of management and psychology.

Michał Rolecki: The Laboratory of Statistical Analysis… what does that mean?

Marzena Feldy: Most importantly, it is a place where you can hone your research passion. As part of an internal organisational structure, we co-direct the research and development department at the National Information Processing Institute – OPI PIB.

So what do you do, exactly?

We uncover the unknown, the uncharted. We fuse our visions, experience and knowledge from the various fields in which we specialise, to uncover patterns and research trends in science and higher education sectors.

The results of our research let decision-makers establish science policy based on evidence. Our analyses show which fields of science in Poland have the most dormant potential and which of these areas ought to receive more funding to awaken it. As part of the research, we consider which steps need to be taken to improve collaboration between science and business. When the need arises, we evaluate existing programmes in terms of the effectiveness of the solutions implemented under them.

Are statisticians the only ones who work at the laboratory?

And here is where I am going to surprise you! The name of our laboratory could certainly suggest that. Skills in quantitative analysis are indeed in demand and useful for the tasks we come across at work every day. But that is not everything.

Qualitative researchers work side by side with statisticians in our laboratory. Not all research issues we handle can be solved by applying statistical methods. If, for example, we want to know the causes of some phenomena, it is vital to reach the people who are affected by them and interview them. The knowledge gained from a qualitative study often allows us to design a better survey later on. When, for example, we want to know why students drop out of university, we first ask them about it in interviews, so that the survey does not omit important causes which we might not have thought of ourselves.

While recruiting to the team, we pay more attention to what kind of skills and abilities a candidate has, where the candidate is going in life, and whether the candidate wants to develop his/her research skills in the field of science and higher education, rather than looking solely at the candidate’s background and previous professional path.

We are searching for employees with expertise in data visualisation. Beyond analysing a particular phenomenon, the ability to present the results clearly and coherently to laypeople is crucial for us. An aesthetic sense is an advantage, which is why a good researcher is not only a quick-thinking professional in his or her field, but also an artist to some extent.

How does one work with 21st century data? Is it not the case that the proper software can calculate everything after pressing ‘enter’?

In the era of business intelligence, it can really seem that everything calculates itself, as if at the wave of a magic wand. And if that were the case, we probably would not be needed.

The 21st century is about working with giant datasets which grow astronomically and exponentially. This data is gathered in various systems, each of which has its own specificity and limitations. This is why ensuring data compatibility is so important.

What exactly does this mean?

It means adjusting the data format to be used in different IT systems and databases.

Secondly, a thorough understanding of the data with which one works is crucial. This, in turn, gives us a chance to differentiate valuable from useless data. We also need to have knowledge about the completeness and credibility of the data which we analyse. If, for example, the data is incomplete, then certain actions at the stage of analysis are essential, which will allow us to use the data despite that limitation.

We need to be aware that the quality of the input data we receive determines the quality of the analyses we get at the end. This has always been the case – it is just that the datasets we analyse today are much bigger, and the dependencies in the systems which store the data are much more complex. Luckily, we also have more advanced analytical tools.

The questions which our laboratory receives are usually unique, and require advanced analytical work and many lines of code in the R programming language before we can get the results. In the end, the result needs to be interpreted, and an expert is much more suitable for this role than a layperson sitting in front of even the most advanced ‘self-calculating’ software.

What challenges do you face in these analyses?

One factor which makes our jobs more difficult is the limited availability of high-quality data, along with the limitations I have already mentioned. Sometimes an intriguing research question pops into our heads, and we hope to answer it using the data available to us. After an initial analysis of the datasets we have, it turns out that the data is incomplete and concluding anything on the basis of that data would be a big mistake. We then search for other research methods which will let us find an answer to pressing questions, or reformulate the research problem in such a way that we can partially solve it, bringing us closer to the answers.

Is there anything in statistics which can make a person happy?

In most research, assumptions and hypotheses are made. In other words – we search for confirmation of certain intuitive conclusions which stem from previous research and discoveries, the literature and observation of the surrounding world. It feels great when these expectations and intuitive conclusions are corroborated.

Discovering correlations which are not obvious also stirs our emotions. For example, it may seem that due to demographic changes, there will be fewer private university students. As it turns out, it is the public universities which record larger drops in this area than private universities. Between 2014 and 2018, their total student numbers fell by 17% and 2%, respectively. The number of students studying part-time at public universities went down even faster, by 25%. At the same time, private universities had a 39% uptick in full-time studies!

In statistics, you also need to be prepared for disappointment. It can happen that, even though all signs in heaven and on earth point to a correlation between specific variables, we do not find a matching correlation in the empirical data. We then turn back to previous stages of the research process, verify our assumptions and continue the search.

As the Laboratory of Statistical Analysis and Evaluation, you researched the fate of graduates in Poland. Why would anyone need to know how much people earn after graduation?

I can’t believe you are asking this! Even simple human curiosity makes us want to know whether our neighbour is better off than we are. And in all seriousness, the Polish Graduate Tracking System – ELA for short – is being implemented on the orders of the Minister of Science and Higher Education, who is obliged to undertake such an initiative by the Law on Higher Education and Science. We are working on the fifth edition of the project this year.

It is valuable knowledge for various groups of people. The biggest group which benefits from ELA are individuals who are thinking about commencing their studies, or continuing them to the next level and want to make an informed choice on the subject of their majors. When choosing a career path it is best to be guided by your interests, of course. When you like what you do, you are going to do it well, and the money will come to you. Still, the knowledge of your salary several years after graduation, as well as how long you are going to look for work (we also provide such information as part of ELA) lets you plan for the future better. It allows you to choose a dream major at a university which maximises your chances of finding a well-paid job after graduation.

And this is where we come to an important group of project ELA stakeholders, namely university decision-makers.

What is in it for them?

The information we provide is extremely useful for individuals who compile university prospectuses and programmes. It allows for the comparison of graduates’ fate from the same university, albeit with different majors, as well as students of similar majors at different universities. Thanks to this monitoring, university decision-makers know which faculties’ graduates are in demand on the job market and, on the other hand, which faculties’ graduates are not sought after by employers. Moreover, if the graduates of competitive study courses are more highly valued by the market, it can be a signal to take a closer look at the study programme and consider what kind of changes should be implemented, what kind of subjects should be added, what kind of competencies the students ought to be equipped with to make the process of finding a job after graduation easier.

The next group which could find the information gathered by the ELA project useful is administrative workers. On that basis, they can point future students who are unsure which faculty to choose towards programmes which will increase their chances of finding a well-paid job.

Finally, I think that ELA is an important tool for the Ministry of Science and Higher Education, enabling higher education policy based on evidence.

You were commissioned by the ministry to analyse the artificial intelligence sector in Poland. What were the results? Are we at the top, or the bottom, of the world?

I would like to say we are at the top. But for this to happen, we would need to seriously increase our funds for education, and particularly for artificial intelligence. That is why I would rather you asked me if Poland has the potential for artificial intelligence to develop, as this is what our research focused on.

We studied, among other things, the national human resources in the field of artificial intelligence, AI projects being implemented and courses at universities which let you gain qualifications in the field of AI. The results encourage us to think we have potential which will allow us to find a niche in the modern economy.

Our capital consists of amazing mathematics and computer science graduates who win prizes in competitions on a global stage. Computer science ranks first in the list of BA studies and MA studies with the highest number of accepted candidates – over 20,000 in the 2018/2019 academic year.

And how about education in the field of artificial intelligence?

Our analyses show that, between 2010 and 2018, the highest number of theses centring on artificial intelligence were defended at the Wroclaw University of Technology, followed by Lodz University of Technology and the University of Information Technology and Management in Rzeszów. AGH University of Science and Technology and Warsaw University of Technology, which we would intuitively place at the top of the ranking, came 12th and 16th, respectively. It certainly is surprising.

It is worth emphasising that our universities educate at a high level, which is demonstrated by the resulting brain drain of Polish specialists to foreign countries. This is the biggest issue which can hamper the development of artificial intelligence in Poland. It is essential to create a work environment for young people comparable to the ones abroad.

Statistics from 2014, pertaining to graduates of computer science majors with artificial intelligence as the specialisation, show that they could count on 36% higher salaries than other graduates. The margin increased all the way up to 47% after four years. It is a saddening fact that not everyone has a chance to be paid equally well. Our analyses show that high salaries are mainly the domain of men, who earned 38% more, on average, than women who had graduated from the same kind of faculty during the first year after graduation. What is more, this discrepancy grew to 53% after graduation.

This shows how much still needs to be done, as well as how necessary the monitoring and analysis of the phenomena we work on is.

Aren’t you concerned artificial intelligence will leave you jobless? After all, it is a great analyser of big data sets, it draws conclusions…

Well, a Russian team compiled the Atlas of Emerging Jobs in which I saw my profession as one destined for extinction by 2030. It was right there, alongside journalists – so you ought to feel like an endangered species, too.

I already feel it – there are bots writing news dispatches.

But I can also offer some comfort: among the new professions, there is the Info Stylist, who uses streaming algorithms to serve content in a form adjusted to the expectations of a particular user. So there is still hope for you!

What about hope for you?

I think statisticians will still be needed, just in a new capacity, with their skills enhanced by computer science. And there is nothing unusual about that, given how technology permeates most professions. Among the emerging professions, the Atlas of Emerging Jobs lists ‘big data model designer’, for example. Even now, I would describe people working in our Laboratory as ‘data scientists’ rather than traditional statisticians. And when we look at the national salary ranking of university graduates, second place, right after computer scientists from the Jagiellonian University, goes to Big Data graduates of the Warsaw School of Economics. The demand for data analysis qualifications is still considerable. We can sleep soundly.