If the models of artificial intelligence aren’t explainable, then specialists (e.g. in medicine) must fully trust the technology, even though their own signature is below the diagnosis. We are working on a system in our laboratory which will be able to answer what exactly forms the basis of such predictions. Monika Redzisz talks to Piotr Sobecki, the Head of the Applied Artificial Intelligence Laboratory.
Piotr Sobecki – the head of the Applied Artificial Intelligence Laboratory at the National Information Processing Institute. He is a PhD candidate in computer science at Warsaw University of Technology, Faculty of Mathematics and Information Science, has an MSc in computer science from the Polish-Japanese Academy of Information Technology and studied psychology at SWPS University of Social Sciences and Humanities.
In 2018, he was one of the experts preparing the tenets for the development of artificial intelligence in Poland. His scientific interests are: medical information science, deep learning, computer image recognition, signal processing and cognitive psychology.
Monika Redzisz: What does the Applied Artificial Intelligence Laboratory team at the National Information Processing Institute do?
Piotr Sobecki: We concentrate on the application of AI in medicine, among other things. We standardise procedures, work on image analysis methods, and create the artificial intelligence models needed to support diagnostics. For example, we analyse the problem of prostate cancer diagnosis using magnetic resonance imaging (MRI). Prostate cancer is the second most common cancer in men – it’s estimated that one in six men will be diagnosed with it during his lifetime.
Unfortunately, its diagnosis can be complicated.
For several reasons. Screening for prostate cancer is based on measuring the blood level of PSA, an enzyme. A high level indicates some sort of anomaly, in which case further examination is required. On the basis of the MRI scans, a radiologist has to decide whether malignancy is present. Subsequent decisions, such as whether to perform a biopsy, depend on that. In the case of the prostate, a biopsy is neither easy nor comfortable for the patient: it involves collecting tissue samples from several to a dozen or so punctures.
Once we have methods precise enough to evaluate potentially malignant lesions based on an MRI, we’ll be able to significantly reduce the number of patients who are referred for biopsies.
What stage is the work at currently?
We’re creating tools to support this process. One of them is a deep neural network which determines the probability of a malignant lesion based on an MRI of the prostate. Its effectiveness is estimated at around 85 percent, which is close to that of a radiologist. The algorithm evaluates the significance of a particular lesion on a percentage scale (for example, 10, 60 or 80 percent). It is analogous to the PI-RADS classification used by specialists.
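To make the analogy with PI-RADS concrete, a percentage output can be binned into a 1–5 suspicion category. This is purely an illustrative sketch – the thresholds and function name below are invented for the example and are not the laboratory’s actual model.

```python
# Illustrative only: map a model's malignancy probability to a PI-RADS-like
# 1-5 category. The cut-off thresholds here are hypothetical.
def probability_to_pirads_like(p: float) -> int:
    """Map a probability in [0, 1] to a coarse 1-5 suspicion category."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    thresholds = [0.1, 0.3, 0.6, 0.8]  # hypothetical cut-offs
    # Each threshold the probability clears raises the category by one.
    return 1 + sum(p >= t for t in thresholds)

print(probability_to_pirads_like(0.6))  # the 60 percent case from the text
```

A calibrated mapping of this kind is one way a percentage score can be presented to a radiologist in a familiar scale, though the real correspondence to PI-RADS would need clinical validation.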
And what if the probability of finding malignancy is sixty percent? What is a radiologist able to do with that?
Exactly. In order to understand it, we need to take a step back and define what a diagnostician actually does. Let’s imagine that we’re cleaning up our inbox and evaluating e-mails, deciding on the spot, relying on our intuition, whether each one is spam. Now imagine a classification system that informs us there’s a sixty percent probability that a particular e-mail is spam. This is an extremely simplified picture of a diagnostician’s job.
The doctor gets a result from a machine: sixty percent. But why sixty, exactly? They don’t know. How are they supposed to use this information? A sixty percent probability isn’t enough: it doesn’t tell us anything, because it’s close to a random decision. But even if it were, let’s say, eighty percent, the doctor still has a problem, because there isn’t enough information as to why the artificial intelligence made that decision.
Currently, this is the biggest challenge for scientists working on artificial intelligence: how to create explainable models rather than black boxes – models capable of explaining what the system’s prediction is based on.
At our laboratory, we are working on an artificial intelligence-supported system (eRADS) which will be able to do that. We have specified over 100 parameters, which have been partly converted into decision tables. In the final version of the tool, it’s going to look like this: first, the system performs an automatic analysis of the medical images. Then, the doctor receives a generated assessment report in the PI-RADS standard. Their job is to verify this structured medical report. Currently, such a description takes the form of a free-text report, which prolongs diagnostics and narrows the scope of the diagnosis.
If, at a certain stage, a physician has doubts about the automatically generated report, they can check where a particular result has come from – see which intermediate variables have been calculated and modify them as they see fit. They can find out, for example, that at one point the algorithm determined that a lesion is moderately hypointense, while they consider it significantly hypointense. They will have over 100 calculated parameters at their disposal, with full control over each of them.
Entering the black box…
Yes. If the models aren’t explainable, then specialists must fully trust the technology, even though they’re the ones leaving their signatures below the diagnosis. Hybrid methods are being introduced, of course. For example, algorithms can highlight detected lesions so that, depending on the colouration, they are easier to evaluate; a colour can then indicate the probability that a particular lesion is malignant. This really helps a specialist to determine what influences the model’s prediction.
Where do you get the data for algorithm training?
Exactly – that’s the problem. We need a lot of data, and currently the biggest database of prostate imaging totals only 350 cases.
The eRADS system: selecting the location of a lesion within the tool. The location determines how detected abnormalities are evaluated. Source: eRADS system, Applied Artificial Intelligence Laboratory.
In order to train the models, we had to artificially augment the data set by introducing certain distortions and image noise. This is a common practice when there’s not enough data for AI training. Our networks consist of several million parameters, and we have only 350 cases at our disposal – a network like that is capable of memorising every pixel. The augmentation forces the network to ignore details which identify a particular case but aren’t significant for the problem being modelled.
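The kind of augmentation described here – random noise and small distortions applied to each image – can be sketched in a few lines. This is a minimal, generic illustration using NumPy on a stand-in array, not the laboratory’s actual pipeline; the specific perturbations and magnitudes are assumptions.

```python
# Minimal sketch of augmenting a scarce image data set: add noise and small
# geometric perturbations so a large network cannot simply memorise each case.
import numpy as np

rng = np.random.default_rng(42)

def augment(image: np.ndarray) -> np.ndarray:
    """Return a randomly perturbed copy of a 2-D image (stand-in for an MRI slice)."""
    out = image.astype(np.float64)
    out = out + rng.normal(0.0, 0.05, size=out.shape)  # additive Gaussian noise
    if rng.random() < 0.5:                             # random horizontal flip
        out = np.fliplr(out)
    shift = int(rng.integers(-2, 3))                   # small horizontal translation
    out = np.roll(out, shift, axis=1)
    return out

slice_ = rng.random((64, 64))                    # stand-in for one image
augmented = [augment(slice_) for _ in range(4)]  # four perturbed variants of one case
```

Each original case thus yields many slightly different training samples, which is what lets a multimillion-parameter network learn from only 350 real cases without overfitting to pixel-level detail.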
Artificial intelligence models are only as good as the data they have learned from – both its quantity, meaning how many cases are in a given set, and its quality.
Is there no way to collaborate with hospitals?
We collaborate with the department of radiology at the Centre of Postgraduate Medical Education in Warsaw. The problem is that research centres usually only have the imaging data and its reports. A patient comes, gets an MRI, gets the results, and goes out into the world. We don’t get feedback as to what happens to them afterwards. Patients come and go, and it is difficult to integrate their images and reports with their subsequent health status (e.g. biopsy results). Moreover, even though the international radiological diagnosis standard, PI-RADS (Prostate Imaging Reporting and Data System), has been around for a couple of years, every physician essentially has their own way of reporting. Reports written in this form are difficult to convert into a data set suitable for artificial intelligence.
This gives a sizeable margin of subjectivity…
Yes. A diagnostician determines the characteristics of a detected finding. Is the lesion homogeneous or heterogeneous? Every doctor has their own sense of this homogeneity; one considers it homogeneous, another doesn’t. Or, how do we subjectively determine the brightness? Is the lesion mildly, moderately or markedly hypointense? Moderately dark or deeply dark? The diagnosis depends on this. Of course, sometimes the case is obvious and there’s nothing to think about. But everything in between is up for discussion.
The eRADS system: describing each trait of a lesion allows its probability of clinical significance to be determined automatically. Source: eRADS system, Applied Artificial Intelligence Laboratory.
We propose our own solution to this problem: we have prepared a special tool for radiologists. Instead of writing a report by hand, the doctor defines the lesion’s traits on a form. The high-quality data sets obtained during that process will be used for research on malignant lesion assessment standards. Thanks to this approach, we get a beautiful database for artificial intelligence algorithms, and a structured report is automatically generated on the basis of the filled-out form.
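A form-to-report flow of this kind can be sketched as follows. The field names and report wording below are hypothetical illustrations – the interview does not disclose the actual eRADS parameters or templates.

```python
# Hypothetical sketch: structured form entries for one lesion become both a
# machine-readable record (good training data) and an auto-generated report.
lesion = {
    "location": "peripheral zone, left",        # invented example values
    "t2_signal": "moderately hypointense",
    "shape": "homogeneous",
    "diameter_mm": 11,
}

def render_report(lesion: dict) -> str:
    """Assemble a short structured report sentence from form fields."""
    return (
        f"Lesion in the {lesion['location']}: "
        f"{lesion['shape']}, {lesion['t2_signal']} on T2, "
        f"largest diameter {lesion['diameter_mm']} mm."
    )

print(render_report(lesion))
```

The point of the design is that the same dictionary serves two purposes: it feeds the training database directly, with no free-text parsing, while the physician still receives a readable report.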
We’ve just started this work. We have gathered thirty cases to be assessed by six radiologists: three experienced and three inexperienced. A retest will be conducted after a month. This is also interesting from a psychological point of view: we will conduct the study in collaboration with cognitive psychologists at the SWPS University of Social Sciences and Humanities. Will significant differences arise between radiologists? Will a physician assess a case the same way after a month as they did the first time? I’m really curious what the result is going to be. My background covers psychology as well as computer science. While I was studying, I worked on a tool for measuring susceptibility to optical illusions (VIS – Visual Illusion Simulation), because we’re all born with a certain degree of susceptibility. And now the question is: how susceptible to illusion are radiologists?
How do the brave ones feel, the ones who volunteered to take part in the project? Are they afraid?
Possibly. People don’t like to discover that they could be wrong. But they realise the system will serve to support radiologists.
But it could also be a threat. Won’t this method soon force them out of work? Or, at least, won’t they lose their autonomy and professional intuition?
I think it’ll be dramatically different. There’s a shortage of radiologists. The MRI machines can work 24/7, but who’s going to analyse the scans? There are too few physicians. The system will make their job much easier. They will be able to analyse more scans at the same time, while their job comfort and the quality of their assessment and reports will increase.
Is there a visible generational difference in the approach to these methods?
Sure. Studies show that experienced specialists often think: “I know what I’m doing. I’ve been doing this for so long. I know my effectiveness. Why would I need some tool telling me what to do?” The younger generation is the opposite: they’re open to introducing solutions like this.
Unfortunately, thirty cases is rather few. If we had 500, that would be something. But a data set like that costs a lot. That’s why we’ve applied to the National Science Centre for a grant to collaborate with Oslo University Hospital. As part of the collaboration, data sets from Polish and Norwegian research centres, annotated with the specially designed tool, would be integrated. This would be the biggest and most precisely described openly available mpMRI prostate data set in the world.
I’m most pleased about the fact that the eRADS methodology we devised for prostate cancer MRI diagnosis will be a fantastic foundation to work on other neoplastic diseases.