IGF 2021 – Day 0 – Event #120 The importance and challenges of medical data privacy protection in the era of fourth industrial revolution

https://www.intgovforum.org/en/content/igf-2021-day-0-event-120-the-importance-…

The following are the outputs of the captioning taken during an IGF virtual intervention. Although it is largely accurate, in some cases it may be incomplete or inaccurate due to inaudible passages or transcription errors. It is posted as an aid, but should not be treated as an authoritative record.

***

>> ILONA URBANIAK: So maybe let's begin our Panel Discussion here. My name is Ilona Urbaniak. I'm the head of the Artificial Intelligence Department at NASK, the Research Institute in Warsaw, Poland, and I'm also an Assistant Professor at the Faculty of Computer Science and Telecommunications at Cracow University of Technology. Before I introduce my honored speakers, I'd like to emphasize the importance of the topic of our panel session, and the topic is: The importance and challenges of medical data privacy protection in the era of the fourth industrial revolution. The fourth industrial revolution represents a fundamental change in the way we live, work, and relate to one another. It is a new chapter in human development enabled by extraordinary technological advances comparable to those of the first, second, and third industrial revolutions.

The fourth industrial revolution is about more than just technology-driven change. It is an opportunity to help everyone harness converging technologies in order to create an inclusive, human-centered, human‑oriented future. Unlike other industrial revolutions, this one deals more with information: device-to-device and machine-to-machine communications often generate, preserve, and share private information. In order for the fourth industrial revolution to be a success in health care, critical data and systems must be adequately secured and protected. Artificial intelligence is a key driver of the fourth industrial revolution. It has the potential to revolutionize health care and help address many challenges.

It can lead to better health care outcomes and improve the productivity and efficiency of care delivery. At the same time, questions have been raised about the impact AI could have on patients, practitioners and health systems, about its potential risks, and about the ethics of how AI systems, and the data needed to make AI possible, should be used. The challenges in making medical data public have of course become increasingly important if data-driven biomedical research is to advance. The main issues that need to be addressed to ensure the successful use of AI in health care are data access, privacy of data, bias, and integration; these are the main factors that have affected the use of AI in health care.

And health care data, medical data, is highly sensitive, and it is subject to regulations such as the General Data Protection Regulation, GDPR, which aims to ensure patients' privacy. The first important step to adhere to these regulations, and also to incorporate privacy concerns, is anonymization, which is a process of removing personal identifiers. It is a type of information sanitization whose intent is privacy protection. It is the process of removing personally identifiable information from datasets so that the people whom the data describe may remain anonymous. Once data is truly anonymized and individuals are no longer identifiable, the data will not fall within the scope of the GDPR, and of course it will become easier to use.
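As a rough illustration of the "removing personal identifiers" step described above, here is a minimal Python sketch. The field names, the list of direct identifiers, and the ten-year age bands are all hypothetical choices made for this example and are not taken from the discussion. Dropping direct identifiers alone does not guarantee that data is "truly anonymized" in the GDPR sense, since combinations of the remaining attributes can still identify a person.

```python
# Hypothetical list of direct identifiers; a real project would derive this
# from its own legal and clinical requirements.
DIRECT_IDENTIFIERS = {"name", "national_id", "address", "phone", "email"}


def strip_identifiers(record):
    """Return a copy of the record without the direct identifier fields."""
    return {key: value for key, value in record.items()
            if key not in DIRECT_IDENTIFIERS}


def generalize_age(age, width=10):
    """Replace an exact age with a coarse band such as '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymize(record):
    """Drop direct identifiers and coarsen the age (illustrative only)."""
    cleaned = strip_identifiers(record)
    if "age" in cleaned:
        cleaned["age_band"] = generalize_age(cleaned.pop("age"))
    return cleaned


patient = {"name": "Jane Doe", "national_id": "X1", "age": 47,
           "diagnosis": "nodule"}
print(anonymize(patient))
```

The original record is left unchanged, so the identified and de-identified versions can be stored under different access rules.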

And anonymization that meets current standards can therefore be presented to legal authorities as evidence that responsibility toward patients has been taken seriously. Anonymization is extremely relevant in the case of data used for secondary purposes, and here secondary purposes are generally understood to be purposes that are not related to providing patient care. What I mean by secondary purposes is research, public health, certification or accreditation; marketing, for example, would also be considered a secondary purpose.

So we do take it for granted that sharing of health care data for the purposes of data analysis and research does have many benefits. However, the question is: How to do it in a way to protect individual privacy and at the same time ensure that the data is of sufficient quality that the analytics are useful and meaningful.

This Panel Discussion is concerned with proper data protection such as anonymization while respecting applicable legal and ethical guidelines, and now I would like to ask my honored speakers for their introductory words and I will start with Dr. David Koff. He's a radiologist, Professor from McMaster University, Ontario, Canada. David?

>> DAVID KOFF: Thank you very much, Ilona, for the introduction, and thank you for the invitation to participate in this panel. I think you are addressing a very important topic and I'm very happy and proud to be part of that. To give a bit more background: as Ilona said, I'm a radiologist by training. I am the past Chief and Chair of the Department of Radiology at McMaster University, a large Department of now almost 80 radiologists spread over 6 hospitals, but I had, and I still have more than ever, a passion for a topic which I think is very important, which is the communication of medical images and the integration of images in our workflow.

So I'm the Founder and Director of MIIRCAM, the Medical Imaging Informatics Research Centre, at McMaster, and I spent more than 25 years working on different topics like validation of technology, research around radiation risk, new ways to move and compress images and that's how we met I think 10 years ago when Ilona was working on image compression.

And I also worked on the new standards, being the Co‑Chair of IHE Radiology International. I created IHE Canada in 2003. I also created something called Canada Safe Imaging, where we worked on radiation safety, and a project around artificial intelligence used to address the risk of low‑dose radiation. Through this project we had to access millions of records, and so deal with the issue of safety and anonymization, so that's another topic.

And so basically, I'm still very involved with other projects where anonymization and de‑identification are key to our success. So that's it in a nutshell for me.

>> ILONA URBANIAK: Thank you very much, David. Another speaker at our roundtable today is Dr. Ewa Kawiak‑Jawor from Poland. She's a sociologist from the Center for Technology Assessment of the Lukasiewicz Research Network, at the Institute of Organization and Management in Industry.

>> EWA KAWIAK-JAWOR: First, let me thank you for inviting me to take part in such a great discussion. It's an honor to take part in this event with such wonderful guests.

As Ilona said, my name is Ewa Kawiak‑Jawor and I represent the Center for Technology Assessment, which is part of the Lukasiewicz Research Network. I'm a sociologist, but I also have a Ph.D. in health science, and I conduct research in the field of technology adaptation and technology implementation in the areas of health and medicine. I'm particularly interested in the topic of using medical data in health policy planning, and also in m‑health and e‑health, in terms of collecting data and using it for further research or policy planning. I'm also a specialist in sociological and statistical analysis in the Nutrition Department of the Institute of Mother and Child, so I also have this statistical background of using data to analyze research questions, especially in planning nutrition patterns.

I also take part in research on the evaluation of the COVID‑19 vaccination program in Poland, so we also analyze data connected with that aspect. So there are many, many aspects connected with health, public health and medicine. Thank you.

>> ILONA URBANIAK: Thank you very much, Ewa, that's very interesting. So my third honored speaker is Professor Davide La Torre. He's a full Professor for artificial intelligence and applied mathematics, the Director of the SKEMA Artificial Intelligence Institute from France. Davide?

>> DAVIDE LA TORRE: Hello, good evening. Thanks, Ilona, for inviting me and for the introduction, as well. So let me briefly introduce what we do, what I do, in terms of research. My name is Davide La Torre. I'm the Director of the Artificial Intelligence Institute at the SKEMA Business School, which is an interdisciplinary artificial intelligence institute that was created almost two years ago.

The idea was essentially to respond to two different questions, two different issues. One was the fact that artificial intelligence is becoming more and more important not only for the classical fundamental areas but also for other disciplines, so in France a lot of business schools are now launching institutes related to applications of artificial intelligence and machine learning to finance, and SKEMA is doing so as well.

And of course, the other reason for which this center was created is to move along with the fact that this Region, Nice, where I am right now, is one of the main places in France for artificial intelligence, and it was funded by the French Government 5 years ago to create this hub and of course to sustain all kinds of AI‑related activities.

So SKEMA is also contributing. It's a Business School with several campuses worldwide. We have one campus here in Sophia Antipolis, but it is a cross‑campus initiative so we have colleagues in Paris, in Lille and the other three main campuses in France, and we have colleagues in Brazil where we have one campus, in Raleigh, North Carolina, and in Montreal in Canada, and we also have two campuses in China where we have other colleagues related to AI and data science.

Now, my interest in artificial intelligence actually started many years ago. At that time we didn't call it artificial intelligence. It was probably separated into fundamental areas. Like David and Ilona, I have connections with Canada; mine go back to the University of Waterloo, where I spent a long time in the past, and a lot of my papers are from the University of Waterloo. I'm a mathematician by training, my Ph.D. goes back to 2001, and for the past 20 years I've been working on image analysis and applications of images in different areas.

Of course, the initial interest in images was essentially motivated by the fact that the group at the Department of Applied Mathematics included several people doing fractal image analysis. You know that fractals have been used for describing images because of the beautiful mathematical properties you have with these kinds of objects, with the possibility of compressing images at extremely high compression rates, and the possibility of extracting images. It's been used with applications to medicine for a long time, and over the years we've been working on fractals and related areas, and these have been developed over the years, evolving toward different techniques.

The other thing that motivated my interest in AI is the optimization point of view, because when you think about machine training, machine learning, at the end of the day we're solving an optimization problem. So for those in optimization or related areas who have been working, like myself, on solving problems with multiple things, this has been a nice area of applications. So over the years I developed this interest, and now, as Director of this center, I've also developed an interest in different applications of AI. Because we are in a Business School, we do applications of AI to finance, marketing, business, human resource management, the classical research areas of interest for a Business School. But I'm also doing other things that could be considered a bit borderline but could have a lot of nice applications in the years to come, like for instance face recognition and extracting emotions from images, and their application to marketing: being able to address a certain marketing campaign based on the fact that you attract certain emotions, you attract the customer based on the emotions. Even in finance, being able to define the risk aversion of a potential investor based on his or her reaction to a certain investment: rather than calculating or estimating the risk aversion as in classical financial modeling, this could be done by extracting features and emotions from images.
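The remark above that training a machine learning model amounts to solving an optimization problem can be made concrete with a very small example. The sketch below, in Python, fits a single slope parameter to a handful of made-up points by gradient descent on the mean squared error. The data, step size and iteration count are assumptions chosen for illustration and do not come from the speaker's work.

```python
def fit_slope(xs, ys, step=0.01, iters=500):
    """Fit y ~ w * x by gradient descent on the mean squared error."""
    w = 0.0
    n = len(xs)
    for _ in range(iters):
        # Derivative of (1/n) * sum((w*x - y)^2) with respect to w.
        grad = (2.0 / n) * sum((w * x - y) * x for x, y in zip(xs, ys))
        w -= step * grad
    return w


# Made-up training data lying exactly on the line y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = fit_slope(xs, ys)
print(f"learned slope: {w:.4f}")
```

Training a neural network follows the same pattern, only with many more parameters and a loss function that is no longer convex.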

So I would say that we have a great group. We're doing a lot of things related to AI and again it's a real pleasure for me to be here with you today.

>> ILONA URBANIAK: Thank you very much, Davide. As a mathematician I can also appreciate the beauty of the mathematical depth and yes we do have some research connections as well with Davide and with David as well, so it's my pleasure to welcome you here and to have this discussion on this important topic.

Unfortunately, two of our other honored guests, speakers, are unable to join us for technical reasons, and, yes, so let's have this discussion on the topic of this panel.

So, yes, the fourth industrial revolution ‑‑ let me go back again to the main topic of our panel. The fourth industrial revolution is technology‑driven change, and AI of course is the key driver. The fourth industrial revolution is changing the way health care is understood, transforming the methods of treatment and diagnosis as well as the relationship between health professionals and patients, and altering the management and organization of health systems.

The first topic that we will discuss here is the need to access and expand sufficient high‑quality representative health care data. That need is increasingly growing and it poses a significant challenge in the process of development of AI tools and industry standards.

And here maybe I will start with Ewa to elaborate on that topic, the ethical aspects. Let's start there and balance between the data privacy and the development of data driven research, Ewa?

>> EWA KAWIAK-JAWOR: Yes, yes, all of us have different backgrounds, so I'm looking at the data and the use of data from the social perspective, and also the ethical one. As you said, there's such a big change. We must remember that the rights of the patient are the most important thing, so it is important to discuss ethical standards for data use more often.

So there's a need to strike a balance between individual privacy rights and the possibility of using data in research or in health policy planning. So we have to consider where is the line between the good of the individual and the good of society. We have to find that line and discuss how we can protect patient rights in that situation.

So before we start thinking about all the technical, mathematical aspects of data protection, of anonymization, we have to consider who will have the right to access this data, and where, and what the main purpose of using it will be.

>> ILONA URBANIAK: Thank you, Ewa. And maybe now let's move on to the quality of the data that is being, that needs to be transmitted over networks, that needs to be stored for diagnosis and research innovations. Maybe David you would like to elaborate on that topic?

David, your microphone is muted.

>> DAVID KOFF: Yes, I'm sorry, I was muted. So first of all, I'd like to go back to the fourth industrial revolution, if you don't mind, because I think we are living in a very exciting time, unfortunately in the context of this pandemic, the like of which hadn't happened in a very long time, but which has helped us to push the limits of our new ways of communicating. The first thing I'd like to say is: we have to look at this new development from two sides. One is the clinical side: what use do we make of the data, and how do we function around the patient? In Canada we've seen a lot of consultations being provided online, where there's no physical contact with the patient anymore, and for the past two years a lot of telehealth applications have seen a huge increase, to the point where it will probably become part of the mainstream and it will be difficult to go back. A lot of that has shown huge efficiency and saved time in terms of transfer, so it worked really well.

The other side is research and education, and what do we do with the data for research and education? So I think we need to draw the line between the two. It's not exactly the same thing. The clinical side has been developed over the past 20 years and has really shown its huge value over the past two years, but it has been a lot of work to get to that point.

On the research side, we've seen artificial intelligence growing to a point where there are so many applications now. Last time I went to the Radiological Society of North America, the largest meeting of radiology in the world, there was a floor dedicated to artificial intelligence with more than 100 companies developing solutions just for radiology, and they had raised, I think at that time, it was two years ago, more than $1.3 billion just to develop AI applications.

And the big issue we have when we look at that is that to develop these solutions you need to have access to data, because you have to use real data to be able to create and validate them, so I think it's critical to make sure that the data which are used for this development are of good quality. When we talk about radiology, you want to have images which are native DICOM. You want to try to avoid any kind of processing. The study we did on image compression was mostly to assess the subjective capacity of a radiologist to use compressed images to get to a good diagnosis, but does it work for artificial intelligence and mathematical developments? I'm not sure. I think you really want to have the true value of your image. You don't want too much degradation. Another thing I heard a lot when working with our engineering colleagues is the assumption that the truth is in the radiology report, which it may be, but it's not absolute, and you need to have as much data as possible to validate your diagnosis if you want to have high‑quality development. So you need to have access to the surgical report, pathology report, lab results, the patient notes, progress notes, a lot of information which should be aggregated to get good quality.

So you need highly curated, labeled data, because you want to know where the lesion is that you'll be trying to find. And even if some artificial intelligence models self‑educate and can find this lesion by themselves, you still need to have very strong, high‑quality data. So maybe I'll tell you about the project I'm working on, but this is really what we are trying to achieve here.

>> ILONA URBANIAK: Thank you very much, David. Yes, so before we talk about anonymization, of course, access to data is critical. This is exactly where sometimes it is difficult to move from research to the real world, and this is the main problem, and of course we'll talk about the ethical issues with the privacy protection of data. These are the main problems which do not allow public access to all the data that is needed to train artificial intelligence models and systems, because, as we know, without data, without historical data, we cannot have good predictive modeling. We cannot really train any artificial intelligence models, so, yes, this is absolutely very important.

And of course, there is the quality of images, and another thing that I would like to mention here is: how do we measure the quality of medical images? Of course, this is another topic of research, and it also requires special consideration for this type of images because they are very special. They are much different from regular natural scene images. They have a different type of contrast, and they do require special treatment.

And maybe I will ask Davide to comment on this, maybe the relationship between medical doctors and patients to continue on our topic.

>> DAVIDE LA TORRE: Thanks, Ilona. Before going into this, I would like to briefly go back to what David was saying and, in general, the topic that we are discussing now, the quality of images and degradation of images. One of the topics we have been working on for a long time is noise, or de‑noising of images. This is such an important thing because, as you said, it's extremely important to get high‑quality, good‑quality images to be able to train the algorithm, but the images are typically collected from devices that are subject to noise, or subject to errors, so image de‑noising is an important step before arriving at a situation where you have something that can be used as a sample for training a machine. We've been working on image de‑noising for a long time.

As you said, Ilona, medical images, in particular MRI images and images coming from these devices, are very peculiar, very particular. From a mathematical perspective they have also been interesting because, compared with classical images, they don't have the classical characteristic of a color associated with each pixel. In the case of MRI, because of the particular structure, where you get all these different slices, mathematically it's much better to use tools that take into consideration the movement of water molecules, which is typical of what we do with this type of device: detecting the probability that the water will move in a certain direction. This type of framework is actually different from classical computer vision, or the mathematical tools used for computer vision, so we've been doing a lot of work in this area, because we wanted to find the right framework to describe this kind of object in such a way that it was then possible to implement a de‑noising algorithm. Total variation minimization is one of the classical ones, in its different variants, but using this algorithm with medical images requires some work, because these images have to be described in terms of complex mathematical structures, so you need to adapt de‑noising and minimization to these particular objects. That is ongoing research. As a group, we published a paper in December 2020 on de‑noising of MRI images using a particular mathematical framework that we call measure‑valued images, and we showed the goodness of our approach and of the de‑noising approach using a particular dataset from the Stanford Repository. It's still a really interesting research topic, where there is a lot of work to be done before arriving at the point where, as I said before, you get your sample of images that can be used for training.
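To give a feel for the classical total variation minimization mentioned above, here is a minimal Python sketch on a one-dimensional signal. It is not the measure‑valued framework described by the speaker, and it is not the method of the paper he refers to. It is a textbook-style version that minimizes a data-fidelity term plus a smoothed total variation penalty by plain gradient descent. The parameter values `lam`, `step`, `iters` and `eps` and the test signal are assumptions chosen for this illustration.

```python
import math


def total_variation(x):
    """Sum of absolute differences between neighbouring samples."""
    return sum(abs(x[i + 1] - x[i]) for i in range(len(x) - 1))


def tv_denoise_1d(signal, lam=0.5, step=0.02, iters=2000, eps=0.01):
    """Minimize 0.5*sum((u-f)^2) + lam*sum(sqrt((u[i+1]-u[i])^2 + eps))."""
    u = list(signal)
    n = len(u)
    for _ in range(iters):
        # Smoothed sign of each forward difference u[i+1] - u[i].
        w = [(u[i + 1] - u[i]) / math.sqrt((u[i + 1] - u[i]) ** 2 + eps)
             for i in range(n - 1)]
        grad = []
        for i in range(n):
            g = u[i] - signal[i]      # data-fidelity term
            if i > 0:
                g += lam * w[i - 1]   # penalty on the difference to the left
            if i < n - 1:
                g -= lam * w[i]       # penalty on the difference to the right
            grad.append(g)
        u = [u[i] - step * grad[i] for i in range(n)]
    return u


# A noisy step: small oscillations around 0, then around 1.
noisy = [0.1, -0.1, 0.1, -0.1, 0.1, 1.1, 0.9, 1.1, 0.9, 1.1]
clean = tv_denoise_1d(noisy)
print("total variation before:", round(total_variation(noisy), 3))
print("total variation after: ", round(total_variation(clean), 3))
```

The penalty flattens the small oscillations while keeping the large jump, which is the edge-preserving behaviour that makes this family of methods attractive for images.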

So this is from one side. Going back to what you were saying before, the quality of images is extremely important, in particular any time we have to deal with medicine. Of course in other areas the quality of the data that we get is important too, but in certain areas like medicine it's probably more important than in others. In particular, I've been working recently with psychologist colleagues on the role of AI in the relationship between a medical doctor and the patient, and of course this is extremely important, because nowadays a lot of solutions are coming from medical devices, from machines that have been trained on data, and of course the quality of the data will then be reflected in the kind of decision that the machine is going to take.

So any time that then it comes to the moment where you have to say, okay, well the medical doctor has this solution available. Is it going to move along this decision, adopt this decision, change this decision? So really, the trust that you can have into the data and the algorithm is essentially affecting the relationship between patient and doctor.

And actually, there are people that are talking about the so‑called third element, a third actor in this relationship, because AI is playing exactly this role, right? It's entering into the relationship between medical doctors and patients, and it's playing a role that still has to be defined. I gave a seminar, I think it was two weeks ago, to students in the Master's for artificial intelligence for cancer medicine, and of course I tried to stimulate the discussion by asking what happens if a medical doctor one day is replaced by a machine. People were getting excited, because we don't want a machine to take our job. But in the end, it's a topic that is attracting a lot of attention, not only because there are more and more tools available, from the algorithm point of view, the machine point of view, the data quality point of view, but also because this is actually affecting human beings, and the relationship between humans, and between humans and machines.

So, as I said before, AI now is no longer simply the merging of probably the three fundamental disciplines, statistics, mathematics and computer science, but is actually becoming something more complicated, and the moment it starts to intersect with our everyday life, it becomes more and more important to ask what the relationship of these tools is with respect to humans.

And actually, at the end of the day, the discussion, even if it was a technical lesson, became about how much we have to trust, about the importance of the human touch in the relationship between medical doctor and patient, about whether, from a psychological point of view, it's easy for a patient to learn that they have to go through a cancer treatment, and whether it is the same thing if this information is obtained from a medical doctor, from a person, or from a machine.

And so all these kinds of things that are, we're humans at the end so it's really important to be able to come up with a solution that actually tries to get the best from both perspectives. Of course, we have these tools available. We want to use them. Actually medicine is using them more and more but actually there is a point where probably, you know, the replacement of humans becomes impossible. Well, thanks a lot.

>> ILONA URBANIAK: Thank you. I would like to also add that in general, when we train machine learning models, the data is very specific to the area that is being studied, and in the case of medical imaging, we also need medical doctors to provide the depth and understanding of the features that are important diagnostic features that are important in order to study, to actually make any kind of predictions before we even start training the models.

So, yes, I totally agree with you, Davide, so it's not just statistics. It's not just mathematics and computer science but also in general artificial intelligence needs many Specialists from different areas, depending on the data that is being studied, and it needs to be studied in‑depth.

So now I would like to move on to the second topic of our discussion, which is data bias, integration, and the use of AI in supporting clinical decision‑making. Before we talk about bias, I would like to give a general definition of bias, what it really means. It is a deviation from the truth, and it may happen when we have sample data that do not represent the population that is being studied, or for example when the reported data do not represent the real world.
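The definition of bias given above, a deviation from the truth caused by a sample that does not represent the population, can be shown with a small numerical sketch in Python. The two groups, their sizes and their disease rates are invented for this illustration only.

```python
def prevalence(group_sizes, group_rates):
    """Overall prevalence as the size-weighted average of the group rates."""
    total = sum(group_sizes.values())
    return sum(group_sizes[g] * group_rates[g] for g in group_sizes) / total


# Hypothetical disease rates and group sizes, for illustration only.
rates = {"urban": 0.10, "rural": 0.30}
population = {"urban": 600_000, "rural": 400_000}
sample = {"urban": 900, "rural": 100}  # rural patients are under-sampled

true_value = prevalence(population, rates)
estimate = prevalence(sample, rates)
bias = estimate - true_value

print(f"true prevalence:   {true_value:.2f}")
print(f"sample estimate:   {estimate:.2f}")
print(f"bias (est - true): {bias:+.2f}")
```

Even though the rate within each group is measured correctly, the estimate is off because the sample mixes the groups in different proportions than the population does.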

And if we're talking about an AI system, the AI system, AI model, will not be able to discriminate based on attributes and will not take into consideration for example minority populations, minority diseases and minority manifestation of diseases.

So with regard to bias, of course, even when we just talk about classical statistics, we need our statistical sample to represent the population that we are studying. The factors that could affect such representativeness could be socioeconomic factors that affect the quality of data, or regional, genetic, or lifestyle factors specific to a given population, and of course we face these challenges when we try to obtain representative datasets. So maybe I will start with David on data bias, please, from the radiologist's perspective.

>> DAVID KOFF: Right, so data bias is definitely about how many attributes we remove when we anonymize, and how much we try to neutralize or make a standardized population. As you said, that doesn't work for AI development sometimes. You make the assumption that you take a population of 100,000, 300,000, and you average, and you end up thinking everybody is kind of the same, but it's not true. We find that minorities are not properly represented in large data samples, and minorities doesn't mean only racial or gender minorities; it's also disease minorities, and it's definitely more complicated to develop algorithms for rare diseases than for standard things.
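The point above about averaging over a large population can be illustrated with a short Python sketch. The counts below are hypothetical evaluation results for an imaginary diagnostic model. They are not taken from any study mentioned in the discussion.

```python
# Hypothetical evaluation counts: (correct predictions, total cases).
results = {
    "common presentation": (9500, 10000),
    "rare disease": (60, 200),
}


def pooled_accuracy(results):
    """Accuracy over all cases, ignoring which subgroup they belong to."""
    correct = sum(c for c, _ in results.values())
    total = sum(t for _, t in results.values())
    return correct / total


def per_group_accuracy(results):
    """Accuracy computed separately for each subgroup."""
    return {name: c / t for name, (c, t) in results.items()}


print(f"pooled: {pooled_accuracy(results):.3f}")
for name, acc in per_group_accuracy(results).items():
    print(f"{name}: {acc:.3f}")
```

With these made-up numbers the pooled figure looks reassuring, while the model is wrong on most of the rare-disease cases, which is why reporting results per subgroup matters.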

So that brings me really to the next topic for research: how much data do we need to make research relevant? And how do we deal with this? Should we have multiple small datasets of minority populations, or what is considered minority populations, that we can include in the larger research? That means replicating the same developments in different environments. So if we know that we want to develop a lung nodule algorithm, for instance, yes, we will do it on the general population, but there are some people who will be more prone to develop a cancer than other people, and should we then create smaller clusters? So that's a pretty complicated issue, and we are finding that the bias element is much more complicated than we thought at the beginning, and it definitely has to be taken into consideration.

So how do we do that? It's still a very open discussion. We had a number of interventions. I don't have a solution for that, and I would love to participate in more research on how do we address the bias issues, but it's like facial recognition for instance doesn't work the same way for everybody. It works better on men than women, for instance, and it may lead to big disasters if you put people in jail because you make wrong recognition, so we have to be careful with that, too.

>> ILONA URBANIAK: Yes. Thank you very much, David. Yes, it's important to emphasize the risks that could come out of biased data, and absolutely, and especially used for medical purposes.

And maybe, Ewa, you would like to comment on that, as well?

>> EWA KAWIAK-JAWOR: Yes, thank you. It's very important when we think about using the data, for example, for research in public health and policy planning, but our discussion shows that we are at totally different points in time in the implementation of data use in Canada and in Poland. In Poland we have such basic background problems as integration of data and access to medical data, to use it for more statistical purposes, to create better medical research or create a good health policy.

So when we consider this aspect of bias, we have to remember some demographic aspects, but also the environmental and lifestyle aspects. But for us today in Poland, what matters more is actually access to good‑quality data, which can be analyzed statistically and which can be used for AI training, for example. If we're not using our own data, for our own social specificity, we cannot be sure that the AI algorithm will be appropriate for our process of training and for our process of treatment.

>> ILONA URBANIAK: Thank you very much, Ewa. And Davide, would you like to say a few words?

>> DAVIDE LA TORRE: Yeah. I would like to add a couple of sentences, actually, about what has been said so far about this topic. Data bias is extremely important, but I would like to recall the fact that with artificial intelligence we have one important issue that is related to explainability. This has become such an important topic in AI that a name has actually been found to describe this area: explainable AI. You have to think that, in general, from a mathematical perspective I'm not particularly obsessed by the fact that a certain algorithm is associated with a certain bias, because at the end of the day it depends on the sample size, so it's perfectly understandable.

But going back instead to what we were saying before: these algorithms are used in the relationship between medical doctor and patient, and the algorithm may be composed, as in most cases, of a complex neural network with a lot of layers and neurons, and the machine at the end says, okay, you have to have surgery. Especially when the architecture is so complex, this raises the question of how much we can actually explain to the patient why the machine is saying this. I think it is quite difficult to say that one million neurons were saying this, so you have to have surgery now. So this topic is becoming really, really important. People usually describe these AI algorithms as black boxes, where the architecture has become so complex, in terms of, as I said, the particular structure of each layer, that at the end of the day, when you train your network and you say, okay, the network provides this kind of forecast, it's difficult to say why it's doing that. So this is an extremely important topic that moves along with data bias.

Even if you have the most perfect data in the world, if the machine still presents these kinds of problems, then at the end of the day, from the point of view of the patient or of the person who needs to implement this decision, there are still a lot of problems to be solved, actually.

>> ILONA URBANIAK: Thank you very much. Yes, of course, absolutely. And now maybe Ewa a little bit more from the sociologist perspective about the patients, how the patients respond to AI technology and supporting clinical decision‑making. Give me a couple more words.

>> EWA KAWIAK-JAWOR: Yes, thank you. As Davide said before, the use of AI or any e‑health or m‑health solution has a strong influence on the doctor‑patient relationship. My research in this area shows that doctors are concerned about using such tools. They're afraid that the tool will undermine their own decisions. On the other hand, we have to remember that patients have to trust such technologies. How can they trust them if the doctors themselves are afraid to use them?

You mentioned also the holistic model of personal care, which is the most important trend in the reorganization of the health model. We have to remember that this model assumes greater participation of the patient himself in the whole treatment process, and the use of m‑health solutions is a good example of this process. In that case, data protection in this type of software matters not only for the development of the application but also for the end users, the patients. So it's important to provide proper education of patients: they have to know more about the safety of using this type of tool, such as m‑health and medical applications, so it is important to ensure the protection of that data.

>> ILONA URBANIAK: Thank you very much. We can all see how complex this topic is. And so let's move on to the next topic of our discussion, which is the significant loss of data utility due to excessive anonymization. Just to go back to the definition of anonymization, it refers to an irreversible transformation of data in order to prevent the identification of a particular individual.

Of course, irreversible we mean here ‑‑ by irreversible we mean that it must be impossible to reidentify the person in question directly or indirectly, and in the context of medical data, anonymized data refers to data from which the patient cannot be identified by the recipient of the information. The name, for example, and address and full postal code must be removed, of course, together with any other information which, in conjunction with other data held by or disclosed to the recipient, could identify the patient. As for alternatives to anonymization, we hear these terms, pseudonymization or deidentification, and encryption, and maybe, David, you would like to talk about these different types and how they differ.

>> DAVID KOFF: Yes, so thank you, Ilona. This is really the big topic and I know we don't have much time left to talk about that but I'll try to keep it brief.

This is the major issue in our hospitals, with the privacy office so scared that data can be leaked and that people may have a way to reidentify it that they try to secure the data as much as possible. That's not practical. I belong to the Canadian Association of Radiology Artificial Intelligence Work Group, and this Work Group created a subsection, the Ethical and Legal Working Group, which just issued a few statements. So to go back to anonymization, you gave the definition, but I'd like to give the definition of deidentification, which is the process of transforming direct and indirect identifiers and possibly implementing additional controls so that the likelihood of a data subject being correctly identified from the information is very small, not non‑existent, so it's very small under the circumstances of use or disclosure.

And HIPAA in the States listed the identifiers we need to remove to keep the risk very small. The Ethical and Legal Working Group has made the decision to forego the term anonymization, so we don't want to hear about anonymization anymore. This is something which doesn't work. Having zero risk is like saying the only safe highway is a highway without cars. That cannot work.

So you have to understand that there may be some risk. Our Working Group has recommended that the word deidentification be used in place of the word anonymization, and that's the first recommendation that we came up with, which is that any custodian of patient data, such as a hospital, research facility or health authority, needs to be comfortable with a small level of risk that an individual's information in the dataset can potentially be reidentified. The only way to achieve a zero risk of re‑identification is not to make any data available, and of course this would inhibit the advancement of research and our understanding of medicine, and would prevent AI from being developed for patients.
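
To make the idea of a small but non‑zero risk concrete, here is a minimal sketch under stated assumptions: direct identifiers are dropped, indirect identifiers are coarsened, and the worst‑case re‑identification risk is taken as one over the size of the smallest group of records that share the same indirect identifiers. The field names, the sample records, and the risk measure are illustrative choices, not the HIPAA identifier list and not the Working Group's method.

```python
from collections import Counter

# Hypothetical field names treated as direct identifiers in this sketch.
DIRECT_IDENTIFIERS = {"name", "address", "health_card_number"}

def deidentify(record):
    """Drop direct identifiers and coarsen the indirect ones."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["postal_code"] = record["postal_code"][:3]   # keep only the first 3 characters
    decade = (record["age"] // 10) * 10
    out["age"] = f"{decade}-{decade + 9}"            # replace exact age with a 10-year band
    return out

def max_reidentification_risk(records, quasi_identifiers):
    """Worst-case risk: 1 / size of the smallest group sharing all quasi-identifiers."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return 1 / min(groups.values())

# Made-up patients for illustration only.
patients = [
    {"name": "A", "address": "1 Main St", "health_card_number": "001",
     "postal_code": "L8S4K1", "age": 63, "finding": "nodule"},
    {"name": "B", "address": "2 Main St", "health_card_number": "002",
     "postal_code": "L8S1A1", "age": 67, "finding": "normal"},
    {"name": "C", "address": "3 King St", "health_card_number": "003",
     "postal_code": "M5V2T6", "age": 41, "finding": "normal"},
    {"name": "D", "address": "4 King St", "health_card_number": "004",
     "postal_code": "M5V3L9", "age": 48, "finding": "nodule"},
]

quasi = ["postal_code", "age"]
risk_raw = max_reidentification_risk(patients, quasi)
deidentified = [deidentify(p) for p in patients]
risk_deidentified = max_reidentification_risk(deidentified, quasi)
print(risk_raw, risk_deidentified)
```

In this toy data every raw record is unique on postal code and age, while after coarsening each record shares its values with one other record, so the worst‑case risk falls but does not reach zero.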

So this is our position at the moment, and we think that public datasets should be encouraged and released if the data can be deidentified to a low re‑identification risk. Public datasets promote openness, facilitate sharing, inspire national and international positiveness and give groups in engineering and computer science the opportunity to work with data that would otherwise only be available to medical professionals.

In the end, public dataset releases facilitate more work and create chances to assist more people around the world. That's exactly what we're doing, me at my level with my university, where we're developing a dataset that we want to make public, which is why we're going to great lengths to make sure that our privacy people are comfortable with the idea of deidentification and that we'll be able to publish the data on the cloud. Our group at the Association of Radiology has the same project to make it countrywide, so a countrywide project to have datasets made available for research and sandboxes to train AI algorithms for everybody. So we want openness, and we want people to understand that, sure, there may be a small risk, but we have to live with that. There is no way we can make any progress if you want to include everybody in the same way.

>> ILONA URBANIAK: Absolutely, but this brings me to our next topic, which is actually defining some standards for medical data sharing and collection. And David, what about how it is done in Canada, the implicit versus explicit consent of patients before the data is released? This is very important, to emphasize the contrast between Canada and Europe.

>> DAVID KOFF: So there are two aspects again, and I'm going back to what I said at the beginning: the clinical side and the non‑clinical side. So if you talk about ‑‑ then as long as the non‑anonymized data, so the full data, remain within the circle of care and are not accessed outside the circle of care, there is no need for consent. Consent is implied, and the only way you can change that is if the patient expresses clearly in writing that he wants to withdraw his consent and doesn't want his data to be transferred.

And as we do telehealth, teleradiology, a lot of remote consultation, and access data on our digital image repository to access the previous, there is no way we can ask for patient consent, so this is the implied consent for clinical data. For non‑clinical data, then the Council is ‑‑ CIHR is the main research organization in Canada which governs the ethical conduct of research, and they define the secondary use of data as the use in research of information originally collected for a purpose other than the current intended purpose, so patients didn't give their consent for that either. Individual consent is the gold standard, of course, if we can get it, with regard to the use of medical information, and it is required by law when medical data can be linked back to an identified individual.

However, another however, and that's what I like in Canada, we always have a however: it is also defined when medical data can be put to secondary use without individual consent, and this is said in the Policy Statement. They don't mention deidentification explicitly, but they give a list of recommendations for what researchers should do, and this Section says the Researcher will take appropriate measures to protect the privacy of individuals and to safeguard the identified information. This is the deidentification, and that's it, so no need for consent anymore, as long as we ‑‑

>> ILONA URBANIAK: Thank you. That is different. Thank you very much, David.

>> DAVID KOFF: ‑‑ from what I understand?

>> ILONA URBANIAK: Yeah, the GDPR is a little bit different. It's explicit, it requires explicit consent: the patient or user must press some kind of agree button to give consent to release the data. So we only have 4 minutes till the end of our panel, and I would like to touch on some more things that are very crucial. One is over‑anonymization, and how would that affect the training of AI models? And maybe two words from Davide on federated learning versus anonymization in the context of medical data. Davide?

>> DAVIDE LA TORRE: Yeah, thanks. Well, actually, the idea of federated learning comes from the fact that many times, when we want to train a certain machine, it might happen that the forecast this machine produces is totally biased by the sample that we have taken into consideration. So this is an idea that came under investigation recently to address two kinds of things. One is that we wanted to avoid the problem that if we have different datasets and we train a machine on each dataset, each will produce a different forecast, so how do we combine them, and how can we determine whether there is a kind of average solution among them?

And the other one is due to the fact that, because we have more and more data and because the training process becomes slower and slower once you have a lot of data and you have to minimize your loss function over this big dataset, one possibility is to split the training over different datasets and do this kind of operation in parallel, so localize on each server the training of a single loss function, then combine them again at the end. My group is doing some research on that, because we're investigating how it's possible to combine these different perspectives, these different loss functions that have been trained on different datasets, using techniques that are called multiobjective techniques. Essentially the idea is to come up with a model that is the result of a balanced combination of the different machines that have been trained on different datasets.
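
The "train locally, then combine" idea described above can be sketched as follows. This is a simplified illustration that uses a plain size‑weighted average of locally trained parameters (in the spirit of federated averaging), not the multiobjective techniques the speaker's group is investigating; the one‑parameter model, the two sites, and their data are assumptions made for the example.

```python
def local_fit(xs, ys, w=0.0, lr=0.01, steps=200):
    """Fit y ~ w * x on one site's data by gradient descent on mean squared error."""
    n = len(xs)
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w

def federated_average(site_weights, site_sizes):
    """Combine locally trained weights, weighting each site by its number of samples."""
    total = sum(site_sizes)
    return sum(w * n for w, n in zip(site_weights, site_sizes)) / total

# Two hypothetical sites whose data follow different trends.
site_a = ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])   # roughly y = 2x
site_b = ([1.0, 2.0], [3.0, 6.0])             # roughly y = 3x

# Each site trains on its own data; only the fitted weight is shared.
w_a = local_fit(*site_a)
w_b = local_fit(*site_b)

w_global = federated_average([w_a, w_b], [len(site_a[0]), len(site_b[0])])
print(w_a, w_b, w_global)
```

Only the fitted parameter leaves each site in this sketch; the raw records stay where they were collected, and the combined model sits between the two local solutions.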

Now, of course, this can also have an impact on anonymization. I wanted to add one thing about what David was saying before: I agree that we should say the probability of identification is always very small. In principle, from a mathematical perspective, if you pay the price of forever, you can always de‑anonymize a certain thing. At the end of the day it is the time that matters: not being able to identify the key that has been used to encrypt a certain image is just a matter of time, and this is used in other areas as well.

If this is taking forever, essentially you have protected your data, because no one will be able to decode or identify the information in a reasonable amount of time. Federated learning can move along with this because it can help to integrate the different recognition techniques, and so we'll see what is going to happen in the future about that, but federated learning is going to offer a nice perspective, as I said before, to reduce the amount of time that is required for training, and also to correct some bias we have during the training process.
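
The "price of forever" argument above can be illustrated with a back‑of‑the‑envelope calculation: how long it would take to try every key of a given length at a given guessing speed. The key sizes and guess rates below are assumptions chosen for illustration, and exhaustive search is only one of many possible attacks.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def years_to_exhaust(key_bits, guesses_per_second):
    """Years needed to try every possible key of the given length."""
    return 2 ** key_bits / guesses_per_second / SECONDS_PER_YEAR

# Hypothetical attacker speeds: one billion and one trillion guesses per second.
short_key_years = years_to_exhaust(40, 1e9)
long_key_years = years_to_exhaust(128, 1e12)

print(f"40-bit key:  {short_key_years:.2e} years")
print(f"128-bit key: {long_key_years:.2e} years")
```

Under these assumptions the short key falls in well under a year, while the long key would take vastly longer than any reasonable amount of time, which is the sense in which the data counts as protected.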

>> ILONA URBANIAK: Thank you very much. Yes, thank you to the speakers for joining me in this discussion, and, yes, our time is up, so thank you very much.

>> DAVIDE LA TORRE: Thanks a lot, Ilona again.

>> DAVID KOFF: Thank you, Ilona.

>> EWA KAWIAK-JAWOR: Thank you, and I have to mention that we don't have a question in the chat, but I encourage everyone to connect with us on our LinkedIn profiles if a question comes up later.

>> DAVIDE LA TORRE: Perfect, thank you. Thanks a lot.

>> ILONA URBANIAK: Thank you, bye‑bye.

>> DAVID KOFF: Bye‑bye.

[ End of session ]
