IGF 2019 – Day 3 – Raum IV – WS #30 Let there be data – Exploring data as a public good - RAW

The following are the outputs of the real-time captioning taken during the Fourteenth Annual Meeting of the Internet Governance Forum (IGF) in Berlin, Germany, from 25 to 29 November 2019. Although it is largely accurate, in some cases it may be incomplete or inaccurate due to inaudible passages or transcription errors. It is posted as an aid to understanding the proceedings at the event, but should not be treated as an authoritative record. 

***

 

>> MODERATOR:  Hello and welcome.  We are going to start with the workshop now.  My name is Lea Gimpel.  I am co-lead of the project Artificial Intelligence for All, implemented by the German Development Agency, and I am your moderator for this workshop session.  We will have a short introduction and then we will have insights from six speakers sharing in five minutes their take on digital commons and data for public good.  Then, to make this a real workshop, we will break out into Working Groups for 25 to 30 minutes, discussing some of the key aspects our speakers will have highlighted in their introductions.

And at the end, we will all come back here together in the bigger group and share the insights that you discussed in the smaller groups.  But let's start with a short introduction to the topic of the workshop.

Data is a fundament of today's digital society and of innovative technologies such as Artificial Intelligence and machine learning.  But data is also deeply problematic in more than one dimension.  First of all, there is a data gap.  So there is just not enough data available in order to develop, for instance, voice assistants in regional languages and local languages from the African continent.  There is just no Siri in Kenya, and especially the Global South and the African continent produce only a tiny, tiny fraction of the data currently available globally.

So we definitely need, in general, more data.  Then we have the whole problem of data access.  Most of the data that we have is currently locked behind doors or is very expensive to obtain, which, again, reflects the power that we are seeing in the digital economy, because most of the data, of course, is held by multinational corporations who use this data to develop their own products which, again, then produce data, so these products can be improved.

And the feedback loop actually leads to quasi-monopolies in this space.  And third, there is the topic of data bias and representation in data, because data, of course, reflects our world, how we see the world, and this world, of course, is deeply biased.  So who is actually represented in data?  How about gender issues?  How about marginalized groups?  All of these aspects are baked into the data we are working with.  So at the German Development Agency we do have an interest in empowering partners in developing local Artificial Intelligence solutions in order to solve local problems, because we believe that in this technology lies a huge potential, for instance, to give people access to data and information or to services by, for instance, providing access and voice recognition in languages such as the ones I have mentioned.  Because not everyone is speaking English.  In fact, only 20 percent of the global population does, and only 5 percent speak English as their mother tongue.

We have to tackle the problem and this is the access and availability of data.  So by opening up data and discussing models such as data commons and data for public good we hope to level the playing field for technology development.  We would also like to foster value creation by allowing local entrepreneurs to develop solutions for local markets.  And, of course, being able to express yourself in your own language also gives you the possibility to culturally express yourself and to preserve different currently marginalized cultures which actually don't have space in the Internet and digital society today.

And ultimately, with the whole process of making data available to the public, we would like to make technology more inclusive and democratic.  So this means we have an interest in different data governance models which allow for non‑excludable access to data as a fundamental infrastructure for today's society, and more specifically, we are interested in data as a public good, which means data that is from the Government, for instance, or data as a commons, data that is governed by a group of people who are producing, maintaining and using this kind of data.

I have to say that we do have a slight favor for commons because they truly empower people in the production and development of data, and in the use of this data actually.  And commons even have a history in the computerized and the Internet age.  Most notably, for instance, the free software movement, or the protocols the Internet is built on, or Wikipedia, which is one of the best known in the world.

At the same time we also see that there is a lot of talk around commons and data commons specifically.  The AI Commons initiative, which is a multi‑stakeholder initiative with the Governments of France and Canada, is aiming at building a Wikipedia for Artificial Intelligence.  And the global data commons initiative is under the same roof.  However, currently little is done so far in translating this into practice.  So how do we go about creating commons, how do we govern the commons, how do we go about building an ecosystem around these commons?  One crucial aspect here is the design of these governance models, because we know since the tragedy of the commons and the work of Elinor Ostrom that governance models and design principles for commons are essential in order to make them successful as a third model beyond the state and markets.

So in this workshop, we would like to discuss and reflect on different viewpoints and experiences in the creation of data commons and the institutions, governance models and legal frameworks needed to guide their development.  We will now give the word to our six panelists.  The first one is Renata Avila.  She is many different things and one of them is a member of the board of Creative Commons.  Please.

>> RENATA AVILA:  Thank you, Lea.  I am happy to be here in Berlin because Berlin was one of the first places where we started discussing Creative Commons a long time ago, and it's an institution that will be like next year 20 years old.  So we have been discussing and dealing with the challenges that digital poses to sharing data for a long, long time.  What changed, I think, in the framing and also in the culture in our spaces in civil society and Government, is that at the beginning we saw content, content creation.  It was a different framing.  We used to see it as cultural goods.

It was not exclusively a commodity.  And when digital and specifically data was framed into the commodity space, data is the new oil and all of that, I think that that fundamental change of mind was crucial to disrupt in a bad way the future we were trying to build about digital commons.  And I think that we need to have a profound cultural shift right now and stop considering data as a commodity, and the proposal here that I want to discuss with you is how can we frame data as a common product, and as a common infrastructure to build projects upon, and a common infrastructure either attached to a community, a city or even to a country or even better to a region.  That will really develop the potential it has.

When Lea was talking, I realized rapidly that the same problems of infrastructure that we have today with highways and with hospitals and with systems of sanitation, we will soon have with data if we do not consider data as this key infrastructure, this key part of our digital societies and the way that we build our digital societies.

And we have a lot of problems, because currently our privacy rules, our like trade rules and most of the rules that regulate digital are conceived with that frame of data as commodity.  So the first and the most challenging change that we need to adopt is precisely that of data commons.  But it's not only a change in data commons.  It is how can we have, which authority will have a transparent data handling and usage of the data it collects, and which rules will control that?  And who will decide the rules?

I think the best model would be to have data shared according to rules as set by the community, and enforced, and to have an enforceable governance because, of course, we run the risk of ‑‑ let me reframe it.  First, abstaining from taking a different frame will only increase the power of big tech and will leave us with an even deeper divide, a divide we haven't seen before, because it will be basically digital imperialism, the problem that we are facing.

Second, the place where I think that we can most rapidly adopt solutions in this is at the city, at the local level, and there are examples we can share later.  Third, my proposal of this framing is that we cannot adopt a new data frame without the benefits of data sharing reverted back to the communities.  I think that we have been exploited for so long, and this extractivism that we saw with resources and knowledge and many other things that should be commons, like our forests, environment, water, and so on, cannot continue with data.

It will block the possibility to fully exercise our human rights, and not only economic rights but also cultural rights.  If we tie our hands as citizens and we cannot participate in building the infrastructure of the future, building the societies of the future, we have a lot to lose.

>> MODERATOR:  Thank you so much.  One note: if you have questions, please keep them in your mind.  We will have the discussion rounds later on to discuss them further in smaller groups and at the end we will also have time to discuss.  Next one up is Alex Klepel.  He is the head of strategic partnerships on the Open Innovation team.  He is based in Berlin and he drives the collaboration at the intersection of technology, politics and media to help scale research and development efforts, and currently he has a strong focus on open voice data and technology.

>> ALEX KLEPEL:  Thank you very much, Lea.  I work for Mozilla, and most of you know us as a builder of browsers, but we have a broader topic spectrum of Artificial Intelligence and machine learning.  The angle we are taking, or at least the Foundation is taking, is thinking about frameworks of trustworthy AI, and complementary to that the Mozilla Corporation, the technology builders behind us, has two projects focused on voice technology.  Mozilla has a history in open technology, but with open data not so much.  And these two projects on voice technology are basically our kind of pilot step into that area.

And why are we focusing on voice technology and open voice technology?  It's not only a convenience factor, but a crucial access point to information and to services of the Internet.  And that should be accessible to everyone and not only owned by major corporations.  And we have been basically struggling with the same issues the whole ecosystem is struggling with.  Speech technology these days is gated through major monopolies that have heavily invested in the technology, and they keep it because it's precious, and a major advantage they have is they can collect speech data through their products.

And these data sets are siloed and basically serve only those companies who collect them.  If you want to innovate in the field, there are massive barriers.  One is, as I explained, that the technology is bundled in only a few companies, and also the data.  So what we are trying to do is a twofold approach.  One is we have developed an open source speech recognition engine which is publicly available.  The fully fledged version will come out next year.  You can always tinker and test it already.

And the other part is, and this is how we collaborate with the BMZ and the GIZ, Project Common Voice, which is a crowdsourcing initiative to open up speech data in as many languages as possible, with as many accents as possible, and generally creating a broad representation of the diversity of voices out there.

And right now we have collected about 2400 hours of voice data in 33 languages.  That sounds like a lot, and we are actually the largest publicly available data set, but if you think that for production-quality speech recognition you need about 10,000 hours per language, there is a lot to do.  And one of the interesting projects we are having is actually around the project with Digital Umuganda, and for me it's fascinating just to see how.  It's not fascinating but eye opening to see how difficult it is.  Even if you provide the right infrastructure, the platform to collect the data, you have to get it into the place where it can start, incentivizing people to donate their voices, finding the right text resources that are license‑free and can be used.  Otherwise you have license baggage, and the technology building on these data sets, or the technologists, will actually need to have legal departments to deal with that.  So that stifles innovation as well.

So I'm most interested in kind of the mechanisms of how to incentivize people to be open to open data, but also to support the entire value chain, because data is the foundation.  And this is already a massive threshold, but then how do you process it?  Who is able to train the algorithms, and I'm not even talking about the application side of things.  There is a huge gap between the lack of data and then actually creating services and products that are locally relevant and actually sustainable and not only dependent on one player or singular persons, but are being shared by a multitude of stakeholders that have common interest and want to build these infrastructures for the public good.

>> MODERATOR:  All right.  Thank you so much, Alex.  So that is really a fascinating project and let's hear how it's operating on the ground.  Next one up is Audace Niyonkuru.  He is the founder and CEO of Digital Umuganda, a tech startup that is collecting open voice data together with Mozilla.

>> AUDACE NIYONKURU:  Thank you, Lea.  I will start by giving some statistics about Internet connectivity and I will dive into why I'm starting with that.  46.4 percent of the world is not connected to the Internet at the moment, and 72 percent of the population in Africa is not connected to the Internet.  I think we can all agree it's not a problem of lack of infrastructure.  It's not just a problem of lack of infrastructure, it's also a problem of lack of content, especially in local languages.

And to solve that problem, I believe giving access to people in local languages is one key solution.  The problem becomes that that sort of data is held by big corporations that will not share it with others, and that innovators are not able to access that information.

And it's in those regards that we looked at the problem and had to create other commons such as open voice data to solve the problem, but most importantly, how do we enable local innovators to make use of the technology so they can actually be able to produce solutions on the ground.  We partnered with Mozilla, as Alex said, and GIZ, and we are currently building open voice data sets.  With open voice technology, you could think about the enormous applications, especially since African cultures tend to be oral cultures and oral tends to be the preferred way of interaction.  So you could think of, let's say, somebody in rural Rwanda trying to access the justice system because they have been facing injustice, and they cannot do that because lawyers are scarce and the free legal clinic is kilometers away.  But they could just call a number and get access to that information because it's in a database somewhere.  The barrier becomes that they don't have a smartphone to access that information, but with voice technology, you could think about many ways in which they could just dial a number and get that information and get access to justice through voice technology.  That's why I believe in the application of voice technology, especially in underserved communities such as the ones I come from.  Thank you.

>> MODERATOR:  Thank you, Audace.  Now, we have another speaker from the African continent.  Baratang Miya has 17 years of experience as a technology entrepreneur.  She is a founder and engineer of software academy for women and girls.

>> BARATANG MIYA:  Thank you, Lea.  I am going to echo what he said about Internet connectivity, and say that I think for us as Africans it's always important to realize that most Africans are not connected to the Internet.  And ITU has just reported that in the past two years, most of the people who are not using the Internet are now women.  In the past 13% now it's fallen to 11 percent of information are now lessly using the Internet.

And there is a lot of data bias.  And as women we do not have access to proper education.  The literacy level is very low so the people creating content for Africa are still males.  And we still are mainly the consumers of the Internet information.  We are recommending that public data commons must be made available to us, and especially people who are not using data to build AI only, but to understand and to reshape the future of the AI.

And that information should be made available and freely used, reused and redistributed by anyone with no existing local, national, international restrictions on access or usage.  If you want data from Government, you have to go through many, many channels to just access data.

We are still struggling to find, especially in South Africa to find out what is the number of women who are not being subsidized by Government or what's the number of women who are not subsidized in terms of when they want to start their businesses.  That's just a simple concept you could just go to the Internet and find information, but if you want proper reliable information, you have to go through loads of channels from Government and sometimes you don't even have it.  So that for us, it's still a major barrier.

So we think the key to collect high quality data and use it effectively is by having more data commons, and having capacity building for us to be able to use it, and one path is to set the standards that will format the data and enable high quality data, because at the moment we don't trust it to be easily shared and understood by a normal public person, and not to be taken for granted that today it's only, what's there is only acceptable according to standards of other people.

I don't think there are any proper African standards for Internet usage or data usage.  As much as we know that data can improve services, we are also aware that at the moment there is this overselling of data being the new oil, while we underestimate the impact it will have, especially with the algorithms built on the perspective of white males mainly at the moment, because they are the ones who work mainly in the engineering sector and they are the ones building the future of AI at the moment.  So we need more perspectives, and if we are going to say that data is the new oil, we need to realize what oil did to the world.

When oil has too much power, it creates too many dangers.  So we don't want AI to do the same to the world.

>> MODERATOR:  Thank you.  So there is already some controversial discussion here, if data is the new oil or shouldn't be a commodity at all.  Let's discuss this later on in the session.  Then we have as a next speaker Irmgarda Kasinskaite.  She is a program specialist of Knowledge Society Division in the communication and information sector at UNESCO in Paris.  Irmgarda, please.

>> IRMGARDA KASINSKAITE:  Thank you for this invitation.  I just would like to ask how many of you know what we speak about in 2019?  2019 is the Year of Indigenous Languages.  I don't know how many of you know.  There are some hands, very good.  This is where basically my intervention will be focused, exactly on under-resourced and under-represented languages, because we tend to speak a lot about technology solutions, data processing and Artificial Intelligence and many other aspects, but we very often forget that we speak about a small number of languages around the world.  We speak about 50, 100, maybe sometimes we can refer to 500 languages, but in reality, linguists count up to 7,000, and if we take into consideration dialects, we may count up to 10,000 languages and dialects around the world.  And it means when we speak about procedures we have to address data and open data related to linguistic and cultural diversity.

We can clearly see that many of those languages are spoken, especially those which are not constitutionally recognized.  They very often could be community languages only spoken by a very small community.  Or, as we have already heard from some speakers before me, we see the oral cultures, which may require as well specific arrangements in terms of what kind of solutions we provide.  It's clear that we don't have the tools which would be available for those which are well represented dominant languages around the world, where resources would be much more broadly available and, of course, opportunities provided.

So what it brings us to, one of the key questions, is what do we do with those languages which are not dominant, which are under-represented?  What different international models could be taken into consideration?  Because as we have heard, it was a huge investment to prepare a full data set, and we probably only focused on those which are economically, financially interesting for the companies and as well the organisations which are involved in this work.

UNESCO will be changing to the World Atlas of Languages, but we clearly can say from the data that we have today that around 40 percent of linguistic diversity around the world is in danger.  Some of those languages are vulnerable.  We are present online, we have all of the systems which could be used for providing access to information in those languages and resources, but we clearly see as well that the majority of those languages which are in danger are spoken by fewer than 10,000 speakers.  So it means economically it's really not interesting to document those languages unless we are really interested in the cultural linguistic diversity heritage, interested in traditional knowledge, passing his for classical to the next generation or in some cases for instance discovering new traditional practices, traditional practices which could be converted into profitable solutions in any industry.

So this means that basically our technological solutions very often have to be based exactly on audio and video files and on different solutions provided, and not necessarily on what we would imagine would be regional systems.  So another important issue that I want to bring to your attention: as we are involved in the year of indigenous languages, and there is only one month left, we could see clearly that around the world we had more than 900 international events taking place at different levels, whether it be institutional, higher education institutions, whether it would be governmental organisations, private organisations and many other ones, related to capacity building workshops and presentation of different solutions.

But one thing that comes out clearly: what public and open data means for us doesn't necessarily mean the same for communities.  In some communities we clearly see what it's affects as well.  With communities we see the understanding of the external world, and there is a need for clear and direct communication with communities, so that the purpose of collecting this data is fully understood by the community.  Because very often we could clearly see that sharing something means as well giving away.

And that means that we have to explain to communities, to the language speakers we work with, that something will be returned, and I would echo the few speakers before me who said we have to be generous as well to communities.  It's not only for industry, it's not only for public goods, but something which could concretely mean something for the communities themselves.  So, therefore, we have procedures related to privacy, to intellectual property issues, and they have to be, of course, clearly discussed with those communities.

What are the sustainable governance models for data commons?  I would say, of course, the multi‑stakeholder approach is one of the commonly used approaches, but I would as well come back to data: especially data which is provided by language communities should be owned by communities.  And they could decide whether it is open, free, or proprietary or other things.

We as well have to clearly define what we want of those structures, infrastructures, what is the purpose, what is the objective, what are the expected outcomes, because sometimes data collection for the sake of collecting data does not lead to much.  We have, of course, to convince people that this data has a value, not only just for the community, but as well could have a value for humanity.

So here it brings a question of how the data that we collected is accurate and scientifically valid.  Because that brings me to the next issue: if we do not take into consideration, let's say, scientific aspects while we collect data, it is not easy for policy and decision makers to integrate this data in decision‑making processes.  And, for instance, for an intergovernmental organisation like UNESCO, we frequently discuss with our department, with our Institute for Statistics, which is an official body to collect data, statistical information for UN agencies, and it has access to many national departments of statistics, where alternative ways of collecting data are not necessarily seen as scientifically valid.  So there is a need as well to have more dialogue with different departments of statistics, at different levels, so that it would be clearly understood what we mean by open data, where it is collected, whether it is scientifically valid and accurate, in order to avoid situations where data by default is rejected.

And that's a very important point, because if we want to integrate this, and then see that data commons would be an instrument for innovation policy, this aspect is important for the formulation of new policies and solutions.  So that would be it for the time being.  Thank you.

>> MODERATOR:  Thank you so much, raising a lot of interesting questions, for instance, how to define the purpose of the data use, and that certainly is something that needs to be done collectively when we are talking about commons.  So as a last speaker we have K.S. Park.  He is coming from Korea University, and he is also the Director of Open Net Korea, a digital rights organisation, and he has worked on key open data movements in the country such as court judgment databases, the right to be forgotten, and the use of pseudonymized data and other things.  K.S. Park.

>> K.S.PARK:  I want to talk about something that people in this room probably don't want to talk about, being too nice.  Basically data protection law.

Databases of court decisions are the treasure trove of information about a society and its norms and practices, but are most often suppressed into silos reserved only for judges in different countries.  Most often for the reasons of data protection for the people involved in the dispute, or for the reasons of personality rights of the people involved.  In Korea less than 1 percent of the Supreme Court decisions are published.  Less than 0.5 percent of lower court decisions are made available publicly, all for reasons of data protection.  So what arguments can we make to unleash the communal power of such databases to make the society more just in its distribution, make the economy more efficient in resource allocation?  I want to propose a certain idea, the idea of data socialism, which is not very far from what people have said so far.  Renata talked about data as an infrastructure.  If data is an infrastructure, then it should be socially owned and socially controlled.  Let me give you another example of why we need to take head on the challenges of making reconciliation with the data protection law.

We talked about how AI may not function fairly or ethically.  Now, Amazon shut down its hiring system because it did not fairly select female candidates.  How do we solve the problem?  Some of the facial recognition technologies are being shut down because they don't recognize African‑American faces correctly.  One of the reasons that the AI functions are limited may be because there is not enough data about women who have made successful careers in the Amazon hiring system, and maybe because as one of the panelists said, people making the system have not collected enough data about African‑American communities, but what are we saying here?  

Do we need to go out and collect more data from them?  That means less privacy for them.  That means less data protection for them.  In what ways can we come out of this self‑defeating dilemma?  So we come back to the idea of data socialism.  In a way, data, personal data is born social.  I'm a law Professor, people call me Professor Park, but I cannot be a Professor alone.  I am a Professor only to the extent that there are students willing to sit and listen to my lecture.  My identity, my identity or my job as a Professor is something I cannot own or control.  It's not something that I can prohibit other people from sharing just because it's about me, because that facet of me was not born entirely from me, but it was created socially.

Of course, I mean, socialism has nothing to do with whether the property, whether the commodities come from social sources or not, but just in advancing the argument why some of the personal data if not all should be considered public resources instead of libertarian commodities to advance the argument, I'm making that observation.

This idea is not necessarily in opposition to data protection laws.  Data protection laws, the actual mechanics of them, are built on the metaphor of data ownership, that you own your data, but, I mean, that statement is cyclical, right?

If you, if it's yours, then that means you own it already.  What's really meant by the statement is that you own data about yourself, but then again, data about yourself as I said before, has social origins.  My job, I mean, even in the structuralist philosophy, I mean, what is a tomato?  Is there substance to tomatoness.  We know there is no such thing.

Tomato is tomato only because it doesn't have the features of other fruits or vegetables.

It's all relative.  So the identity, personal identities come from relations with other people, relations in the communities, another reason to think about communal ownership of some of the personal data.  Now, the proponents of data protection laws themselves understand that data ownership is supposed to be only a metaphor, not the actual truth by which privacy is to be protected.

So there are exceptions carved out of the data protection laws.  For instance, Singapore, India, Canada, Australia, they all have exceptions for publicly available data, just as Germany used to have until 2017.  So I will stop there for now.  And I will develop more in questions and answers if you have any.

>> MODERATOR:  Thank you, K.S.  So now it's actually time for Working Groups.  I already said this in the beginning, it's a real workshop, so you have also to do something, dear guests, namely forming groups now maybe in the corners of the rooms and you have 25, 30 minutes with the speakers to discuss the questions that you have which came to your mind during their statements and they also prepared some guiding questions for the discussion.

  I would put forward maybe to go about it in this way, that Renata and Baratang can team up and discuss institutions and governance structures and inclusiveness of efforts to build data commons, and then we will have a group on community efforts to build data sets, providing structures and building ecosystems around a specific data set, with Alex and Audace, and the third group, Irmgarda and K.S., to discuss privacy, data socialism and the right not to be represented in the data.  Please spread around the room.  If there is a fourth group which would like to form, please feel free to do so.  The only important point is that you have someone who can at the end be a Rapporteur and wrap up what you have been discussing in the next 25 minutes.  Thank you.

>> MODERATOR:  So hello, everybody, I hope you are having inspiring discussions but I need to ask you to come back to the bigger Forum now and share your insights.

  We have to wrap up here, so please come back to the Forum and let's see what you have been discussing.

  So time is ticking, please come back to the bigger group and stop your conversations at this point so we can all learn what you have been discussing so fiercely.

>> Our group, in terms of data collection, looked at incentive mechanisms.  Digital Umuganda shared their approach, which is to do the data collection basically leveraging an existing voluntary community day that happens, I think, once a month in Rwanda, and the idea of sort of south K corporation, and they do that together with universities and schools to collect data sets.  They are using a data platform provided by Mozilla, which is offering a platform for crowdsourcing the data.  There was an interesting example from Wheelmap, which builds on OpenStreetMap infrastructure for collecting information about accessibility in places, also with schools, students, voluntary contributions.  We also talked about biases in data collection.  With Common Voice they find there are a lot of male voices, so even though in theory crowdsourced data could be balanced, in practice you feel that with the people who crowdsource, their biases and representation, there is a challenge there.

On the demand side of things, we talked about, let's see, from the side, for example, that they are working on a larger ecosystem around voice data: universities, entrepreneurs, media companies, public institutions that are interested in using the voice data sets and voice recognition models.  We talked about preserving languages.

There was one gentleman who talked about a language ‑‑ I didn't get where you were from, from Belarus, and that is a language which is being recorded by the audio samples and he has asked whether Mozilla would be able to incorporate that into the platform, and it would be possible, but the question comes down to the license.  In terms of the challenges on the demand side, it was about expectation management because building a voice data set in this case takes a lot of time, and another issue was raised that open access to such data sets doesn't necessarily mean equal benefit or usability of data sets because you need people with the skills to make use of the data.

>> MODERATOR:  Thank you so much, Daniel.  This is quite comprehensive, I would say, being part of the discussion group as well.  Let's go to, well, K.S. Park or who would like to go first as a group?  Philip, please, thank you.

>> We basically started with the idea of data socialism which maybe obviously was quite controversial, but we talked a lot about communities and the rights of communities in terms of protecting and also benefiting from the data that is collected about them, so we, for example, talked about how scientists, researchers, et cetera, interact with indigenous communities and perhaps not collect data, but at some point extract data and then the idea came up at what point is the raw data part of the community and at what point does the researcher or any other institution make a data set out of the raw data and present it as something new.  And then the question is, okay, who owns the raw data, okay, then you maybe have a more direct link to the community, but once it's a data set it becomes more complicated.

So there is a disconnection between the data and the subject in the research, and how it is in the end going to be used.  And the second part of the discussion focused more on the data socialism part, and how personal your data can be because you are always embedded in relations to, well, not only other people, but also in a social environment in a community.

So I think if I got the idea right, what K.S. Park tried to explain was try the data ownership, and that this would lead to a nuanced approach to data protection, but from the discussion, I think, what came out is everybody came up with different kinds of examples where a blanket approach to data protection would not work.

And if you are advocating for a nuanced approach, then it is basically, it goes down to every individual case, which can become problematic.

>> MODERATOR:  Thank you, Philip.  And I think the last one up is Unta giving a wrapup of the discussion on, what did we have here?  Institutions and governance structures.  Is this right?

>> Yes, that's true.  So maybe to sort of underline, again, why we are here, I found one statement very remarkable because we tend to forget about it was the statement that, I forgot your name that you said in Africa you need access.  We were talking very much about data protection here, but in Africa, be it in African countries, be it through silos, data is often protected.  Maybe not in the sense that we think of data protection here, but there is really the need for data, the need for access.  So it's not a luxury.  There is a need and you also highlighted a key question there that just needs to be sort of a guiding question, and that is does the data we are producing and freeing, whatever, does that serve the issues of the communities, and you need consent from the communities that I would put sort of as an overarching understanding in our group on top of this.

We were discussing about what would institutions for data look like, and how can we empower people with data?  We claim that, but how does it translate actually, and Renata was saying how can we democratize tools to enable people to do something with data.  That connects to a topic that we spent a little bit more time on, how to create usage of the data, be it by fostering the tools or be it also by creating communities that are interested in the data, and have the capabilities of using it and have use cases, and that could be, of course, commercial actors, but it, of course, could be other actors as well.

There was the idea to ‑‑ so, one step back, because we sort of understood that very often we are demanding the openness of data and we are not yet able to show so much that this innovation hypothesis that we have, that that is holding true, because we don't see the widespread innovation yet.  That was sort of a diagnosis.

And we believe that this is, that this is possible, but maybe it hinges on also supporting side systems for this data such as journalism, such as science, for instance, people who can use it because not everybody will become a data scientist.  And in line with that, we were trying to think about the encouragement of innovation and the use of data to, for instance, address the question of what other drivers for use cases could be, and there was the sort of the take away to think about problems, global problems where data that we use, that we create can actually contradict or contribute to solving the problem.  One of the problems that was named was climate crisis.  So that would be a concrete problem space where we could and should argue for collaborative data generation that is actually of use because it's about showing that there are use cases for these, for the public data or the data commons depending which way you would want to go there.

What was left open a little bit is the question that we should include the, include the challenges of how to maintain clean, add and continue with data, data sources.  It's not a one‑time issue, but you have to sort of encourage future processes around the data governance.  And there the idea was sort of held that we should more explore the partnerships between the private sector and the public sector in terms of maintenance.

Another approach I will close ‑‑ no, another approach, I will close with that, was concrete proposal for funding because that's always key.  The rule was proposed that for community or non‑commercial use the data could be free of charge.  Fees could be waived, but that there would be licenses for companies to use community generated data, that that was a tangible suggestion that came out of our group.

>> MODERATOR:  Thank you so much, Unta.

>> K.S. PARK: I also want to make sure that our group also ended with a little bit of privacy by supplementing the report.  I also took the notes.  So we talked about solutions as well.  So we talked about data extraction from indigenous communities, capacity building is important and also there are movements to share data between more businesses because only big businesses have enough data to do, to make good applications.  And also as to the nuanced approach as opposed to a one‑size‑fits‑all approach, distinction between private data and personal data were talked about where, where privacy is considered more like a boundary management for each person, where each person controls the boundaries over which their personal data are not allowed to leave or cross.  So the last comment was that along the line, the kind of data you submit to telecoms to get phone service, to complete the transaction should be protected strictly as private data.

>> MODERATOR:  Okay.  Thank you for this add on.  So time is already up, and I'm not trying to do a wrapup of the wrapups.  So let me just say thank you for sticking with us throughout this workshop.  It was, I think, an interesting discussion, and certainly a discussion that we have to take further.  And for now, I just wish you a nice evening.  Thank you.

(Applause).
