IGF 2022 WS #425 Data divide worldwide: addressing data scarcity and bias

Organizer 1: Ana Laura Martinez, NIC.br
Organizer 2: Alexandre Barbosa, NIC.br
Organizer 3: Fabio Senne, NIC.br/Cetic.br

Speaker 1: Magpantay Esperanza, Intergovernmental Organization, Intergovernmental Organization
Speaker 2: Fabio Senne, Civil Society, Latin American and Caribbean Group (GRULAC)
Speaker 3: Lourdes Montenegro, Civil Society, Western European and Others Group (WEOG)
Speaker 4: Carolina Rossini, Civil Society, Latin American and Caribbean Group (GRULAC)
Speaker 5: Alison Gillwald, Civil Society, African Group

Moderator

Alexandre Barbosa, Civil Society, Latin American and Caribbean Group (GRULAC)

Online Moderator

Ana Laura Martinez, Civil Society, Latin American and Caribbean Group (GRULAC)

Rapporteur

Ana Laura Martinez, Civil Society, Latin American and Caribbean Group (GRULAC)

Format

Panel - Auditorium - 90 Min

Policy Question(s)

• What concerted efforts should policymakers make in order to address the “data divide”?
• What is the role of the civil society, private sector, and international organizations in addressing this issue?
• What are the keys to scalable and sustainable international programs to address data scarcity in the Global South and for marginalized populations worldwide?
• What are the traits of data literacy programs and strategies that can be implemented to enable individuals and organizations to participate further in the new scenario?

Connection with previous Messages: The panel contributes to and builds on both previous and current key IGF messages, since it focuses on how data is a key resource of the digital age, yet remains a sensitive and unresolved topic. The panel further advances discussions on the challenges that must be faced to achieve the widespread use of data and to harness the promise of data for development and research. Specifically, the panel will contribute to relaunching the discussion on what is needed for data to have a positive and inclusive impact on the economy and social life, tackling issues around governance, integrity, and privacy, while also addressing data scarcity, biased representation of populations in data sets, and integration between different sources of data, enriching the discussion with novel concepts and perspectives.

SDGs

1. No Poverty
2. Zero Hunger
3. Good Health and Well-Being
4. Quality Education
4.7
5. Gender Equality
6. Clean Water and Sanitation
7. Affordable and Clean Energy
8. Decent Work and Economic Growth
9. Industry, Innovation and Infrastructure
10. Reduced Inequalities
11. Sustainable Cities and Communities
12. Responsible Production and Consumption
13. Climate Action
14. Life Below Water
15. Life on Land
16. Peace, Justice and Strong Institutions
17. Partnerships for the Goals


Targets: The theme of this proposal potentially cuts across all SDGs, given that data are an integral part of the policy cycle. Therefore, data availability, or the lack thereof, has a direct impact on all the intervention areas covered by the 2030 Sustainable Development Agenda. From the policy angle, lacking data or having biased data (data that represent some social groups but not others, or that only partially represent some communities, and so paint a wrong picture of them) reduces the chances of identifying the social or policy issues on which to intervene, and makes it more difficult to track the progress of any policy measure. This applies to any SDG.
Additionally, digital inclusion is recognized to affect most SDGs. Even though it could be argued that the most directly related ones are 4, 5, 9 and 16, we have selected them all, since digital exclusion exacerbates preexisting inequalities and affects all dimensions of social development.

Description:

With 2.9 billion people still unconnected to the Internet, those in least developed countries, rural communities and vulnerable groups in general being the most affected, multiple challenges can be identified: not only do these people lack access that has become a prerequisite for participating in contemporary society, education and the economy, but they are also underrepresented in the datasets used to support decision-making at different levels and for machine learning, which increases algorithmic bias. A striking example is the lack of datasets that represent people with disabilities, which limits progress in developing AI solutions for these populations or, at the very least, in avoiding the amplification of inequalities through biased algorithms. On top of this, another layer of data scarcity is that of statistical data. According to the OECD, no data exists for approximately two thirds of SDG indicators. The lack of timely and relevant data on ICT access and usage mostly affects countries in the Global South and the most disadvantaged populations within them, making it more difficult to portray their situation accurately or to monitor policies affecting these populations. These multiple layers of inequality can be conceived of as a data divide.
When communities are underrepresented in the data, decisions based on this data may overlook their unique needs, lead to skewed conclusions, or both. Nowadays, both structured data (mainly statistical data) and non-structured data (mainly user-generated data, or big data) are recognized as complementary and necessary for policymaking and for tracking the SDGs. However, timely and meaningful access to both sources of data involves specific challenges. For big data, unequal access to broadband service, variations in access to different types of devices, and disparities in digital literacy can influence who is included in the data and who is not. Moreover, information produced in a particular area of a given country can serve as an indirect indicator of the extent to which that community can capture the benefits of the data revolution. In the case of structured or statistical data, challenges range from budgetary constraints to technical capacity and political will.
To strengthen the measurement ecosystem and integrate different sources of data meaningfully into policymaking, two things are needed: on the one hand, the full scope of the issue requires further conceptualization, research and understanding; on the other hand, multistakeholder agreements on data access and data governance mechanisms still need to be reached. This workshop will address these issues and explore novel notions such as 'datasphere', 'data divides' and other concepts through which these new phenomena can be understood. Reflections on their social impact and on the way forward to bridge these divides will be shared from the perspectives of stakeholders from Africa, Europe, Latin America and North America.

Expected Outcomes

A joint publication about the issue is an expected outcome of the session, leveraging the input of both the panelists and the participating audience.

Hybrid Format: To ensure the best possible experience for online and onsite participants, the role of the onsite and online moderators will be key. They will keep each other informed about their respective audiences.
The onsite moderator will address both the onsite and online audiences in his opening and closing remarks. Whenever he poses questions and encourages the audience to participate, he will explicitly address and mention both audiences.
The online moderator, in turn, will identify and report the following to the onsite moderator:
1. Number and regions present in the online audience
2. Comments, reflections and questions from the online audience
3. Notable comments or highlights on social media (via hashtags)
In this way, there will be positive feedback loops between both audiences and the panel. We plan to use Twitter as a complementary online tool.

At least two speakers and the moderator will be onsite. Therefore, point 2 does not apply.

Online Participation



Usage of IGF Official Tool.