IGF 2021 WS #198 The Challenges of Online Harms: Can AI moderate Hate Speech?

Wednesday, 8th December, 2021 (08:30 UTC) - Wednesday, 8th December, 2021 (10:00 UTC)
Conference Room 1+2

Organizer 1: Raashi Saxena, The Sentinel Project
Organizer 2: Drew Boyd, The Sentinel Project For Genocide Prevention
Organizer 3: Pedro Peres, Laboratory of Public Policy and Internet (LAPIN)
Organizer 4: Bertie Vidgen, The Alan Turing Institute
Organizer 5: Safra Anver, Safra Anver
Organizer 6: Zeerak Waseem, University of Sheffield

Speaker 1: Lindsay Blackwell, Private Sector, Western European and Others Group (WEOG)
Speaker 2: Lucien M. CASTEX, Technical Community, Western European and Others Group (WEOG)
Speaker 3: Matthias Kettemann, Technical Community, Western European and Others Group (WEOG)
Speaker 4: Neema Iyer, Private Sector, African Group
Speaker 5: Raashi Saxena, Civil Society, Asia-Pacific Group
Speaker 6: Rotem Medzini, Civil Society, Western European and Others Group (WEOG)


Bertie Vidgen, Technical Community, Western European and Others Group (WEOG)

Online Moderator

Safra Anver, Private Sector, Asia-Pacific Group


Drew Boyd, Civil Society, Western European and Others Group (WEOG)


Round Table - U-shape - 90 Min

Policy Question(s)

Digital policy and human rights frameworks: What is the relationship between digital policy and development and the established international frameworks for civil and political rights as set out in the Universal Declaration of Human Rights and the International Covenant on Civil and Political Rights and further interpretation of these in the online context provided by various resolutions of the Human Rights Council? How do policy makers and other stakeholders effectively connect these global instruments and interpretations to national contexts? What is the role of different local, national, regional and international stakeholders in achieving digital inclusion that meets the requirements of users in all communities?
Promoting equitable development and preventing harm: How can we make use of digital technologies to promote more equitable and peaceful societies that are inclusive, resilient and sustainable? How can we make sure that digital technologies are not developed and used for harmful purposes? What values and norms should guide the development and use of technologies to enable this?


Hate speech is a growing concern online. It can inflict harm on targeted individuals and stir up social conflict. However, it has proven difficult to stop its spread and mitigate its harmful effects. In many cases, there is a real lack of agreement about what hate is and at what point it becomes illegal -- problems compounded by differences across different countries, cultures and communities. Further, there is little consensus on how protecting people from hate should be balanced with protecting freedom of expression.

Digital technologies have brought a myriad of benefits for society, transforming how people connect, communicate and interact with each other. However, they have also enabled harmful and abusive behaviours, including interpersonal aggression, bullying and hate speech, to reach large audiences, and have amplified their negative effects. Already marginalised and vulnerable communities are often disproportionately at risk of receiving such abuse, compounding other social inequalities and injustices. This has created a huge risk of harm, exacerbating social tensions and contributing to the division and breakdown of social bonds. Global tragedies demonstrate the potential for online hate to spill over into real-world violence.

In this session, we address the risk of harm that emerges from abusive online interactions and scrutinise the need for human rights to be more actively integrated into how online spaces are governed, moderated and managed. This session has direct relevance at a time when thought leaders, politicians, regulators and policymakers are struggling with how to understand, monitor, and address the toxic effects of abusive online content. We adopt a multi-stakeholder approach, reflecting the need for social, political and computational voices to be heard to develop feasible and effective solutions.


5. Gender Equality
9. Industry, Innovation and Infrastructure
10. Reduced Inequalities
16. Peace, Justice and Strong Institutions
17. Partnerships for the Goals

Targets: Through our on-the-ground work in armed conflict zones such as Myanmar, the Democratic Republic of the Congo (DRC), South Sudan, and Sri Lanka, we have come to realise that hate speech affects different groups of people unequally. Its negative effects fall disproportionately on women and other gender-based groups in particular. They are frequent targets of hate speech, especially when they are also members of ethnic, religious, or other minority communities. This session aligns with the selected theme, as we are committed to empowering and safeguarding the rights of women, girls, and minority groups. We strive to shift this patriarchal narrative that has compounding effects. Testing the different innovative tools (such as Hatebase) and technical capabilities at the IGF will enrich the conversation on how emerging tech can be a key entry point towards protecting human rights and empowering these communities as change agents. We also understand that tackling hate speech should be built on establishing and strengthening core partnerships between different stakeholder groups across the world. More details can be found here: https://hatebase.org/


The impact of hate speech on fragile states has risen sharply in recent years, driven by misinformation that creates an environment in which hate speech spreads rapidly across social media. There are concerns that this has contributed on a large scale to persecution, armed conflict, and genocide in various developing countries. It is imperative for us to use this global forum to engage with relevant experts across different regional and cultural contexts, and with expertise from a range of fields.

A key challenge with online hate is finding and classifying it -- the sheer volume of hate speech circulating online exceeds the capabilities of human moderators, resulting in the need for increasingly effective automation. The pervasiveness of online hate speech also presents an opportunity since these large volumes of data could be used as indicators of spiralling instability in certain contexts, offering the possibility of early alerts and intervention to stem real-world violence.

Artificial Intelligence (AI) is now the primary method that tech companies use to find, categorise and remove online abuse at scale. However, in practice AI systems are beset with serious methodological, technical, and ethical challenges, such as (1) balancing freedom of speech with protecting users from harm, (2) protecting users’ privacy from the platform deploying such technologies, (3) explaining the rationales for their decisions, which are rendered invisible by the opacity of many AI algorithms, and (4) mitigating the harms stemming from the social biases they encode.

In this session, we bring together human rights experts with computer scientists who research and develop AI-based hate detection systems, in an effort to formulate a rights-respecting approach to tackling hate. Our hope is that bridging the gap between these communities will help to drive new initiatives and outlooks, ultimately leading to better and more responsible ways of tackling online abuse.

Expected Outcomes

The main outcome is for participants to leave with a clear understanding of the complexities of online hate, the difficulties of defining, finding and challenging it, and the limitations (but also potential) of AI to ‘solve’ this problem. We will focus particularly on cultural, contextual, and individual differences in perceptions and understandings of online hate. Relatedly, participants will understand the complex ethical and social issues involved in tackling online hate, particularly the need to protect freedom of expression, the risk of privacy invasion from large-scale data mining to monitor online hate, and the potential for new forms of bias and unfairness to emerge through online hate moderation. Participants will also understand the opportunities in deploying a human-rights-based approach to tackling online hate.

The session will create a direct conversation between four key stakeholder groups (Private, Civil, Technical and Government) who all work to tackle online abuse but are rarely brought into contact, in order to establish a shared understanding of challenges and solutions. We hope that this session will motivate new discussions and collaborations in the future, encouraging efforts to ‘bridge the gap’ between human rights and data science researchers working in this space. In particular, we anticipate the articulation of a global human-rights-based critique of data science research practices in this domain, helping to formulate constructive ways to better shape the use of AI to tackle online harms.

We will ensure that these outcomes reach back to the wider community through:
1) A summary report of the discussion that would be published in The Alan Turing Institute Blog and The Sentinel Project Blog
2) A follow-up consultation workshop with attendees who can contribute as linguists towards Hatebase’s Citizen Linguist Lab
3) Dissemination of the blogs through various social media channels associated with the wider community
4) A one-hour discussion with stakeholders in the computer science community at the following year’s Workshop on Online Abuse and Harms (2022), hosted at the ACL conference

Discussion Facilitation

The session will be divided into two parts, which together explore three key issues of online hate: 1) the challenge of defining, categorising and understanding online hate; 2) the opportunities and challenges of using AI to detect online hate; and 3) the ethical challenges presented by different interventions to tackle its harmful effects. Each part will be led by a moderator and will include a group of selected expert speakers. The speakers will start by discussing the questions posed by the moderator, followed by an open Q&A session before moving to the next part. This format will, on the one hand, keep the speakers and participants focused on the issues addressed in each part and, on the other hand, keep participants engaged, both on-site and online, by providing opportunities for open discussion throughout the workshop. The interaction provided by the online platform will further enrich the discussion, and the remote moderator will be able to share a summary of the chat interventions so that on-site participants, if not connected, are able to follow and engage with online participants. Other tools may be used at the beginning of each part to encourage participation and to fuel the debate.

Given the current circumstances, we hope to organise the session in a hybrid format. However, if the situation does not permit an on-site gathering, we will opt for a remote hub option. Our on-site moderator will coordinate directly with the online moderator to enrich the discussion and collect feedback from those who log in remotely.

Part 1: 45 min Categorising, understanding and regulating hate speech using AI

35 min roundtable (2-minute interventions)

Question 1: What are the key dimensions that social media firms should report on in order to ensure clearer communication of policies such as content guidelines and enforcement to users?
Question 2: What should we do with online hate? Is the answer just to ban people?
Question 3: What role, if any, does AI have to play in tackling online hate?

Moderator: Dr. Bertie Vidgen, The Alan Turing Institute & Rewire Online

Speakers:
1) Lindsay Blackwell, Twitter (Private)
2) Matthias Kettemann, Leibniz Institute for Media Research/Humboldt Institute for Internet and Society (Technical)
3) Lucien Castex, AFNIC (Intergovernmental)

10 minutes Q&A

Part 2: 40 min Tackling conflicts and ethical challenges in the Global South and Middle East

30 minute roundtable

Question 1: Who should be responsible for the development and enforcement of policies to restrict hate speech and incitement to violence online, and how should these be applied?
Question 2: How do we protect freedom of speech whilst still protecting people from hate?
Question 3: How can we crowdsource hate speech lexicons for appropriate linguistic, cultural, and contextual knowledge?

Moderators: Safra Anver, WatchDog, and Dr. Bertie Vidgen, The Alan Turing Institute & Rewire Online

Speakers:
1) Raashi Saxena, Hatebase for The Sentinel Project (Youth in Civil Society)
2) Neema Iyer, Pollicy (Private)
3) Dr. Rotem Medzini, Israel Democracy Institute (Civil Society)

10 mins Q&A

Closing remarks: 5 minutes by the on-site moderator

Online Participation

Usage of IGF Official Tool. Additional Tools proposed: The co-organisers will actively promote the session on their respective social media handles, encouraging remote participation and consultation on the issues raised during the discussion. Remote participants will be able to pose questions to subject matter experts and other participants during the session through Slido. We will also use polls, shared documents and activity-based tools such as Miro/Mural boards to enhance participation. Events will be created on LinkedIn and Facebook for maximum outreach. Digital promotional materials will be published on the official online platforms of all co-organisers (e.g. blogs, Medium articles).