IGF 2021 WS #198 The Challenges of Online Harms: Can AI moderate Hate Speech?

Wednesday, 8th December, 2021 (08:30 UTC) - Wednesday, 8th December, 2021 (10:00 UTC)
Conference Room 1+2

Organizer 1: Raashi Saxena, The Sentinel Project
Organizer 2: Drew Boyd, The Sentinel Project For Genocide Prevention
Organizer 3: Bertie Vidgen, The Alan Turing Institute
Organizer 4: Safra Anver, Safra Anver
Organizer 5: Zeerak Waseem, University of Sheffield

Speaker 1: Giovanni De Gregorio, Civil Society, Western European and Others Group (WEOG)
Speaker 2: Lucien M. CASTEX, Technical Community, Western European and Others Group (WEOG)
Speaker 3: Vincent Hofmann, Civil Society, Western European and Others Group (WEOG)
Speaker 4: Neema Iyer, Private Sector, African Group
Speaker 5: Raashi Saxena, Civil Society, Asia-Pacific Group
Speaker 6: Rotem Medzini, Civil Society, Western European and Others Group (WEOG)


Moderator

Bertie Vidgen, Technical Community, Western European and Others Group (WEOG)

Online Moderator

Safra Anver, Private Sector, Asia-Pacific Group


Rapporteur

Shradha Pandey, Civil Society, Asia-Pacific Group


Round Table - U-shape - 90 Min

Policy Question(s)

Digital policy and human rights frameworks: What is the relationship between digital policy and development and the established international frameworks for civil and political rights as set out in the Universal Declaration of Human Rights and the International Covenant on Civil and Political Rights and further interpretation of these in the online context provided by various resolutions of the Human Rights Council? How do policy makers and other stakeholders effectively connect these global instruments and interpretations to national contexts? What is the role of different local, national, regional and international stakeholders in achieving digital inclusion that meets the requirements of users in all communities?
Promoting equitable development and preventing harm: How can we make use of digital technologies to promote more equitable and peaceful societies that are inclusive, resilient and sustainable? How can we make sure that digital technologies are not developed and used for harmful purposes? What values and norms should guide the development and use of technologies to enable this?

Hate speech is a growing concern online. It can inflict harm on targeted individuals and stir up social conflict. However, it has proven difficult to stop its spread and mitigate its harmful effects. In many cases, there is a real lack of agreement about what hate is and at what point it becomes illegal -- problems compounded by differences across different countries, cultures and communities. Further, there is little consensus on how protecting people from hate should be balanced with protecting freedom of expression.

Digital technologies have brought a myriad of benefits for society, transforming how people connect, communicate and interact with each other. However, they have also enabled harmful and abusive behaviours to reach large audiences and for their negative effects to be amplified, including interpersonal aggression, bullying and hate speech. Already marginalised and vulnerable communities are often disproportionately at risk of receiving such abuse, compounding other social inequalities and injustices. This has created a huge risk of harm, exacerbating social tensions and contributing to the division and breakdown of social bonds. Global tragedies demonstrate the potential for online hate to spill over into real-world violence.

In this session, we address the risk of harm that emerges from abusive online interactions and scrutinise the need for human rights to be more actively integrated into how online spaces are governed, moderated and managed. This session has direct relevance at a time when thought leaders, politicians, regulators and policymakers are struggling with how to understand, monitor, and address the toxic effects of abusive online content. We adopt a multi-stakeholder approach, reflecting the need for social, political and computational voices to be heard to develop feasible and effective solutions.


5. Gender Equality
9. Industry, Innovation and Infrastructure
10. Reduced Inequalities
16. Peace, Justice and Strong Institutions
17. Partnerships for the Goals

Targets: Through our on-the-ground work in armed conflict zones such as Myanmar, the Democratic Republic of the Congo (DRC), South Sudan, and Sri Lanka, we've come to realise that hate speech impacts different groups of people unequally. Its negative effects fall disproportionately on women and other gender-based groups in particular. They are frequent targets of hate speech, especially when they are also members of ethnic, religious, or other minority communities. This session aligns with the theme selected as we are committed to empowering and safeguarding the rights of women, girls, and minority groups. We strive to shift this patriarchal narrative that has compounding effects. Testing the different innovative tools (such as Hatebase) and technical capabilities at the IGF will enrich the conversation on how emerging tech can be the key entry point towards protecting human rights and empowering these communities as change agents. We also understand that tackling hate speech should be built on establishing and strengthening core partnerships between different stakeholder groups across the world. More details can be found here: https://hatebase.org/


The impact of hate speech on fragile states has risen sharply in recent years, driven by misinformation that creates an environment in which hate speech spreads rapidly across social media. There are concerns that this has contributed on a large scale to persecution, armed conflict, and genocide in various developing countries. It is imperative for us to use this global forum to engage with relevant experts across different regional and cultural contexts, and with expertise from a range of fields.

A key challenge with online hate is finding and classifying it -- the sheer volume of hate speech circulating online exceeds the capabilities of human moderators, resulting in the need for increasingly effective automation. The pervasiveness of online hate speech also presents an opportunity since these large volumes of data could be used as indicators of spiralling instability in certain contexts, offering the possibility of early alerts and intervention to stem real-world violence.

Artificial Intelligence (AI) is now the primary method that tech companies use to find, categorize and remove online abuse at scale. However, in practice AI systems are beset with serious methodological, technical, and ethical challenges, such as (1) balancing freedom of speech with protecting users from harm, (2) protecting users’ privacy from the platform deploying such technologies, (3) explaining the rationales for their decisions, which are rendered invisible by the opaqueness of many AI algorithms, and (4) mitigating the harms stemming from the social biases they encode.

In this session, we bring together human rights experts with computer scientists who research and develop AI-based hate detection systems, in an effort to formulate a rights-respecting approach to tackling hate. Our hope is that bridging the gap between these communities will help to drive new initiatives and outlooks, ultimately leading to better and more responsible ways of tackling online abuse.

Expected Outcomes

The main outcome is for participants to leave with a clear understanding of the complexities of online hate, the difficulties of defining, finding and challenging it, and the limitations (but also potential) of AI to ‘solve’ this problem. We will focus particularly on cultural, contextual, and individual differences in perceptions and understandings of online hate. Relatedly, participants will understand the complex ethical and social issues involved in tackling online hate, particularly the need to protect freedom of expression, the risk of privacy-invasion from large-scale data mining to monitor online hate, and the potential for new forms of bias and unfairness to emerge through online hate moderation. Participants will understand the opportunities in deploying a human rights based approach to tackling online hate.

The session will create a direct conversation between four key stakeholder groups (Private, Civil, Technical and Government) that all work to tackle online abuse but are rarely brought into contact, in order to establish a shared understanding of challenges and solutions. We hope that this session will motivate new discussions and collaborations in the future, encouraging efforts to ‘bridge the gap’ between human rights and data science researchers working in this space. In particular, we anticipate the articulation of a global, human rights based critique of data science research practices in this domain, helping to formulate constructive ways to better shape the use of AI to tackle online harms.

We will ensure that these outcomes reach back to the wider community through: 1) A summary report of the discussion, to be published on The Alan Turing Institute Blog and The Sentinel Project Blog. 2) A follow-up consultation workshop with attendees who can contribute as linguists to Hatebase’s Citizen Linguist Lab. 3) Dissemination of the blogs through various social media channels associated with the wider community. 4) A one-hour discussion with stakeholders in the computer science community at the following year’s Workshop on Online Abuse and Harms (2022), hosted at the ACL conference.

The session will be divided into two parts, which between them explore three key issues of online hate: 1) the challenge of defining, categorising and understanding online hate, 2) the opportunities and challenges of using AI to detect online hate, and 3) the ethical challenges presented by different interventions to tackle its harmful effects. Each part will be led by a moderator and will include a group of selected expert speakers. The speakers will start by discussing the questions posed by the moderator, followed by an open Q&A session before moving to the next part. This format will, on the one hand, keep the speakers and participants focused on the issues addressed in each part and, on the other hand, keep the participants engaged, both on-site and online, by providing opportunities for open discussion throughout the workshop. The interaction provided by the online platform will further enrich the discussion, and the remote moderator will be able to share a summary of the chat interventions so that participants who are not connected are able to follow and engage with online participants. Other tools may be used at the beginning of each part to encourage participation and to fuel the debate.

Given the current circumstances, we hope to organise the session in a hybrid format. However, if the situation does not permit an on-site gathering, we will opt for a remote hub option. Our on-site moderator will coordinate directly with those on-site to enrich the discussion and collect feedback from those who log in remotely.

Part 1: 45 min Categorising, understanding and regulating hate speech using AI

35 Min Roundtable (2 mins interventions)

Question 1: What are the key dimensions that social media firms should report on in order to ensure clearer communication of policies such as content guidelines and enforcement to users? Question 2: What should we do with online hate? Is the answer just to ban people? Question 3: What role, if any, does AI have to play in tackling online hate?

Moderator, Safra Anver, British Council (Private)

Speakers 1) Giovanni De Gregorio, University of Oxford (Academia) 2) Vincent Hofmann, Leibniz Institute for Media Research/Humboldt Institute for Internet and Society (Technical) 3) Lucien Castex, AFNIC (Inter-government)

10 minutes Q&A

Part 2: 40 mins Tackling conflicts and ethical challenges in Global South and Middle East

35 minute roundtable

Question 1: Who should be responsible for the development and enforcement of policies to restrict hate speech and incitement to violence online, and how should these be applied? Question 2: How do we protect freedom of speech whilst still protecting from hate? Question 3: How can we crowdsource hate speech lexicons for appropriate linguistic, cultural, and contextual knowledge?

Moderator: Safra Anver, British Council (Private)

Speakers: 1) Raashi Saxena, Hatebase for The Sentinel Project (Youth in Civil Society) 2) Neema Iyer, Pollicy (Private) 3) Dr. Rotem Medzini, Israel Democracy Institute (Civil Society)

10 mins Q&A

Closing remarks: 5 minutes by Bertie Vidgen and Safra Anver 

Online Participation

Usage of IGF Official Tool. Additional Tools proposed: The co-organisers will actively promote the session on their respective social media handles, encouraging remote participation and consultation on the issues raised during the discussion. Remote participants will be able to pose questions to subject matter experts and other participants during the session through Slido. We will also use polls, shared documents and activity-based tools such as Miro/Mural boards to enhance participation. Events will be created on LinkedIn and Facebook for maximum outreach. Digital promotional materials will be published on the official online platforms of all co-organisers (e.g. blogs, Medium articles).

Key Takeaways

1) Having good linguistic coverage is difficult, and there is a lot of disparity. What is offensive in one language, culture or dialect can differ within the same language in a different region. 2) Access to data is essential to understand content moderation. One of the key takeaways is access for research. 3) Mapping out the local context of hate and improving machine learning techniques requires a lot of quality data.

Call to Action

1) Development of policy should be done by a wide variety of stakeholders to create a standard practice. 2) Practically speaking, the enforcement of policies to restrict hate speech comes down to service providers, social media companies, etc. They should adopt a hate speech policy that adequately balances freedom of expression, and develop different notification schemes and responses for national contact points, "trusted reporters", and users.

Session Report


Safra Anver, the moderator, introduced the session and set the context. Zeerak Talat talked about the cultural and linguistic hegemony that influences online spaces and disproportionately affects marginalized groups. Machine learning regresses towards a cultural mean, and the data is mostly focused on the Global North. Social media companies use Section 230 as their foundational value. Because of these foundational methods and values, we end up with systems that are developed to marginalise.
Machine learning removes dissenters and contestation. All of the responses to the insurgency end up being removed by content moderation because the systems move towards the salient means.

Part 1 : Categorising, understanding and regulating hate speech using AI

Panelists discussing the regulation of hate speech in different geographical contexts


Giovanni De Gregorio invites the audience not to view the questions around hate speech and AI just from a technological standpoint, but also to think about the social dimension. If we look at the automated moderation of hate speech as a technological problem, it is a matter of context as well as a matter of language. For languages on which AI has had almost no training, especially in places like Eastern and Southern Africa, the automated moderation of content is limited. There are small projects that translate small pieces of information into data to teach AI a particular language. Considering the examples of Myanmar and Ethiopia, it is critical to understand not only what AI can detect but also the incentives social media companies have for developing more accountable AI systems, especially considering the issue of different languages. This is not a problem that should be attributed just to AI. There are also other issues relating to the incentives guiding social media content moderation and the nuances in the protection of free speech in context. The question is both technological and social at the same time.

Vincent Hofmann explains the legal perspective on online hate speech moderation. Companies' moderation decisions have a direct impact on people's fundamental rights. In addition to freedom of speech, there is an impact on political debate and political jurisdictions too. Facebook/Meta can moderate more content than is prohibited by law. Procedural rights also need to be granted for these decisions. In the context of automated decision making, AI decisions must be made understandable so that they can be challenged, i.e. so that those confronted with decisions made by AI can exercise their procedural rights. Given the sheer volume of the growing online space, it is impossible to moderate without AI. The language barrier for moderation, the problem of cultural language, still exists.

Lucien Castex addressed fundamental rights and the impact on speech and privacy, drawing on his work with the French National Human Rights Commission. There are several drawbacks to taking down online content. There is a risk associated with mass removal of content, i.e. a grey area with a big risk of censorship. The time frame for evaluating content is very short, which is harmful to freedom of speech: the legislation proposes that content be pulled down within 24 hours, which is not adequate time to evaluate it. Context also has a massive impact on content. Language should be seen as (language + context), which makes it very difficult to assess the effectiveness of AI systems. It reinforces biases in moderating content. He highlights the need for a national action plan for digital education to protect people online.

We also had an active audience. Babar Sohail raised a question on non-transparent algorithms. Responding to his question, Lucien commented on the need to establish a strong team of moderators who natively speak the same language. These are the people who understand the cultural context. He also mentioned the flagging of content and reliance on the judicial system to protect the information. Lucien highlighted the need for transparent rules and suggested having committees for content moderation to help understand and enable better moderation.

Naz Balock, a parliamentarian from Pakistan, posed a question: what is hate speech in one country would not be considered hate speech in another, so what happens when content that we feel is hate speech is flagged to social media providers but is not considered hate speech in another part of the world?

Giovanni responded by commenting on how different content moderation is in different parts of the world. Language is one of the reasons for the inequality in content moderation. In addition to language, there is the role of human moderators. Nobody knows where these moderators are located around the world. It is important to learn who is moderating this content across the world in order to understand context and increase accountability. A question also arises about who should be allowed to be a human moderator, given how relevant their role is to interpreting context.

Part 2 : Tackling conflicts and ethical challenges in Global South and Middle East

Neema Iyer spoke about the online abuse women face during elections. Pollicy conducted a study titled Amplified Abuse, on hate speech against women politicians during the 2021 general elections in Uganda. Mapping out the local context of hate and improving ML techniques requires a lot of quality data, which is currently scarce. The team used the Hatebase API and conducted a few lexicon building workshops to gather inputs. The abuse women get is sexualised and gendered. Women are targeted based on their personal life, while men are targeted based on their politics. The existing biases impact women, Africans and people of color. Women tend to be shadow banned if they talk about queer issues or racism, since the content gets flagged as hate speech. There is also a need for private sector players to fund indigenous researchers from the Global South and to compensate them appropriately for their time.

Rotem Medzini presented the co-regulatory model implemented during his research to study antisemitism. The model is divided into two parts. The first part is a set of common criteria to identify hate speech (and balance it with freedom of expression). Each criterion is scaled as a continuum, which allows tech companies that provide online social platforms to more easily define a uniform policy on moderating hate speech. Each continuum supports a choice between two poles: on the left side, more lenient options that enable less intervention in freedom of expression; on the right side, stricter options that lead to the deletion of more content. The second part comprises a procedural guide on how to implement the policies within the organization. It provides managers with steps on how to implement the criteria in their organization and online platform and then make a decision on content that violates these policies.
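The two-pole continuum described above can be pictured with a small data structure. This is only an illustrative sketch: the criterion names, the 0-to-1 scale and the averaging are assumptions made here for illustration, not details of the model as presented.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Criterion:
    """One policy criterion placed on a continuum between two poles."""
    name: str        # hypothetical label, not taken from the presented model
    position: float  # 0.0 = most lenient pole (left), 1.0 = strictest pole (right)

    def __post_init__(self) -> None:
        # Reject positions that fall outside the continuum.
        if not 0.0 <= self.position <= 1.0:
            raise ValueError(f"position must be within [0, 1], got {self.position}")


def overall_strictness(criteria: Sequence[Criterion]) -> float:
    """Average position across criteria (an assumed, simplistic summary)."""
    if not criteria:
        raise ValueError("at least one criterion is required")
    return sum(c.position for c in criteria) / len(criteria)


# A hypothetical platform policy: lenient on one criterion, strict on another.
policy = [
    Criterion("hypothetical_criterion_a", 0.25),
    Criterion("hypothetical_criterion_b", 0.75),
]
print(overall_strictness(policy))
```

Writing each choice down as an explicit position makes a platform's policy easier to document and compare, which is the kind of uniformity the common criteria aim for.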

Raashi Saxena presented Hatebase and highlighted the importance of understanding how the online world affects the offline world, especially in the context of violence and mass atrocities. Hate speech isn't a new phenomenon (e.g. the Armenian genocide and Nazi Germany). Those instances in history required coordinated efforts and large-scale approvals from multiple bodies (institutional infrastructure and resources). Now anyone with an internet connection and a smartphone can potentially reach a larger audience (outside the purview of national boundaries) to spread propaganda. Online, it is difficult to pinpoint the original source of information, and hate speech spreads rapidly. There is no universally accepted definition of hate speech. There are several ethical and social challenges around hate speech. The sheer volume of growing information online makes it difficult to moderate without automation. Human moderators are also poorly paid, and the work takes a massive mental toll. Linguistic coverage is a challenge: the dialects and slang spoken in one country are very different from the same language spoken in a different country. Human moderation still matters, because human intervention is needed to train the technology and help make decisions in ambiguous cases of hate speech. The Citizen Linguist Lab is an opportunity for anyone from across the world to contribute to Hatebase’s lexicon of keywords based on nationality, ethnicity, religion, gender, sexual discrimination, disability, and class, in order to identify and monitor potential incidents of hate speech as well as provide the necessary social and cultural nuance to raise the overall linguistic profile. One does not have to be a professional to contribute.
Along with a global network of people and organizations working on related issues, openness, information sharing, collaboration, counter messaging, informed policy making and education will enable communities to make better decisions (on how to navigate violent situations or interact with other communities).

Key takeaways

  • AI cannot understand context or language. Companies invest little of their funding in moderators and focus more on developing AI systems.
  • Data and content are very closely related, which makes the boundary between them hard to draw. To train AI we need not only non-personal data but also personal data, since AI processes content.
  • Understanding how moderation is operated, and enforcing transparency obligations to that end, is the first step.
  • Accessing data is essential to understand content moderation. One of the key points is access for research (which is GDPR compliant).
  • Moderation of images can be difficult because day-to-day items may not be considered hateful on their own, such as the machete images sent to women candidates in the Ugandan elections.
  • New, contemporary words are a challenge, since AI recognises only the words already known to it. Symbols and emojis within text are considered in AI moderation, where they are compared against a set of variables to identify the closest possible word.
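
The last takeaway, comparing text that contains symbols against known terms to find the closest word, can be sketched as follows. This is a minimal illustration using Python's standard library, not the method used by Hatebase or any platform: the substitution table, the placeholder lexicon and the 0.8 similarity cutoff are all assumptions chosen for the example, and emojis are simply stripped rather than interpreted.

```python
import difflib
import unicodedata
from typing import Optional, Sequence

# Hypothetical table mapping common look-alike symbols to letters.
SUBSTITUTIONS = str.maketrans({"@": "a", "0": "o", "1": "i", "3": "e", "$": "s"})

# Placeholder lexicon of neutral category words, standing in for real entries.
LEXICON = ["insult", "slur", "threat"]


def normalise(token: str) -> str:
    """Fold compatibility characters, undo symbol swaps, keep letters only."""
    token = unicodedata.normalize("NFKC", token).lower()
    token = token.translate(SUBSTITUTIONS)
    # Emojis, punctuation and digits left over are dropped in this sketch.
    return "".join(ch for ch in token if ch.isalpha())


def closest_term(token: str,
                 lexicon: Sequence[str] = LEXICON,
                 cutoff: float = 0.8) -> Optional[str]:
    """Return the most similar lexicon entry, or None if nothing is close."""
    matches = difflib.get_close_matches(normalise(token), lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(closest_term("1n$ult"))    # symbol swaps are undone before matching
print(closest_term("thr3at!!"))  # trailing punctuation is stripped
print(closest_term("hello"))     # unrelated word: no lexicon entry is close
```

A string-similarity match like this finds only spellings close to words already in the lexicon, which is why genuinely new terms still depend on human contributors to extend it.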