IGF 2023 WS #217 Large Language Models on the Web: Anticipating the challenge

Thursday, 12th October, 2023 (01:30 UTC) - Thursday, 12th October, 2023 (03:00 UTC)
WS 3 – Annex Hall 2

Artificial Intelligence (AI) & Emerging Technologies
Chat GPT, Generative AI, and Machine Learning

Organizer 1: Sara Berger, IBM
Organizer 2: Diogo Cortiz da Silva, Brazilian Network Information Center - NIC.br
Organizer 3: Yuki Arase, Osaka University
Organizer 4: Reinaldo Ferraz, NIC.br
Organizer 5: Ana Eliza Duarte, NIC.br

Speaker 1: Santana Vagner, Private Sector, Western European and Others Group (WEOG)
Speaker 2: Yuki Arase, Civil Society, Asia-Pacific Group
Speaker 3: Barbara Leporini, Civil Society, Western European and Others Group (WEOG)
Speaker 4: Emily Bender, Civil Society, Western European and Others Group (WEOG)
Speaker 5: Dominique Hazaël-Massieux, Technical Community, Western European and Others Group (WEOG)

Onsite Moderator

Diogo Cortiz da Silva, Technical Community, Latin American and Caribbean Group (GRULAC)

Online Moderator

Reinaldo Ferraz, Technical Community, Latin American and Caribbean Group (GRULAC)

Rapporteur

Ana Eliza Duarte, Technical Community, Latin American and Caribbean Group (GRULAC)


Round Table - 90 Min

Policy Question(s)

A. What are the limits of scraping web data to train LLMs, and what measures should a governance framework include to ensure privacy, prevent copyright infringement, and effectively manage content creator consent?

B. What are the potential risks and governance complexities of incorporating LLMs into search engines as chatbot interfaces, and how should different regions (e.g., the Global South) respond to the impacts on web traffic and, consequently, on the digital economy?

C. What technical and governance approaches can detect AI-generated content posted on the Web, restrain the dissemination of sensitive content, and provide means of accountability?

What will participants gain from attending this session?

The workshop will introduce a technical and governance debate about LLMs, focusing on the Web. Although there are other spaces for discussing AI governance, this session focuses on the complexity of ethics and governance when LLMs are incorporated into the Web ecosystem. Through an interdisciplinary and diverse approach (the speakers come from different backgrounds, stakeholder groups, and regions, and include a person with a disability), the panel will provide the technical knowledge needed to understand how LLMs could change the Web, along with some possible governance consequences: the challenge of privacy and consent in data collection, fundamental changes in how users search for information, potential impacts on the digital and physical economy, and how to manage AI-assisted content production. The workshop will encourage the audience to think critically about, and question, the emerging governance concerns raised by LLMs, especially in the Web context.


One of the leading generative AI approaches is the Large Language Model (LLM), a complex model capable of understanding and generating text in a coherent and contextualized way. Chatbots powered by this technology are becoming popular and disrupting different areas by offering a general-purpose conversational interface. LLMs can improve the user experience, accessibility, and search functionality on the Web. However, their integration raises governance and ethical concerns. This workshop will focus on three perspectives: data collection for model training, LLM integration into search engines, and the content production process.

A language model is only as good as the quantity and quality of the data that feed it. One strategy for creating a training dataset is large-scale, indiscriminate scraping of content on the Web. This raises ethical and governance questions such as user consent, privacy, copyright violation, misinformation, and the reinforcement of social and cultural bubbles.

LLMs are changing the way people search for web content. Some users now rely on LLM-powered chatbots as their primary source of information because they are easy to use and give direct answers. Search engines are also incorporating LLMs and conversational interfaces, potentially reducing access to the original content. This shift can affect web traffic and disrupt the dynamics of the digital economy. There are also concerns about bias, the cultural under-representation of some regions, and possible manipulation of information.

Generative AI also changes how people produce content. While LLMs may increase productivity in some circumstances, they can also facilitate the production of false content and misinformation. It therefore seems reasonable to discuss strategies that foster transparency, explainability, and accountability for AI-generated content.
The impact of LLMs on the Web will be transformative, and this workshop will provide a space to anticipate technical, ethical, and governance matters from an interdisciplinary perspective.

Expected Outcomes

Our workshop will introduce this theme to the IGF agenda and inform the audience about the challenges of integrating LLMs into web platforms. Because the topic is emerging, the workshop will help anticipate potential use cases, risks, governance challenges, and possible mitigation approaches. We will inform the audience about new trends, potentially critical uses of LLMs integrated into Web services, and the associated ethical and governance challenges. We also plan to compile a list of recommendations from speakers and participants that offers a multistakeholder perspective on the impacts of LLMs on the Web and can guide local policy agendas.

Hybrid Format: The session will be structured into three main segments: an introduction, a discussion, and interaction with the audience. In the first segment, each speaker will have 5 minutes to present their perspective on the topic, providing a multistakeholder view. In the second segment, the speakers will share their opinions on the policy questions. The final segment will be a Q&A session between the audience and the speakers. The online moderator will collect questions from the audience and share them with the onsite moderator, who will then distribute them among the speakers. To ensure a smooth session, we plan to hold an online meeting with all the speakers one week before the IGF. This will allow us to align the speakers' interventions according to their participation format, whether online or in person.