1. Key Policy Questions and Expectations:
• How can we support the development of digital public goods such as common data infrastructures for training artificial intelligence systems, e.g. voice recognition technology for underrepresented languages?
• How can we develop sustainable governance models for data commons based on a multi-stakeholder approach?
• What role can data commons play as an instrument of innovation policy and as a means to stimulate supply of and demand for innovative technological solutions?
2. Summary of Issues Discussed:
The group discussed three areas: (1) the institutions and data governance structures needed to govern and maintain the commons successfully, (2) incentive structures and community engagement mechanisms for the collection of open data (supply), and how to build an ecosystem around them to stimulate the use of these datasets (demand), and (3) private vs. personal data ownership and the rights of the data holder.
In the discussion, the group tended towards a data governance model in the sense of "commons" as opposed to "public goods". Data ownership was controversial: one participant held the view that all data are intangible assets. Against this, it was noted that if data is the new oil, we have to study what oil actually did to people. Other participants held the view that data should not be a commodity at all, but rather a common infrastructure. Moreover, sharing data means giving something away, so benefits need to be returned to the communities that are the source of the data (which is seldom the case). The key to collecting high-quality data and using it effectively is to have more data commons, together with capacity building so that we are able to use the data.
3. Policy Recommendations or Suggestions for the Way Forward:
The group discussed two key policy questions regarding data governance. (1) Whether to aspire to data as a commons, in the sense that a community decides on all governance questions and collectively maintains the data, or to data as a public good that is maintained by the state. There is a need to clarify non-profit vs. for-profit uses of data. Background: one participant held the view that all data are intangible assets. Individuals can give data in exchange for a service. Companies transform data into money through analysis and offer customers (you) the product. (2) How value can be created from data commons and from data as public goods. While open data is in theory available to all, creating value from it requires economic and technical means that are unequally distributed. To level the playing field, it is not enough to invest in data collection. (Policy) solutions are needed to democratize the tools needed to extract value from data, e.g. skills building and investment in high-value public datasets. At the same time, building an ecosystem around a public good or commons should be guided by potential use cases from the beginning.
4. Other Initiatives Addressing the Session Issues:
Data commons initiatives mentioned included the collection of open voice data through Mozilla's Common Voice project and the collection of accessibility information through wheelmap.org, which is based on OpenStreetMap. On the policy level, the example of a systematic judicial policy on open data in Brazil was mentioned, as well as proposals to regulate data sharing for SMEs at the EU level.
5. Making Progress for Tackled Issues:
- Capacity building to build demand for data commons
- Strengthen data user communities, e.g. in journalism and science, since not everyone can become a data scientist
- "Crisis as driver of change" approach: create an ecosystem to solve concrete global problems, such as climate change, and build commons around concrete use cases with a high level of interest from different stakeholders
- Tackle the disconnection between data and the data subject in data collection: while raw data is often directly associated with a person, the dataset as a whole is conceptualized as a new intellectual work governed by different principles
6. Estimated Participation:
80 participants, of whom around 40% were women
7. Reflection to Gender Issues:
Gender was discussed in terms of biases in existing and newly collected datasets. Even if data is crowdsourced, biases will persist. One concrete example: open voice data is heavily biased towards male speakers.