#IPIArchive: The Road to a Press Freedom AI Commons

Opening the International Press Institute (IPI) archive is not just about preservation; it is the first step toward creating a digital press freedom commons for the public good.

#IPI75 Archive commons cover by Dinara Satbayeva

Imagine having a time machine and being able to prevent autocracy before it happens. Imagine being able to identify the moment, the trigger that could be averted, so that society does not go down the spiral of censorship and repression. While time travel is still science fiction, we may be getting closer to knowing where to act so that history does not repeat itself.

One key to connecting the past with the possibility of shaping our future is the International Press Institute (IPI) archive, the most comprehensive collection of contemporary press freedom struggles. It comprises thousands of reports, country case files, legal and expert analysis, commentary by key political and historical figures, audio files, and other materials dating from 1950 to 2005. These materials span the Cold War, post-colonial transitions, military regimes, and the rise of digital information control, providing granular, longitudinal insight into the tactics, laws, and narratives used to suppress or defend the press globally. All this information is currently in analogue formats. Our ambition is to digitize it and transform this historically significant yet largely inaccessible archive into a structured, open, and AI-ready dataset. Doing so would open up to researchers and the public over 75 years of unique documentation on the fight for press freedom, as well as on the extraordinary resilience of journalism in the face of censorship, political and economic pressure, and technological revolution.

Opening this archive is not just about preservation; it is the first step toward creating a digital press freedom commons for the public good. We want to use the lessons of history, as well as the technology of our times, to help address the democratic crisis we are in. This includes exploring how machine learning and other techniques could help us learn from the past to shape the future of press freedom.

A missing approach in the current ecosystem

Despite the exponential growth of AI capabilities, the current data ecosystem remains heavily skewed toward scale over substance—favoring vast, unstructured, and often biased corpora scraped from the web. What is missing are high-quality, longitudinal, and ethically sourced datasets that reflect governance, rights, and participatory dynamics across diverse global contexts.

We believe that datasets such as the IPI archive can address this gap. The archive offers a uniquely structured, time-rich record of the evolution of media repression and resilience, reform, and democratic fragility. If we can make it happen, we will experiment with governance models, and we will design the digitisation process so that the dataset is purpose-built for fine-tuning, retrieval-augmented generation, and causal inference: domains where context and signal quality far outweigh volume.

The archive’s human-curated, multilingual content would offer a grounded alternative to the noisy, siloed, and commercially controlled data pipelines shaping today’s AI systems.

And, unlike most current efforts, this one will consider a commons-based governance structure from the outset. The AI Commons will be stewarded through a Commons Charter, co-created with the IPI membership and technical contributors, defining licensing, contributor rights, and ethical safeguards for the use of the data.

In our exploration, we think we could deliver a dataset that, combined with other ethical AI tools, could help us with:

  1. Identifying early signs of media repression based on historical precedents, tracing the narratives used to justify censorship in specific geopolitical contexts, and understanding the tools and strategies that support and protect journalism.
  2. Supporting evidence-based policymaking and rights-based advocacy with historical analogues.
  3. Powering search, risk modelling, or natural language understanding in the media freedom domain.

Paving the way towards a federated Press Freedom Commons 

We also know that other global and regional press freedom and broader human rights organisations have their own collections, in different languages, documenting similar struggles. So what would happen if we grew the effort further? IPI and the Open Knowledge Foundation (OKFN) want to document this experimental effort to provide a tested, replicable model for other institutions holding high-value, underutilised public interest and civic archives, whether journalistic, legal, or human rights-related. These institutions could adopt the model's governance and data design framework, contributing to a global, decentralised corpus for open, ethical, and socially responsive AI development.

This initiative represents a strategic opportunity to create a concrete example of shared digital infrastructure for public-interest AI. For the teams involved, this is what AI infrastructure for the public good looks like. The initiative is in its ideation and fundraising phase. If you are interested in knowing more about it, contact us.


Written by Renata Ávila (OKFN) & Scott Griffen (IPI)

Renata Ávila is the CEO of the Open Knowledge Foundation (OKFN). She is an international human rights and technology lawyer and openness advocate, helping individuals and organisations access and use data to take action on the most pressing social problems, as well as preserving and enhancing human rights through open standards, policy and advocacy. In her previous practice, focused on strategic litigation for access to information and access to justice, she represented high-profile human rights advocates, including Nobel Peace Prize laureate Rigoberta Menchú Tum. A former fellow and affiliate of the Stanford Institute for Human-Centered Artificial Intelligence, she is currently associated with the Center for Internet and Society at CNRS, France. She serves on the boards of several organisations, including Open Future, the Center for the Advancement of Infrastructural Imagination and the Just Net Coalition. She co-founded the Alliance for Inclusive Algorithms and the Progressive International. She has co-authored two books and contributed chapters to several others, and regularly contributes to publications in English and Spanish.

Scott Griffen is the executive director of the International Press Institute (IPI), the global network of editors, publishers, and media executives for press freedom. A respected media freedom advocate and expert, he joined IPI in 2012 and served as the organization’s deputy director from 2018 to 2024. He is the author of numerous reports on press freedom and independent journalism and has led IPI programs and in-country advocacy in dozens of countries on six continents. He holds degrees from Yale University, King’s College London, and Johannes Kepler University Linz.

