Imagine having a time machine and being able to prevent autocracy before it takes hold: if you could identify the moment, the trigger that could be averted, society would not go down the spiral of censorship and repression. While time travel is still science fiction, we may be closer than ever to knowing where to act so that history does not repeat itself.
One key to unlocking the interaction between the past and the possibility of shaping our future is the International Press Institute (IPI) archive, the most comprehensive collection of contemporary press freedom struggles. It comprises thousands of reports, country case files, legal and expert analyses, commentary by key political and historical figures, audio files, and other materials dating from 1950 to 2005. These materials span the Cold War, post-colonial transitions, military regimes, and the rise of digital information control, providing granular, longitudinal insight into the tactics, laws, and narratives used to suppress or defend the press globally. All of this information currently exists only in analogue formats. Our ambition is to digitise it and transform this historically significant yet largely inaccessible archive into a structured, open, and AI-ready dataset, opening up to researchers and the public over 75 years of unique documentation of the fight for press freedom and of the extraordinary resilience of journalism in the face of censorship, political and economic pressure, and technological revolution.
Opening this archive is not just about preservation; it is the first step toward creating a digital press freedom commons for the public good. We want to use the lessons of history, together with the technology of our times, to help address the democratic crisis we are in. This includes exploring how machine learning and other techniques could help us learn from the past to shape the future of press freedom.
A missing approach in the current ecosystem
Despite the exponential growth of AI capabilities, the current data ecosystem remains heavily skewed toward scale over substance, favouring vast, unstructured, and often biased corpora scraped from the web. What is missing are high-quality, longitudinal, and ethically sourced datasets that reflect governance, rights, and participatory dynamics across diverse global contexts.
We believe the IPI archive addresses this gap by offering a uniquely structured, time-rich dataset focused on the evolution of media repression and resilience, reform, and democratic fragility. If we can make it happen, we will not only experiment with governance models; the digitisation process itself will ensure the dataset is purpose-built for fine-tuning, retrieval-augmented generation, and causal inference, domains where context and signal quality far outweigh volume.
The archive’s human-curated, multilingual content would offer a grounded alternative to the noisy, siloed, and commercially controlled data pipelines shaping today’s AI systems.
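To make "structured and AI-ready" slightly more concrete, here is a minimal sketch of what a single digitised archive record could look like. Every field name and value below is a hypothetical illustration for discussion, not the project's actual data model, which would be co-designed during digitisation.

```python
# Hypothetical sketch only: field names and values are illustrative, not IPI's actual schema.
from dataclasses import dataclass, field

@dataclass
class ArchiveRecord:
    record_id: str                      # stable identifier assigned at digitisation
    title: str                          # report or case-file title
    country: str                        # country the document concerns
    year: int                           # publication or incident year (1950-2005)
    document_type: str                  # e.g. "report", "case_file", "legal_analysis"
    languages: list[str] = field(default_factory=list)           # original languages
    repression_tactics: list[str] = field(default_factory=list)  # curated thematic tags
    full_text: str = ""                 # OCR-corrected transcription
    provenance: str = ""                # box/folder, curator, licensing notes

# An invented example record, purely for illustration:
example = ArchiveRecord(
    record_id="ipi-1956-0042",
    title="Press licensing decree and newspaper closures",
    country="Example Country",
    year=1956,
    document_type="report",
    languages=["en"],
    repression_tactics=["licensing", "closure"],
    full_text="...",
    provenance="Box 12, folder 3; digitised scan; draft open licence",
)
```

Records with explicit countries, dates, and curated tags are what make longitudinal queries, and techniques such as retrieval-augmented generation or causal inference, tractable in a way that raw web scrapes are not.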
And, unlike most current efforts, a commons-based governance structure will be considered from the outset. Governance of the AI Commons will be stewarded through a Commons Charter, co-created with the IPI membership and technical contributors, defining licensing, contributor rights, and ethical safeguards for the use of the data.
In our exploration, we believe we could deliver a dataset that, combined with other ethical AI tools, would help with:
- Identifying early signs of media repression based on historical precedents.
- Tracing the narratives used to justify censorship in specific geopolitical contexts, and understanding the tools and strategies used to support and protect journalism.
- Supporting evidence-based policymaking and rights-based advocacy with historical analogues.
- Powering search, risk modelling, or natural language understanding in the media freedom domain.
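As one example of the last point, a retrieval step over digitised records could power semantic search for historical analogues. The sketch below assumes the hypothetical record structure above and a generic open sentence-embedding model ("all-MiniLM-L6-v2" via the sentence-transformers library); the snippets are invented and the model choice is a placeholder, not a decision by IPI or OKFN.

```python
# Minimal semantic-search sketch over digitised archive text.
# The document snippets below are invented placeholders, not real archive content.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

documents = [
    "1962 report on prior censorship of daily newspapers under an emergency decree.",
    "1975 case file: imprisonment of editors after publishing corruption findings.",
    "1989 legal analysis of a new press licensing law and newspaper closures.",
]
doc_embeddings = model.encode(documents, normalize_embeddings=True)

def search(query: str, top_k: int = 2) -> list[tuple[float, str]]:
    """Return the top_k snippets most similar to the query (cosine similarity)."""
    query_embedding = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_embeddings @ query_embedding
    ranked = np.argsort(scores)[::-1][:top_k]
    return [(float(scores[i]), documents[i]) for i in ranked]

# e.g. an advocate looking for historical analogues to a current licensing proposal:
for score, snippet in search("newspaper licensing used to silence critical press"):
    print(f"{score:.2f}  {snippet}")
```

The same retrieved passages could then ground a retrieval-augmented generation step, so that any generated summary cites specific, dated archive documents rather than unverifiable web text.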
Paving the way towards a federated Press Freedom Commons
We also know that other global and regional press freedom organisations, and broader human-rights organisations, have their own collections in different languages documenting similar struggles. So, what would happen if we grew the effort further? IPI and the Open Knowledge Foundation (OKFN) want to document this experimental effort to provide a tested, replicable model for other institutions holding high-value, underutilised public-interest and civic archives, be they journalistic, legal, or human rights-related. These institutions could adopt its governance and data design framework, contributing to a global, decentralised corpus for open, ethical, and socially responsive AI development.
This initiative represents a strategic opportunity to create a concrete example of shared digital infrastructure for public-interest AI: for the teams involved, this is what AI infrastructure for the public good looks like. The initiative is in its ideation and fundraising phase; if you are interested in learning more, contact us.
