Uwazi’s AI-powered Metadata Extractor: A game changer for human rights documentation

In a world where making sense of large swaths of data is becoming more crucial for accountability and justice, civil society needs tailored and ethical tools that lessen the heavy burden of manual data processing and analysis.

That’s where Uwazi comes in: our powerful, flexible, open-source platform is specifically developed to help civil society actors such as human rights defenders, journalists, and researchers make information actionable in the pursuit of truth, social justice and upholding human rights.

Uwazi’s powerful machine learning features are designed to unlock information through scalable, fast classification and analysis features. These include extracting metadata, translating texts, identifying key information, classifying topics and themes, generating tables of contents, analysing documents, and breaking down long texts into usable chunks. Our team is expanding this work further to also add automated cross-references, automatically extract specific entities and convert PDFs to HTML.

Why Uwazi’s AI-powered metadata extraction is a game changer

One of Uwazi’s most transformative features is the Metadata Extractor, designed to solve the challenges of manual data entry and processing. When dealing with thousands of documents, manually labelling and extracting pieces of data is time-consuming, costly, and prone to error. Uwazi’s AI-driven extractor automates this process, pulling key information like dates, names, locations, and topics from documents with speed and precision.

The 5 steps of Uwazi’s metadata extraction

This 5-step workflow streamlines the processing of sizable document libraries, reducing human error, and saving valuable time, especially for organisations managing large collections of evidence.

Step 1: Create Extractor
Define what you want to extract (dates, names, titles) and from which type of document (PDFs).
Step 2: Label Data
Manually highlight examples in documents to teach the system what to look for.
Step 3: Train
The extractor learns from your labeled examples to identify patterns and predict metadata.
Step 4: Automate
Apply the trained extractor across your entire document collection for bulk metadata extraction.
Step 5: Review
Users can accept or reject the system’s predictions, ensuring accuracy and control.

Watch the video below to see how this works in practice:

5 steps of Uwazi’s metadata extraction

Why metadata extraction is crucial

Metadata is the cornerstone of information management. By using the Metadata Extractor, it transforms raw data buried in natural language documents into manageable structured information, ready for actionable insights. Structured data can be used to uncover trends and compile statistics and data visualisations for both quantitative and qualitative evaluation. This helps defenders to focus on what matters most: advocating for the protection of rights and documenting evidence to hold perpetrators accountable.

Metadata extraction offers the following benefits for human rights and social justice workflows:

It is efficient at scale

Human rights organisations often manage thousands of documents, many in PDF or scanned formats. Manually tagging each with dates, names, locations, and topics is not only tedious, it’s nearly impossible to do consistently at scale. Metadata extraction automates this, saving hours of manual labour and reducing costs.

It is accurate and consistent

Manual data entry and categorisation is prone to human error. AI-powered metadata extraction ensures that both humans and the AI work together to ensure consistent labeling, which is essential for building reliable databases used in litigation, advocacy, and reporting as it helps maintain the integrity of evidence.

It enhances searchability and accessibility

Without metadata, large document collections can become unmanageable. With it, they become searchable, sortable, and analysable. This means defenders can quickly locate critical information, like all reports involving a specific location or date range, without combing through every file.

It enhances strategic analysis

Extracted metadata helps documenters recognise patterns across cases such as identifying recurring perpetrators, systemic abuses, or geographic hotspots. This empowers organisations to build stronger cases and advocate more effectively.

Use case: Slashing processing time from months to hours

During the Universal Periodic Review (UPR), the Geneva-based non-profit non-governmental organisation UPR Info monitors the commitments and recommendations from states. They have used Uwazi to build a database containing a high volume of documents that needs to be curated and analysed as it increases access to the UPR, which is a unique mechanism of the Human Rights Council. Manually tagging and extracting from documents for better categorisation can take anything from eight to ten hours per single document!

UPR Info has to sift through 100s of documents, each with 100s of recommendations, all captured within dense and complex text. By using the ML features in Uwazi, and specifically the Metadata Extractor, the amount of manual processing time is slashed dramatically.

Before using ML, it took approximately three months of manual work to process 228 documents.
After using ML, it took only 43 hours to extract and process all metadata from 30,267 recommendations in two languages.

The Metadata Extractor slashes manual processing time dramatically

Making justice accessible, one document at a time

Uwazi is built to be an innovative mechanism for transparency, accountability, and empowerment. Whether civil society needs a tool for preserving evidence, managing complaints, or compiling collective memories, Uwazi offers a secure, ethical, and independent solution tailored for human rights work.

By using Uwazi, civil society actors can rest assured that HURIDOCS employ the following approaches and adhere to the ethical and principled use of AI:

Security and privacy: The data is kept on our servers and the models are run in our own virtual machines
Customisation: Models are trained inside Uwazi, from specific datasets contained in the database
Speed and accuracy: The models take from a few minutes to a few hours to train and can achieve 90% plus accuracy on well-defined tasks with enough data.
Human-centred design: The human is always in the loop. Data points are double-validated, through the ML extraction and human revision. In practice, we have seen models spotting human errors and humans correcting ML mistakes. This works like a symbiosis between humans and AI.
Open-source everything: All the metadata extraction algorithms are open-source and can be found on Github

HURIDOCS’ team of documentation specialists, researchers and software developers are united by a belief in the power of accurate, accessible information and by a commitment to human rights. We partner with organisations across the globe to support human rights defenders with cutting-edge documentation technology and strategies.

The Uwazi journey began in 2015 with a collaborative meeting in Kenya. By 2017, the platform was launched and quickly adopted by human rights organisations worldwide. In 2019, Uwazi integrated AI features with support from the Google.org Impact Challenge, and by 2022, it was recognised as a digital public good. In 2023, Google continued its support through the AI for Global Goals Challenge, helping expand Uwazi’s machine learning capabilities. From 2023, Patrick J. McGovern Foundation supported the development of Uwazi for using machine learning to enhance human rights data management and advocacy.

Uwazi, with its built-in AI features, is central to our technological offering and is designed to preserve evidence, manage cases, build libraries and conserve collective memories for facilitating reconciliation and transitional justice.

If you want to learn more about the transformative power of Uwazi’s AI-driven features, reach out to HURIDOCS. Get in touch with us.

This article is a summary of our presentation delivered at a workshop titled “Information Management & Machine Learning for Human Rights: Digital Transformation in the Public Sector” at the Latsis Symposium 2025.