This post was co-authored by our partners Santiago Borrajo of the Centro de Estudios Legales y Sociales and Lesedi Bewlay of The Engine Room. It originally appeared on The Engine Room’s website and is republished here with permission.
Over the past year, as part of The Engine Room’s Matchbox programme, we have been supporting Argentine human rights organisation CELS (Centro de Estudios Legales y Sociales) to reconceptualise their existing database documenting cases of institutional violence. Together, we sought to construct a new one that better fits the organisation’s documentation needs, workflows and security requirements. To build this database, we relied on the open-source tool Uwazi, developed by HURIDOCS.
In this blog post, we dive deeper into the process behind how we selected the Uwazi tool and migrated CELS’ 12,000 cases of institutional violence onto the new platform. Adopting a new database can be time-consuming, tricky and maybe not the most glamorous of tasks. That said, it is a key way to strengthen many other parts of an organisation’s work – we hope others can use our learnings in their own journeys!
Figuring out needs and mapping potential tools
To start, we knew it would be important to identify the key needs of the new platform. Beginning with the ‘must-haves’ helped ensure that we and the CELS team were working from a shared understanding. In this case, together with CELS, we determined that any tool chosen should:
- Be sensitive to the context of a civil society organisation, with demonstrated use cases and successful use by other civil society organisations
- Be open-source
- Have the potential to be self-hosted
- Allow for document management and tagging
- Have a search functionality
- Be capable of exporting data in a way that is interoperable with other software, and also in a human-readable format
Taking this list of key requirements as a base, we conducted a landscape scan of tools that had the potential to be a good fit. After mapping these tools against the requirements, CELS decided to use Uwazi, HURIDOCS’s open-source document management tool.
Restructuring data to fit into a new platform
Adopting a new data management platform is rarely straightforward. To ensure that existing data would have a suitable home in Uwazi, we needed to look closely at the data itself and plan how it would map from the existing system to Uwazi. To do this, we extracted the relevant tables from CELS’ existing database (through multiple steps) and flattened the data into a spreadsheet, giving us a visual layout of where all the relevant data currently sat.
Uwazi stores data in a different structure from CELS’ existing database: Uwazi uses a MongoDB database, which stores data as documents, while the existing system was a relational SQL database. This meant that we had to think about the data structure itself differently, from the ground up. One of the first challenges we faced was restructuring the existing data in a way that would both respect the essence of how it was already organised and fit into Uwazi’s structure. Importantly, restructuring the data for the new system meant that we would also have to re-work the forms and document templates used to enter data into the system.
Each case in the original database was linked using a unique case number. The case number enabled connections with complementary information, such as people involved in the case, judiciary information, sources, etc. The team decided to continue using this methodology, since it provides a simple way to link different aspects of a case’s data independently of any particular software, and to replicate the relational database structure using Uwazi’s relationship feature. This feature creates a visual diagram where users can click through each document and see how many relationships it has. For example, one case may have three police officers and four victims that users can see and explore visually.
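To illustrate how a shared case number ties records together regardless of the software holding them, here is a minimal Python sketch. It is not the migration's actual code: the field names (`case_number`, `name`, `role`) and the sample records are hypothetical.

```python
from collections import defaultdict


def group_by_case(rows, key="case_number"):
    """Index related rows (people, sources, ...) by the case they belong to."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row)
    return dict(grouped)


# Hypothetical rows, as they might come out of two relational tables.
cases = [{"case_number": "C-001", "title": "Example case"}]
people = [
    {"case_number": "C-001", "name": "Person A", "role": "victim"},
    {"case_number": "C-001", "name": "Person B", "role": "police officer"},
]

people_by_case = group_by_case(people)

# Every case can now look up its related people through the case number alone.
for case in cases:
    related = people_by_case.get(case["case_number"], [])
    print(case["case_number"], [person["role"] for person in related])
```

Because the link lives in the data rather than in the database engine, the same lookup works on a spreadsheet export, a CSV file or a document store.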
Moving data to the new tool
CELS had over 12,000 cases on their original system that would need to be migrated to Uwazi. While Uwazi does enable bulk uploading using a CSV file, the data in the file needs to be standardised and precisely matched to the structure set up in Uwazi. (This isn’t unique to Uwazi: moving data around and matching formats is a key part of most data work, which is why care and patience matter so much in these processes.)
To create the CSV files for upload into Uwazi, we had to go through some data wrangling steps. First, we ran several custom queries in the existing database and formatted the resulting datasets into CSV files. Once we’d extracted the records, we needed to ensure that each column name in our CSV file had a matching data field in Uwazi. The data formats in the CSV file also needed to match the data formats we’d specified in Uwazi. (For example, a column mapped to a date field had to contain only date information in order to upload correctly into Uwazi; the same rule applies to text and number fields.)
This took a few rounds of trial and error before we got it right.
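The column and format checks described above can be rehearsed locally before any upload. The following is a minimal sketch under stated assumptions, not Uwazi's own validation: the column names, the field types in `EXPECTED_FIELDS` and the ISO date format are all hypothetical, and the formats your Uwazi instance actually accepts should be confirmed against its documentation.

```python
import csv
import io
from datetime import datetime

# Hypothetical template definition: CSV column name -> expected field type.
EXPECTED_FIELDS = {
    "case_number": "text",
    "incident_date": "date",
    "victim_count": "number",
}


def _is_date(value):
    # Assumes ISO dates (YYYY-MM-DD); adjust to the format you standardise on.
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def _is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


CHECKS = {"text": lambda value: True, "date": _is_date, "number": _is_number}


def validate_csv(handle, expected=EXPECTED_FIELDS):
    """Return a list of human-readable problems found in a CSV file."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    problems = []
    for name in sorted(set(expected) - set(header)):
        problems.append(f"missing column: {name}")
    for name in sorted(set(header) - set(expected)):
        problems.append(f"unexpected column: {name}")
    for row_no, row in enumerate(reader, start=1):
        for name, kind in expected.items():
            value = row.get(name)
            if value in (None, ""):
                continue  # empty cells are left for a human to review
            if not CHECKS[kind](value):
                problems.append(f"row {row_no}: {name}={value!r} is not a valid {kind}")
    return problems


sample = (
    "case_number,incident_date,victim_count\n"
    "C-001,2019-05-01,2\n"
    "C-002,01/05/2019,three\n"
)
problems = validate_csv(io.StringIO(sample))
for problem in problems:
    print(problem)
```

Running a check like this on each file turns some of the trial and error into a quick local loop, instead of discovering format mismatches halfway through an upload.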
In addition to our own attempts and learnings, other factors impacted our success. For example, the speed of the connection and the stability of the server where Uwazi is running affected the speed and accuracy of the uploads. Since we were uploading thousands of cases, we split the uploads into batches of 500, 1,000 or more, depending on the size of the upload. While they were processing, we monitored the progress and checked for errors.
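Splitting a large export into batches is simple to script. This is a minimal sketch, assuming the rows have already been loaded into a Python list; the batch size of 500 mirrors the smallest size mentioned above, and the row contents are placeholders.

```python
def batches(rows, size):
    """Yield consecutive slices of `rows`, each containing at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# Placeholder data standing in for 1,200 exported case rows.
rows = [{"case_number": f"C-{n:05d}"} for n in range(1200)]

batch_sizes = [len(batch) for batch in batches(rows, 500)]
print(batch_sizes)  # [500, 500, 200]
```

Each batch can then be written to its own CSV file and uploaded separately, so a failed upload only affects one slice of the data.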
Additionally, the importance of upload order quickly became clear. In order to re-create the relational structure, we had main tables and “sub-tables,” which sat beneath the main tables. In order for the sub-tables to create the correct relationships between documents, they needed to be uploaded after the main tables.
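The ordering rule can be made explicit in code. The sketch below is an illustration rather than the scripts used in the migration: the table names and the `case_number` key are hypothetical, and `graphlib` requires Python 3.9 or later. It derives an upload order in which main tables come before the sub-tables that depend on them, and flags sub-table rows that point to a case that does not exist.

```python
from graphlib import TopologicalSorter

# Hypothetical tables: each name maps to the set of tables it depends on.
DEPENDS_ON = {
    "cases": set(),
    "people": {"cases"},
    "sources": {"cases"},
}


def upload_order(depends_on):
    """Return table names ordered so that dependencies are uploaded first."""
    return list(TopologicalSorter(depends_on).static_order())


def find_orphans(main_rows, sub_rows, key="case_number"):
    """Return sub-table rows whose key has no matching row in the main table."""
    known = {row[key] for row in main_rows}
    return [row for row in sub_rows if row[key] not in known]


order = upload_order(DEPENDS_ON)
print(order)  # "cases" comes first; the sub-tables follow

cases = [{"case_number": "C-001"}]
people = [
    {"case_number": "C-001", "name": "Person A"},
    {"case_number": "C-999", "name": "Person B"},  # no such case
]
orphans = find_orphans(cases, people)
print(orphans)
```

Checking for orphaned rows before uploading a sub-table helps catch relationships that would otherwise fail to link to a main document.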
With all of these moving pieces, we also learned how much backups matter! If an upload goes wrong, you might need to start over entirely unless you have periodic snapshots from previous stages. We learned to prepare a backup after every new data upload stage, so we’d have checkpoints we could return to if need be.
Towards the end of our process, we also took advantage of a built-in API in Uwazi, using it to connect CELS’ new database to a public-facing dashboard. The output of this is Violencia Policial, a website that publicly shows key cases of institutional violence in Argentina.
What we learned through the process
Through this process we learned several lessons, including:
- Get to know your data. Before working with and structuring your original data, it’s key to understand how clean your data is, how it fits your data structure and where gaps may lie. Particularly when migrating data from an existing database to a new one, you need to understand how your current data structure fits your new structure.
- Be conservative with your timeline. Migrating the data to your new system is a phased process and probably shouldn’t be done all at once. It’s critical to take your time and not rush this process, because it’s important to get it right. Planning for mistakes, by allowing more time than you think you’ll need, can help alleviate some of the stress of figuring out what works best.
- Get the right people on board. We suggest engaging the help of partners who know the tool you’ve chosen well and have worked with it in a similar context. For example, HURIDOCS have a great team we relied on, alongside consultants within the civic tech space who have experience working with Uwazi. We also suggest looking up publicly available documentation on the tool you have selected, as well as accounts from other organisations with relevant experience (perhaps that is The Engine Room!).
- Choosing the right servers can be hard. To host the tool and its data, the team decided to go with an external Virtual Private Server (VPS) provider, which is a good option if you would like to increase accessibility and lower costs and human resource requirements. However, if data ownership and security are key, then you may want to self-host your instance. Keep in mind that this will require more planning around human resources and maintenance.