THE GLOBAL SYNTHETIC DATASET
News: In December 2022, IOM released The Global Victim-Perpetrator Synthetic Dataset produced using an updated version of Synthetic Data Showcase with added support for differential privacy. The resulting dataset describes victim-perpetrator relations. It is CTDC's second synthetic dataset, and the first to provide the guarantee of differential privacy.
Microsoft Research has worked with IOM to develop a new algorithm to derive “synthetic data” from CTDC’s sensitive victim case data. Rather than systematically redacting cases, which results in a substantial amount of data being suppressed, the algorithm generates a synthetic dataset that accurately preserves the statistical properties and relationships in the original data. Representative data on all of CTDC’s victim of trafficking cases are now available as a downloadable data file thanks to the new algorithm.
The synthetic dataset provides first-hand, critical information on the socio-demographic profile of victims, types of exploitation, and the trafficking process, including means of control used on victims. The new algorithm has enabled CTDC to share more data and allow more effective research to be conducted while protecting privacy and civil liberties. Access to additional attributes of victim case records will enable stakeholders to develop a more comprehensive understanding of this crime and the needs of survivors.
The records in the synthetic dataset do not correspond to actual individuals; each is constructed entirely from common attribute combinations. This means that none of the attribute combinations in the synthetic dataset can be linked to distinctive individuals (or even small groups of distinctive individuals) in the sensitive dataset or in the world at large.
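The idea of "common attribute combinations" can be illustrated with a small check. The sketch below is not the Synthetic Data Showcase algorithm; it is a minimal illustration, assuming a hypothetical threshold `k` and a hypothetical maximum combination length `max_len`, of how one might confirm that every attribute combination appearing in a synthetic table also appears at least `k` times in the sensitive table. The column names and records are invented toy data.

```python
from collections import Counter
from itertools import combinations


def combination_counts(records, max_len):
    """Count every combination of up to max_len (attribute, value) pairs."""
    counts = Counter()
    for record in records:
        items = sorted(record.items())
        for size in range(1, max_len + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return counts


def rare_combinations(sensitive, synthetic, k, max_len):
    """Return combinations found in `synthetic` that occur fewer than k times in `sensitive`."""
    sensitive_counts = combination_counts(sensitive, max_len)
    synthetic_counts = combination_counts(synthetic, max_len)
    return sorted(c for c in synthetic_counts if sensitive_counts[c] < k)


# Toy data with hypothetical column names (not taken from the real dataset).
sensitive = [
    {"gender": "F", "exploitation": "labour"},
    {"gender": "F", "exploitation": "labour"},
    {"gender": "F", "exploitation": "sexual"},
    {"gender": "M", "exploitation": "labour"},
]
synthetic = [
    {"gender": "F", "exploitation": "labour"},
    {"gender": "F", "exploitation": "sexual"},
]

# With k=2, the second synthetic record is flagged: "sexual" appears only once
# in the sensitive data, so it is not a "common" combination in this toy setup.
flagged = rare_combinations(sensitive, synthetic, k=2, max_len=2)
```

A synthetic table for which this function returns an empty list contains only combinations that are shared by at least `k` sensitive records, which is the intuition behind the statement above.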
The new privacy-preserving synthetic data solution, developed at Microsoft Research in the Python programming language, has also been made freely available via GitHub.
To help detect unseen patterns in case records, Microsoft Research has also applied the synthetic dataset to an interactive tool called “ShowWhy”. “ShowWhy” assumes no prior knowledge of coding or causal inference. It includes a pattern detection dashboard that enables users to explore causal relationships in observational data.
This data release and supporting technology were made possible by the Tech Against Trafficking 2019 Accelerator Program, in which IOM worked with Microsoft, Amazon, BT, Salesforce, and the broader community to advance the data and technology foundations of the CTDC platform.
Please find the dataset, codebook, and data dictionary below. We encourage you to check out the FAQs page for more information about the data.
Data and Resources
- The Global Synthetic Dataset (tsv)
This is the global synthetic dataset. It represents data from over 156,000…
- Global Synthetic Dataset Codebook (data)
This document contains a list of variables and definitions the global synthetic…
- Global Synthetic Data Dictionary (data)
This file describes all the variables in the global synthetic dataset.
- The Global Victim-Perpetrator Synthetic Explainer Presentation (pdf)
This presentation provides an overview of our collaboration with Microsoft…
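Because the dataset is distributed as a tab-separated (tsv) file, it can be read with standard tools. The following is a minimal sketch using only the Python standard library; the column names and sample rows are hypothetical placeholders, so consult the codebook and data dictionary for the actual variables.

```python
import csv
import io
from collections import Counter


def value_counts(tsv_file, column):
    """Count the values of one column in an open tab-separated file."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return Counter(row[column] for row in reader)


# In-memory stand-in for the downloaded file; columns and rows are invented.
sample = io.StringIO(
    "gender\texploitation\n"
    "Female\tlabour\n"
    "Female\tsexual\n"
    "Male\tlabour\n"
)
counts = value_counts(sample, "gender")
```

For the downloaded file, the same function can be passed a file object opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the file was saved and the UTF-8 encoding is an assumption.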
Global Synthetic Data Dashboard
This dashboard is generated using the Global Synthetic Dataset. On pages 1-3, you can visualise descriptive trends and explore the relationship between variables. Click the ? icon (top right) to learn more about the privacy-preserving properties and utility of the synthetic dataset. Click around and explore!