Microsoft Research has worked with the International Organization for Migration (IOM) to develop and refine an algorithm that generates synthetic data from the sensitive victim case records of the Counter-Trafficking Data Collaborative (CTDC). The resulting synthetic data accurately preserves the statistical properties of the original victim data without representing actual victims. Thanks to this technology, data linking the profiles of trafficking victims and perpetrators is now available for download.
The Global Victim-Perpetrator Synthetic Dataset provides first-hand information on the relationships between victims and perpetrators. The dataset includes IOM case data from over 17,000 victims and survivors of trafficking identified across 123 countries and territories, and their accounts of over 37,000 perpetrators who facilitated the trafficking process from 2005 to 2022.
This is the second synthetic dataset derived from the case records of trafficking victims, and the first to provide the guarantee of differential privacy. Differential privacy was developed at Microsoft Research in 2006, and today represents the gold standard in privacy protection. The new differentially private approach to synthetic data generation provides quantifiable guarantees that bound privacy loss regardless of the attack, even across multiple data releases. The technology has enabled CTDC to share more data and conduct more robust research while protecting privacy and civil liberties.
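To give a rough sense of what a "quantifiable" guarantee means, the sketch below shows the textbook Laplace mechanism applied to a simple count. It is not the algorithm used to produce the CTDC dataset, which this text does not describe. The function names, the toy records, and the privacy budgets are all hypothetical and chosen only for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centred on 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records, predicate, epsilon, rng):
    # A counting query changes by at most 1 when one record is added or
    # removed (sensitivity 1), so Laplace noise of scale 1/epsilon gives
    # epsilon-differential privacy for this single release.
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def total_epsilon(epsilons):
    # Basic sequential composition: the privacy budgets of successive
    # releases over the same data add up.
    return sum(epsilons)


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up toy records, not real case data.
    records = [{"group": "A"}, {"group": "B"}, {"group": "A"}, {"group": "A"}]
    release = noisy_count(records, lambda r: r["group"] == "A", epsilon=0.5, rng=rng)
    print(f"noisy count: {release:.2f}")
    print(f"budget spent after two such releases: {total_epsilon([0.5, 0.5])}")
```

A smaller epsilon means more noise and stronger protection. Because the budgets of successive releases add up, the total privacy loss across multiple releases can be stated as a single number.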
The new privacy-preserving synthetic data solution, developed at Microsoft Research, is available as both open-source software and a free-to-use web application that enables the creation of synthetic datasets interactively in the web browser.
This data release and supporting technology were made possible by the Tech Against Trafficking 2019 Accelerator Program, in which IOM worked with Microsoft, Amazon, BT, Salesforce, and the broader community to advance the data and technology foundations of the CTDC platform.