What is the Counter-Trafficking Data Collaborative?
The Counter-Trafficking Data Collaborative (CTDC) is the first global data hub on human trafficking, publishing harmonized data from counter-trafficking organizations around the world. Launched in November 2017, CTDC aims to break down information-sharing barriers and equip the counter-trafficking community with up-to-date, reliable data on human trafficking.
What data are available?
Data on CTDC are available either to download or to visualize. The visualizations are powered by aggregate statistics, which are available for download, and anonymized versions of the CTDC datasets are publicly available to download from the site. Additional datasets will be published on CTDC as the platform receives more data from partners, and each dataset type is described here:
The global victim of trafficking dataset
The CTDC global victim of trafficking dataset is the largest of its kind in the world and currently exists in two forms. The data are based on case management data gathered from identified cases of human trafficking and are disaggregated at the level of the individual. Cases are recorded in a case management system during the provision of protection and assistance services, or are logged when individuals contact a counter-trafficking hotline. The number of observations in the dataset increases as contributing organizations add new records. The version of the dataset that is available to download from the website in CSV format has been k-anonymized. The complete, non-k-anonymized version is presented throughout the website in visualizations and charts showing detailed analysis.
The global synthetic dataset
In September 2021, CTDC released its first downloadable Global Synthetic Dataset, representing data from over 156,000 victims and survivors of trafficking across 189 countries and territories (where victims were first identified and supported by CTDC partners). The privacy-preserving synthetic data solution, developed at Microsoft Research in the Python programming language, is freely available via GitHub. Please refer to the definitions page for more information on synthetic data.
The global victim-perpetrator synthetic dataset
In December 2022, CTDC released the second synthetic dataset, the Global Victim-Perpetrator Synthetic Dataset, which was produced using an extension of the algorithm with added support for differential privacy. Please refer to the definitions page for more information on synthetic data.
Where do the data come from?
The data come from a variety of sources. The data featured in the global victim of trafficking dataset come from the assistance activities of the contributing organizations, including case management services and counter-trafficking hotline logs.
How are the global datasets created?
Each dataset has been created through a process of comparing and harmonizing existing data models of contributing partners and data classification systems. Initial areas of compatibility were identified to create a unified system for organizing and mapping data to a single standard. Each contributing organization transforms its data to this shared standard and any identifying information is removed before the datasets are made available.
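The mapping step described above can be pictured with a minimal Python sketch. It is an illustration only, not the procedure used by CTDC or its partners: the partner names, field names and code lists below are all hypothetical.

```python
# Hypothetical shared standard: these field names are assumptions for illustration.
SHARED_FIELDS = ("gender", "age", "citizenship")

# Each (hypothetical) partner declares how its own field names map to the shared ones.
PARTNER_FIELD_MAPS = {
    "partner_a": {"sex": "gender", "age_years": "age", "nationality": "citizenship"},
    "partner_b": {"Gender": "gender", "Age": "age", "Country": "citizenship"},
}

# Local codes are translated into one shared vocabulary (hypothetical code list).
GENDER_CODES = {"f": "Female", "female": "Female", "m": "Male", "male": "Male"}


def harmonize(record, partner):
    """Map one partner record onto the shared fields, dropping unmapped fields."""
    field_map = PARTNER_FIELD_MAPS[partner]
    out = {field: None for field in SHARED_FIELDS}
    for source_name, value in record.items():
        target = field_map.get(source_name)
        if target is None:
            continue  # fields with no mapping (such as names) are not carried over
        out[target] = value
    if out["gender"] is not None:
        code = str(out["gender"]).strip().lower()
        out["gender"] = GENDER_CODES.get(code, "Unknown")
    return out


record_a = {"sex": "F", "age_years": 24, "nationality": "XX", "full_name": "Example"}
harmonized_a = harmonize(record_a, "partner_a")
```

Because only mapped fields are copied, anything a partner holds outside the shared standard is left behind by construction.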
How is the individual-level data protected?
Counter-trafficking case data contain highly sensitive information, and maintaining privacy and confidentiality is of paramount importance for CTDC. All explicit identifiers, such as names, were removed from the global victim dataset, and some data, such as age, have been transformed into ranges. No personally identifying information is transferred to or hosted by CTDC, and organizations that want to contribute are asked to anonymize their data in accordance with the standards set by CTDC.
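As a rough illustration of these two safeguards, the Python sketch below removes explicit identifiers from a record and replaces an exact age with a range. The identifier list and the age bands are assumptions chosen for the example; they are not the bands or field names used by CTDC.

```python
# Hypothetical list of explicit identifiers to strip from each record.
EXPLICIT_IDENTIFIERS = {"name", "phone", "address"}

# Hypothetical age bands (inclusive bounds); ages above the last band are "50+".
AGE_BANDS = [(0, 17), (18, 29), (30, 39), (40, 49)]


def age_to_range(age):
    """Replace an exact age with the label of the band that contains it."""
    if age is None:
        return None
    if age < 0:
        raise ValueError("age cannot be negative")
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    return f"{AGE_BANDS[-1][1] + 1}+"


def de_identify(record):
    """Return a copy of the record without identifiers and without the exact age."""
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in EXPLICIT_IDENTIFIERS and key != "age"
    }
    if "age" in record:
        cleaned["age_range"] = age_to_range(record["age"])
    return cleaned


example = de_identify({"name": "Example", "age": 24, "gender": "Female"})
```

The exact age never appears in the output, so the published record reveals only which band the person falls into.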
In addition to the safeguards outlined above, the global victim dataset has been anonymized to a higher level through a mathematical approach called k-anonymization. For a full description of k-anonymization, please refer to the definitions page.
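The general idea of k-anonymity is that every combination of potentially identifying attribute values in a dataset is shared by at least k records. The Python sketch below shows one simple way to reach that property, by suppressing records whose combination is too rare. It is an assumption-laden illustration: the attribute names and the value of k are hypothetical, and it does not reproduce the procedure applied to the CTDC dataset.

```python
from collections import Counter


def group_key(record, quasi_identifiers):
    """Combination of attribute values that could jointly single someone out."""
    return tuple(record.get(name) for name in quasi_identifiers)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of values occurs in at least k records."""
    counts = Counter(group_key(r, quasi_identifiers) for r in records)
    return all(count >= k for count in counts.values())


def suppress_rare_groups(records, quasi_identifiers, k):
    """Drop records whose combination of values occurs fewer than k times."""
    counts = Counter(group_key(r, quasi_identifiers) for r in records)
    return [r for r in records if counts[group_key(r, quasi_identifiers)] >= k]


# Hypothetical records and attributes, for illustration only.
records = [
    {"gender": "Female", "age_range": "18-29"},
    {"gender": "Female", "age_range": "18-29"},
    {"gender": "Female", "age_range": "18-29"},
    {"gender": "Male", "age_range": "30-39"},
]
attributes = ["gender", "age_range"]
released = suppress_rare_groups(records, attributes, k=2)
```

In this example the single record with a unique combination is withheld, and the three remaining records are indistinguishable from one another on the chosen attributes.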
IOM collects and processes data in accordance with its own Data Protection Policy. The other contributors adhere to relevant national and international standards through their policies for collecting and processing personal data.
What is GIS?
CTDC uses Geographic Information Systems (GIS) to map the main geographic trends at country level, without pointing to specific route coordinates. More information about this can be found on the definitions page.
How does human trafficking case data relate to prevalence data?
There are currently no global or regional estimates of the prevalence of human trafficking. National estimates have been produced in a few countries, but they are based on modelling of existing administrative data from identified cases and should therefore be considered only as basic baseline estimates. Historically, it has been difficult to estimate the prevalence of trafficking by collecting new primary data, for example through surveys. This is due to trafficking’s complicated legal definition and the challenges of putting difficult, sensitive questions to respondents in household surveys in an ethical manner.
The only comparable global estimate is the 2017 Global Estimate of Modern Slavery, which estimates the prevalence of the related crimes of forced labour and forced marriage. This estimate was produced by the International Labour Organization (ILO) and the Walk Free Foundation (WFF) in collaboration with IOM. The 2017 report estimates that 40 million people were victims of modern slavery on any given day in 2016. Of these, approximately 25 million people were in forced labour and another 15 million people were in a forced marriage.
CTDC case-level data come from victims of human trafficking who have been identified or assisted by the contributing organizations. As with all data from identified cases, it is challenging to infer to what extent trends within identified victim populations are representative of the total victim population, since trafficking is a crime intended to go undetected and identified cases are not random samples of the population. This does not mean that they are unrepresentative of the population, however, and testimony from survivors of trafficking is one of the best, and one of the only, sources of information available on this complex crime. Case data provide detail and opportunities for analysis on the profile of victims and the forms of trafficking.