The Counter-Trafficking Data Collaborative (CTDC) serves as a central repository for critical information about human trafficking. It publishes standardized and harmonized data from various organizations using a unified schema in its Global Dataset. CTDC facilitates an unparalleled level of cross-border, inter-agency analysis and provides the counter-trafficking movement with a truly in-depth understanding of this complex issue. Equipped with up-to-date and reliable information, decision makers are empowered to develop and implement targeted interventions.
The Global Dataset available for download from the website has been k-anonymized (k=11) and currently has approximately 47,000 observations. The complete, non-k-anonymized dataset currently contains nearly 80,000 observations. The visualizations on this website are based on the complete, non-k-anonymized dataset. Both datasets draw on case management data gathered on identified cases of human trafficking and recorded in a case management system, as well as on data gathered from individuals contacting counter-trafficking hotlines. The number of observations is constantly increasing as new records are added by the contributors.
The Creation Process
Many attempts at data standardization do not come to fruition. Organizations often struggle to complete the resource-intensive process of updating their technical systems and retraining staff to the degree required to fully adopt a newly devised standard. Knowing this, the founding partners of the CTDC, IOM and Polaris, decided to pursue a more pragmatic approach. Rather than attempting to reach a global consensus on the ideal human trafficking classification system and then overhauling their systems to match this standard, IOM and Polaris decided to begin by focusing on compatibilities in their existing data and data management systems.
IOM and Polaris began by comparing their existing data models and data classification systems and identified areas that were identical or compatible. The organizations then agreed on a shared lexicon and format. The first iteration of the Global Dataset comprises data already collected by each organization but mapped to the shared standard. In doing so, CTDC is setting new global standards for collecting, managing, and de-identifying human trafficking data. In the future, and to the extent possible, CTDC aims to align these standards with pre-existing standards in other fields (such as the ISIC). A comprehensive summary of the variables used in the Global Dataset can be found in the Codebook.
Privacy Issues and De-identification Techniques
IOM collects and processes data in accordance with its Data Protection Principles, as specified in its Data Protection Manual and Data Governance Policy. The other contributors adhere to relevant national and international standards through their policies for collecting and processing personal data.
Counter-trafficking case data contains sensitive information, and maintaining privacy and confidentiality is a priority for CTDC. In preparing for the release of the Global Dataset, all reasonable and necessary precautions were taken to preserve the confidentiality of personal data and the anonymity of data subjects. All explicit identifiers, such as names, were removed, and some data, such as age, has been transformed into broader categories. For example, a victim aged 23 will be associated with an age range in the Global Dataset (e.g. between 21 and 24). No personally identifying information is transferred to or hosted by CTDC, and organizations that want to contribute are asked to anonymize their data in accordance with the standards set for the current Global Dataset through the Data Dictionary and the de-identification techniques.
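The de-identification step described above can be illustrated with a minimal sketch. It assumes records are plain dictionaries; the field names (`name`, `age`, `age_band`) and the age bands are hypothetical, chosen only so that an age of 23 falls into a 21 to 24 range as in the example. The bands actually used are those defined in the Codebook.

```python
from typing import Optional

# Hypothetical explicit identifiers and age bands (inclusive bounds).
# These are illustrative assumptions, not the values used by CTDC.
EXPLICIT_IDENTIFIERS = {"name"}
AGE_BANDS = [(0, 8), (9, 17), (18, 20), (21, 24), (25, 29), (30, 38), (39, 47)]
OLDEST_BAND_START = 48


def age_to_band(age: Optional[int]) -> Optional[str]:
    """Replace an exact age with a coarser band; unknown ages stay unknown."""
    if age is None or age < 0:
        return None
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    return f"{OLDEST_BAND_START}+"


def deidentify(record: dict) -> dict:
    """Drop explicit identifiers and swap the exact age for an age band."""
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in EXPLICIT_IDENTIFIERS and key != "age"
    }
    cleaned["age_band"] = age_to_band(record.get("age"))
    return cleaned
```

A contributor would apply a step like `deidentify` before any data leaves its own systems, so that only the banded value is ever shared.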
In addition to the safeguards implemented when sharing sensitive data, the Global Dataset has been anonymized at a higher level through k-anonymization. K-anonymization is a technique that redacts data until it is not possible to query a dataset and return fewer than k results, regardless of the query. With k=11, no query that matches any records can return fewer than 11 of them. This process ensures that individuals cannot be identified through triangulation of more than one data point, while preserving the main statistical trends of the dataset.
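To make the k-anonymity property concrete, here is a minimal sketch of its simplest form: suppressing every record whose combination of attribute values is shared by fewer than k records. This is an illustration under stated assumptions, not a description of the procedure CTDC applies, which may redact individual values rather than whole records. The column names `gender` and `age_band` and the sample data are hypothetical.

```python
from collections import Counter


def group_key(record: dict, quasi_identifiers: list) -> tuple:
    """Combination of attribute values that could be used for triangulation."""
    return tuple(record.get(column) for column in quasi_identifiers)


def k_anonymize_by_suppression(records: list, quasi_identifiers: list, k: int = 11) -> list:
    """Keep only records whose attribute combination occurs at least k times."""
    counts = Counter(group_key(r, quasi_identifiers) for r in records)
    return [r for r in records if counts[group_key(r, quasi_identifiers)] >= k]


def smallest_group(records: list, quasi_identifiers: list) -> int:
    """Size of the rarest attribute combination (0 for an empty dataset)."""
    counts = Counter(group_key(r, quasi_identifiers) for r in records)
    return min(counts.values()) if counts else 0


# Hypothetical sample: one combination shared by 12 records, one by only 3.
sample = (
    [{"gender": "F", "age_band": "21-24"} for _ in range(12)]
    + [{"gender": "M", "age_band": "30-38"} for _ in range(3)]
)
released = k_anonymize_by_suppression(sample, ["gender", "age_band"], k=11)
```

In this sample the three records in the rare group are suppressed, so any filter on `gender` and `age_band` applied to `released` matches either no records or at least 11.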
CTDC also uses Geographic Information Systems (GIS) to map the main geographic trends at country level, without pointing to specific route coordinates. More information about this can be found on the Map page.
By pursuing this approach, the Global Dataset will primarily be a reflection of the types of information that are currently being collected. The Collaborative’s main aim is to bring data together, not to dictate what information is collected or how it is collected. However, as the dataset grows and is used and applied, best practices will emerge. These insights will inform the design of new data collection systems and upgrades to existing systems, making the information available for use in the fight against trafficking increasingly powerful.
More information on the Data Contributors can be found here.
This initiative is made possible by the generous support of the American people through the United States Department of State. The contents are the responsibility of IOM and do not necessarily reflect the views of the Department of State or the United States Government.