The Olympic Dream: A Definitive Database of Every Olympian

Definitive Olympic Athletes Dataset, Sports Data Analytics, Data Integration, Data Standardisation, Data Enrichment
Case study

EDC support

The Olympic Games are the world’s largest sporting event, yet until recently no comprehensive dataset existed that systematically captured both who Olympic athletes are and the inequalities that have shaped participation and representation over time. To address this gap, Prof. Dr. Gijsbert Oonk initiated OlympHIS: Olympic History & Inequality Statistics, a project designed to build an authoritative, accurate, and analytically powerful registry of every Olympian while embedding data on gender, nationality, migration, colonial legacies, and unequal access to competition. EDC’s data lab partnered in this effort to consolidate, clean, and enrich athlete records from multiple sources, transforming them into a single source of truth for studying not only sporting excellence, but also the global politics of inclusion and exclusion at the Olympic Games.

Our multi-faceted approach began with matching and merging records: we linked athletes and events across sources using unique IDs, names and event details, and manually reviewed the cases that remained unmatched. We then corrected and standardised the data, including mapping country abbreviations against the official IOC list. Finally, we enriched the dataset with athlete details, event data, geographical coordinates and economic indicators from specialised sources, overcoming inaccuracies, gaps and inconsistent standards in the original materials.
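To make the matching step more concrete, here is a minimal Python sketch of how such a linkage pass might work. It is an illustration under assumptions, not the project's actual pipeline: the field names (athlete_id, name, games, event), the function names and the sample records with their IDs are all invented for this example. It tries a shared ID first, falls back to a normalised name plus event details, and sets everything else aside for manual review.

```python
import unicodedata


def normalise_name(name: str) -> str:
    """Lower-case, strip accents and collapse whitespace so spelling variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


def match_records(source_a, source_b):
    """Link records in source_a to records in source_b.

    Pass 1 uses a shared athlete ID; pass 2 uses normalised name + games + event.
    Records that match neither way are returned separately for manual review.
    """
    by_id = {r["athlete_id"]: r for r in source_b if r.get("athlete_id")}
    by_key = {(normalise_name(r["name"]), r["games"], r["event"]): r for r in source_b}

    matched, review = [], []
    for rec in source_a:
        hit = by_id.get(rec.get("athlete_id"))
        if hit is None:
            hit = by_key.get((normalise_name(rec["name"]), rec["games"], rec["event"]))
        if hit is not None:
            matched.append((rec, hit))
        else:
            review.append(rec)
    return matched, review


# Hypothetical sample records; the IDs and field layout are invented for illustration.
source_a = [
    {"athlete_id": "A1", "name": "Paavo Nurmi", "games": "1924 Summer", "event": "Athletics 1500 m"},
    {"athlete_id": None, "name": "Emil Zátopek", "games": "1952 Summer", "event": "Athletics Marathon"},
    {"athlete_id": None, "name": "Unknown Runner", "games": "1900 Summer", "event": "Athletics 100 m"},
]
source_b = [
    {"athlete_id": "A1", "name": "NURMI Paavo", "games": "1924 Summer", "event": "Athletics 1500 m"},
    {"athlete_id": None, "name": "Emil Zatopek", "games": "1952 Summer", "event": "Athletics Marathon"},
]

matched, review = match_records(source_a, source_b)
```

In this sketch the first record links by ID despite the different name order, the second links by normalised name and event despite the missing accent, and the third lands in the review queue. A real pipeline would likely add fuzzy matching and handling for duplicate keys.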

  • Entity resolution: athlete & event matching and manual review
  • Data correction & IOC-aligned standardisation
  • Enrichment with geo and economic context
  • Reusable framework for large-scale sports datasets
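The IOC-aligned standardisation listed above can be sketched in the same spirit. The snippet below is a hedged illustration only: the reference set is a small hand-picked subset of IOC codes, the alias table covers just a few ISO 3166-1 alpha-3 codes that differ from their IOC counterparts, and the function name standardise_code is an assumption for this example rather than part of the EDC framework.

```python
# Illustrative subset of IOC country codes; a real run would load the full official list.
IOC_CODES = {"GER", "NED", "SUI", "USA", "KEN"}

# Example aliases: ISO 3166-1 alpha-3 codes that differ from the IOC code for the same country.
ALIASES = {"DEU": "GER", "NLD": "NED", "CHE": "SUI"}


def standardise_code(raw: str):
    """Return the IOC code for a raw country abbreviation, or None if it needs manual review."""
    code = raw.strip().upper()
    code = ALIASES.get(code, code)  # translate known aliases, leave other values unchanged
    return code if code in IOC_CODES else None


raw_codes = [" nld ", "USA", "DEU", "XXX"]
cleaned = [standardise_code(c) for c in raw_codes]
```

Returning None for unrecognised values, rather than guessing, keeps uncertain cases visible so they can be corrected by hand.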

Impact

The resulting large-scale dataset unlocks “big data” research across sports science, history, sociology and economics. With a unique, authoritative registry comprising 300,000+ athlete–event combinations, scholars and journalists can explore long-term trends in athlete demographics, relationships between economic development and Olympic performance, and the global distribution of athletic talent. The EDC framework provides a solid foundation for future socio-economic and historical research on the Olympic movement.

Testimonial

Prof. Dr. Gijsbert Oonk, ESHCC

Working with the EDC team was an outstanding experience. Their support was instrumental in helping us validate, clean and enrich Olympic athlete data drawn from multiple, often inconsistent sources. From intelligently matching and merging records to meticulously correcting and standardising country codes with official IOC references, every step was handled with expertise. What truly stood out was their ability to enrich the dataset with geographic and economic data from specialised sources. Thanks to their multi-faceted and collaborative approach, we now have a single, authoritative dataset that we can confidently rely on—and with this updated dataset we are ready for the next steps!

Further reading

  • IOC country codes reference
  • Sports data entity resolution methods
  • Best practices for dataset standardisation & enrichment