The Olympic Dream: A Definitive Database of Every Olympian

EDC Card — The Olympic Dream: A Definitive Database of Every Olympian
Definitive Olympic Athletes Dataset Sports Data Analytics Data Integration Data Standardisation Data Enrichment
Case study

EDC support

The Olympics are the world’s biggest sports event, yet no comprehensive dataset of all Olympic athletes had existed. To change that, researcher Prof. Dr. Gijsbert Oonk initiated a project to build an authoritative, accurate registry of every Olympian. EDC’s data lab partnered to consolidate, clean and enrich athlete data from multiple sources into a single source of truth.

Our multi-faceted approach began with matching and merging records—linking athletes and events across sources using unique IDs, names and event details, and manually reviewing unmatched cases. We then corrected and standardised data, including mapping country abbreviations against the official IOC list. Finally, we significantly enriched the dataset with athlete details, event data, geographical coordinates and economic indicators from specialised sources—overcoming inaccuracies, gaps and inconsistent standards in the original materials.

  • Entity resolution: athlete & event matching and manual review
  • Data correction & IOC-aligned standardisation
  • Enrichment with geo and economic context
  • Reusable framework for large-scale sports datasets

Impact

The resulting large-scale dataset unlocks “big data” research across sports science, history, sociology and economics. With a unique, authoritative registry comprising 300,000+ athlete–event combinations, scholars and journalists can explore long-term trends in athlete demographics, relationships between economic development and Olympic performance, and the global distribution of athletic talent. The EDC framework provides a solid foundation for future socio-economic and historical research on the Olympic movement.

Testimonial

Prof. Dr. Gijsbert Oonk ESHCC

Working with the EDC team was an outstanding experience. Their support was instrumental in helping us validate, clean and enrich Olympic athlete data drawn from multiple, often inconsistent sources. From intelligently matching and merging records to meticulously correcting and standardising country codes with official IOC references, every step was handled with expertise. What truly stood out was their ability to enrich the dataset with geographic and economic data from specialised sources. Thanks to their multi-faceted and collaborative approach, we now have a single, authoritative dataset that we can confidently rely on—and with this updated dataset we are ready for the next steps!

Further reading

  • IOC country codes reference
  • Sports data entity resolution methods
  • Best practices for dataset standardisation & enrichment
NL