Autocrisp: AI Agents that Automate CRISP-DM

Automated Data Mining, CRISP-DM Automation, Agent-based AI, Natural Language Data Science, Data Lab Automation, Educational AI tools, Data Sandbox
Case study

EDC support

Modern data science often follows a standard methodology called CRISP-DM (Cross-Industry Standard Process for Data Mining), which defines six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. While powerful, this process is often time-consuming and requires advanced coding skills, which poses a barrier for many researchers and students. To address this, EDC started an initiative in collaboration with the Applied Data Science & AI program at Hogeschool Rotterdam, one of whose students acted as lead AI engineer. The result is “Autocrisp”, a system that automates the entire pipeline. Built and tested in the EDC Data Sandbox, Autocrisp turns the traditional data mining process into a natural language-based experience, widening the group of people who can take part in advanced data work.

Autocrisp uses a team of AI agents powered by Large Language Models (LLMs), designed to understand user questions and carry out complex tasks. The system is coordinated through n8n, whose automated workflows guide users through each phase. In the business understanding stage, users can perform meta-research, scan the web for relevant studies, and discover datasets, all with natural language prompts. In the data understanding and data preparation stages, tools such as Streamlit and AutoGen (AG2) let users explore, clean, and process data. In the modeling stage, users can build machine learning models without any coding experience. In the evaluation stage, the results can likewise be assessed through natural language prompts. All actions are logged and reviewed in a human-in-the-loop process in which code is checked before it is executed. Supporting services such as Supabase handle secure data storage and user authentication behind the scenes.
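The human-in-the-loop step described above can be illustrated with a small Python sketch. This is not Autocrisp's actual implementation, which the case study does not detail; the names ReviewGate, approve, and audit_log are hypothetical, and the approval callback stands in for whatever review interface the real system presents to the user. The sketch only shows the general pattern: every piece of agent-generated code is logged, and it runs only after a reviewer has approved it.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Dict, List

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("hitl")


@dataclass
class ReviewGate:
    """Hypothetical gate: holds agent-generated code until a reviewer approves it."""

    # Assumption: the reviewer is modelled as a callback returning True/False.
    approve: Callable[[str], bool]
    # Every submission is recorded, whether or not it was approved.
    audit_log: List[dict] = field(default_factory=list)

    def submit(self, phase: str, code: str) -> Dict[str, object]:
        approved = self.approve(code)
        self.audit_log.append({"phase": phase, "code": code, "approved": approved})
        log.info("phase=%s approved=%s", phase, approved)
        if not approved:
            # Rejected code is logged but never executed.
            return {"executed": False, "namespace": {}}
        namespace: Dict[str, object] = {}
        # exec is not a sandbox; it runs here only after explicit approval.
        exec(code, namespace)
        namespace.pop("__builtins__", None)
        return {"executed": True, "namespace": namespace}


# Stand-in reviewer for the example: rejects any code that imports os.
gate = ReviewGate(approve=lambda code: "import os" not in code)
result = gate.submit("data_preparation", "row_count = len([1, 2, 3])")
print(result["executed"], result["namespace"]["row_count"])
```

In a real deployment the callback would presumably show the generated code to a person and wait for a decision, and execution would happen in an isolated environment rather than through a bare exec call.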

Diagram illustrating the process

Impact

Autocrisp highlights EDC’s deep technical capabilities and its commitment to making data science more inclusive and efficient. By automating each step of the CRISP-DM process and letting users drive it in plain English, Autocrisp opens up data analysis to a much wider audience, including students, researchers, and other professionals without programming backgrounds. It makes data exploration and model building faster and easier, and it offers a reusable, scalable framework that helps users turn complex questions into practical results. The initiative also builds on the storage capacity and computational power of the EDC Data Sandbox.
