Autocrisp: AI Agents that Automate CRISP-DM

Automated Data Mining, CRISP-DM Automation, Agent-based AI, Natural Language Data Science, Data Lab Automation, Educational AI tools, Data Sandbox
Case study

EDC support

Modern data science often follows a six-step methodology called CRISP-DM (Cross-Industry Standard Process for Data Mining): understanding the business problem, understanding the data, preparing the data, modelling, evaluating results, and deploying solutions. While powerful, this process is time-consuming and requires advanced coding skills, which puts it out of reach for many researchers and students. To address this, EDC started an initiative in collaboration with the Applied Data Science & AI program at Hogeschool Rotterdam, in which one of the program's students acted as lead AI engineer. The result is “Autocrisp”, a system that automates this entire pipeline. Built and tested in the EDC Data Sandbox, Autocrisp turns the traditional data mining process into a natural language-based experience, widening the group of people who can take part in advanced data work.

Autocrisp replaces complex coding with a conversation-based experience, driven by a team of AI agents that run on Large Language Models (LLMs) and are coordinated via n8n workflows.

  • Business Understanding: Users can perform meta-research, scan the web for studies, and find datasets using simple text prompts.
  • Data Understanding and Preparation: Using Streamlit and Autogen (AG2), users can explore, clean, and process data.
  • No-code Modelling: Machine learning models can be built with no coding experience required.
  • Evaluation: Results are analysed and evaluated based on natural language prompts.
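The division of work across agents can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not Autocrisp's actual implementation: the names `Phase`, `make_stub_agent`, `AGENTS`, and `route` are hypothetical, and the stub agents only echo the prompt where a real agent would call an LLM.

```python
from enum import Enum
from typing import Callable, Dict


class Phase(Enum):
    """The six CRISP-DM phases named in this case study."""
    BUSINESS_UNDERSTANDING = "business understanding"
    DATA_UNDERSTANDING = "data understanding"
    DATA_PREPARATION = "data preparation"
    MODELLING = "modelling"
    EVALUATION = "evaluation"
    DEPLOYMENT = "deployment"


# Hypothetical agent type: takes a plain-language prompt, returns a text reply.
Agent = Callable[[str], str]


def make_stub_agent(phase: Phase) -> Agent:
    """Return a placeholder agent; a real one would call an LLM here."""
    def agent(prompt: str) -> str:
        return f"[{phase.value}] received: {prompt}"
    return agent


# One agent per phase, looked up by the phase the user is working in.
AGENTS: Dict[Phase, Agent] = {phase: make_stub_agent(phase) for phase in Phase}


def route(phase: Phase, prompt: str) -> str:
    """Send the prompt to the agent registered for the given phase."""
    return AGENTS[phase](prompt)


if __name__ == "__main__":
    print(route(Phase.MODELLING, "Train a classifier on the cleaned table"))
```

Keeping one agent per phase means each one can be given its own instructions and tools, while the user only ever types plain text.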

All actions are logged and reviewed in a human-in-the-loop process, where code is checked by a human before being executed. Support systems like Supabase handle secure data storage and user authentication behind the scenes. By removing the code barrier, Autocrisp allows researchers and students to perform high-level data work independently.
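The human-in-the-loop step can be sketched as a small approval gate. This is a minimal sketch, assuming generated code arrives as a Python string and that a reviewer callback decides whether it runs; `run_with_approval` and `ask_in_terminal` are hypothetical names, not part of Autocrisp, and the bare `exec` call stands in for whatever isolated execution environment a real deployment would use.

```python
import contextlib
import io
import logging
from typing import Callable, Optional

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("hitl")


def run_with_approval(code: str, approve: Callable[[str], bool]) -> Optional[str]:
    """Log the proposed code and run it only if the reviewer approves.

    Returns the captured standard output, or None if the code was rejected.
    """
    log.info("proposed code:\n%s", code)
    if not approve(code):
        log.info("rejected by reviewer; nothing executed")
        return None
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # Illustration only: exec offers no isolation, so real use needs a sandbox.
        exec(code, {})
    log.info("executed after approval")
    return buffer.getvalue()


def ask_in_terminal(code: str) -> bool:
    """Example reviewer: ask a person at the terminal to confirm."""
    return input("Run this code? [y/N] ").strip().lower() == "y"
```

Calling `run_with_approval(generated_code, approve=ask_in_terminal)` would show the reviewer a prompt before anything runs. Because both the proposal and the decision are logged, every action leaves a record whether or not it is executed.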

Diagram: Six Essential Steps of CRISP-DM

Impact

Autocrisp highlights EDC’s technical capabilities and its commitment to making data science more inclusive and efficient. By automating each step of the CRISP-DM process and letting users drive it in plain English, Autocrisp opens up data analysis to a much wider audience, including students, researchers, and other professionals without programming backgrounds. Data exploration and model building become faster and easier for these users. Autocrisp also offers a reusable, scalable framework for turning complex questions into practical results. The initiative builds on the storage capacity and computational power of the EDC Data Sandbox.

