
Simulating Madness with bigdance

A Python package for simulating March Madness brackets and estimating your pool's win probability

March Madness is a combinatorial playground unlike any other. The sheer number of possible outcomes leads to a plethora of bracket-picking strategies, some more logical than others. Knowing close to nothing about college basketball but still wanting to participate, I started building a Python package in 2019 as an attempt to bring some numerical insight into an event that is and will always be best described as madness (and we love it for that).

But this toy package also turned into an incredibly valuable learning experience for my career, especially in software development and package management. In data science and open source coding, it is remarkably easy to come down with imposter syndrome. “I’m not a real software engineer…” “What if my PR looks like crap?” Fantasy sports gave me a gateway to ask questions that in other, more serious contexts might feel “stupid” (even though they’re probably not). “What the hell is PEP8?” “How do I load this damn thing into PyPI?” “Unit tests are incredibly annoying, why would I ever need them… oh wait, that’s why…” The less-than-serious nature of the subject gave me a sandbox to learn new concepts, ask questions, and most importantly, not be afraid of messing up.

What started as a thesis-writing distraction in grad school and then evolved into a pandemic project sadly grew stale after I became a father. But with the advent of AI coding assistants like Claude Code, I decided to dust it off and return to the sandbox one more time. It has once again taught me way more than I thought it could about what’s possible with open source software, especially in the new frontier of AI agents. After playing around in my free time for a few years, I finally feel like it’s in a developed enough state to show the rest of the world, hence this article!

Getting into the details of the package itself, bigdance pulls real-time college basketball ratings from Warren Nolan, simulates tournament brackets with adjustable upset factors, and can even scrape your ESPN Tournament Challenge pool to estimate each entry’s win probability. It also includes game importance analysis to figure out which remaining matchups matter most for your bracket’s chances. Whether you’re a fan looking to improve your bracket picks, a data scientist analyzing tournament patterns, or a researcher studying sports predictions, bigdance offers customizable tools to help you simulate and analyze the Big Dance. You can install it with a simple pip install bigdance (PyPI), and the full source code is available on GitHub.
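To give a feel for what "simulating a bracket with an adjustable upset factor" means, here is a minimal, standalone sketch. It is not bigdance's API: the names win_probability and simulate_bracket are hypothetical, and the logistic model on the rating gap, the 10-point scale, and the way upset_factor blends each game toward a coin flip are all assumptions made for illustration.

```python
from __future__ import annotations

import math
import random


def win_probability(rating_a: float, rating_b: float,
                    upset_factor: float = 0.0, scale: float = 10.0) -> float:
    """Hypothetical model: chance that team A beats team B.

    A logistic curve on the rating gap gives the baseline probability, and
    upset_factor (0 = trust the ratings, 1 = coin flip) pulls it toward 0.5.
    The logistic form and the 10-point scale are assumptions, not bigdance's.
    """
    if not 0.0 <= upset_factor <= 1.0:
        raise ValueError("upset_factor must be between 0 and 1")
    baseline = 1.0 / (1.0 + math.exp(-(rating_a - rating_b) / scale))
    return baseline + upset_factor * (0.5 - baseline)


def simulate_bracket(teams: list[tuple[str, float]], upset_factor: float = 0.0,
                     rng: random.Random | None = None) -> list[list[str]]:
    """Play a single-elimination bracket and return the winners of every round.

    `teams` is ordered so that positions 0 vs 1, 2 vs 3, ... meet in round one.
    """
    if len(teams) < 2 or len(teams) & (len(teams) - 1):
        raise ValueError("bracket size must be a power of two")
    rng = rng if rng is not None else random.Random()
    field = list(teams)
    rounds = []
    while len(field) > 1:
        # Pair neighbours, draw a winner for each game, and advance them.
        field = [
            a if rng.random() < win_probability(a[1], b[1], upset_factor) else b
            for a, b in zip(field[::2], field[1::2])
        ]
        rounds.append([name for name, _ in field])
    return rounds


# Made-up teams and ratings, purely for illustration.
teams = [("Alpha", 95.0), ("Delta", 70.0), ("Bravo", 88.0), ("Charlie", 80.0)]
print(simulate_bracket(teams, upset_factor=0.2, rng=random.Random(2025)))
```

Shrinking toward 0.5 keeps the favorite favored while letting you dial up chaos in one knob; any model that maps a rating difference to a probability could be swapped in for the logistic curve.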

There’s also a live web app at bigdance-bracket.streamlit.app — no installation or login required. It lets you pick your bracket, configure your pool size, and run Monte Carlo simulations to estimate your win probability, all from the browser. It supports both the men’s and women’s NCAA tournaments and includes an Upset Strategy tab with pre-computed analysis of winning bracket patterns by pool size, so you can see how your picks compare to what historically wins.
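The Monte Carlo idea behind a pool win probability can be shown with a toy version: simulate many tournaments, score every entry against each simulated outcome, and count how often each entry finishes first. The sketch below is hypothetical and is not the app's or the package's code. The four-team bracket, the Bradley-Terry style strength model, and the 10/10/20 scoring rule are assumptions. It also shows one simple way to measure game importance: the swing in an entry's win probability across the possible winners of a game, estimated by discarding simulations where that game went differently.

```python
from __future__ import annotations

import random
from collections import defaultdict

# Assumed scoring rule: the final is worth twice a semifinal.
POINTS = {"semi1": 10, "semi2": 10, "final": 20}


def beat_prob(strength: dict[str, float], a: str, b: str) -> float:
    """Bradley-Terry style probability that team a beats team b (assumed model)."""
    return strength[a] / (strength[a] + strength[b])


def simulate_tournament(strength: dict[str, float], rng: random.Random) -> dict[str, str]:
    """Simulate a toy 4-team bracket: A vs B and C vs D, then the final."""
    def play(a: str, b: str) -> str:
        return a if rng.random() < beat_prob(strength, a, b) else b

    w1, w2 = play("A", "B"), play("C", "D")
    return {"semi1": w1, "semi2": w2, "final": play(w1, w2)}


def score(picks: dict[str, str], outcome: dict[str, str]) -> int:
    """Total points an entry earns against one simulated outcome."""
    return sum(POINTS[game] for game, team in picks.items() if outcome[game] == team)


def pool_win_probabilities(entries, strength, n_sims=10_000, seed=0, condition=None):
    """Monte Carlo estimate of each entry's chance of finishing first.

    Ties for first split the win evenly. If `condition` is a (game, winner)
    pair, simulations where that game went differently are discarded
    (simple rejection sampling).
    """
    rng = random.Random(seed)
    wins: dict[str, float] = defaultdict(float)
    kept = 0
    for _ in range(n_sims):
        outcome = simulate_tournament(strength, rng)
        if condition is not None and outcome[condition[0]] != condition[1]:
            continue
        kept += 1
        scores = {name: score(picks, outcome) for name, picks in entries.items()}
        best = max(scores.values())
        leaders = [name for name, s in scores.items() if s == best]
        for name in leaders:
            wins[name] += 1.0 / len(leaders)
    return {name: wins[name] / kept for name in entries} if kept else {}


def game_importance(entry, game, candidates, entries, strength, n_sims=10_000):
    """Swing in one entry's win probability across the possible winners of a game."""
    probs = [
        pool_win_probabilities(entries, strength, n_sims, condition=(game, team)).get(entry, 0.0)
        for team in candidates
    ]
    return max(probs) - min(probs)


# Made-up strengths and a three-entry pool, purely for illustration.
strength = {"A": 4.0, "B": 1.0, "C": 2.0, "D": 2.0}
entries = {
    "chalk": {"semi1": "A", "semi2": "C", "final": "A"},
    "mixed": {"semi1": "A", "semi2": "D", "final": "D"},
    "upset": {"semi1": "B", "semi2": "D", "final": "B"},
}
probs = pool_win_probabilities(entries, strength)
print(probs)
print(game_importance("upset", "semi1", ["A", "B"], entries, strength))
```

For these inputs the exact answers work out to 0.70, 0.20 and 0.10, so the estimates should land close to that, and semi1 swings the "upset" entry from no chance (if A wins) to roughly even odds (if B wins). Rejection sampling keeps the code short but wastes simulations on unlikely conditions; fixing the game's result and simulating only the remaining games would be more efficient.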

For data scientists just starting their careers, I highly encourage you to find a subject you enjoy and put it down in code. It can be as serious as climate change or cancer research, or as silly as fantasy sports or cooking shows; whatever brings you joy. It alleviates some of the frustrations of learning something new, while hopefully putting a smile on your face as you progress in your craft and your career.

(Can’t think of what subject to play with? I highly recommend scrolling through the TidyTuesday datasets for inspiration!)

This post is licensed under CC BY 4.0 by the author.