AI Predictive Model to Beat NFL Point Spreads - Part 1

I set out to build an AI predictive model to beat the spread and the over/under for the NFL. It will ultimately be used for NFL betting, and I will share the ups and downs of the journey. I will not share the actual model I develop - but I will give hints. While this is not about Public Sector AI - I believe the story offers good insight into a typical AI predictive model journey. This is Part 1 in a series (Part 2).

When building AI models, "journey" is a great description. Any AI journey is going to start, experiment, learn, iterate and improve - throughout the entire process. A good AI model is never fully complete. The first phase of this journey took me about 50 hours. Here is a little background about NFL betting...

Point Spread: The point spread is a sports betting term that represents the expected margin of victory for the favored team. Bettors can wager on whether the favorite will win by more than the spread or if the underdog will lose by fewer points or win outright.
Over/Under (Total): The over/under is a bet on the combined total points scored by both teams in a game. Bettors can wager on whether the actual total score will be over or under the predicted line set by the sportsbook.
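To make the two definitions concrete, here is a minimal Python sketch of how each bet settles once the final score is known. The function names and example scores are made up for illustration, and the "push" outcome (the result lands exactly on the line, so the stake is returned) is standard sportsbook practice rather than something described above.

```python
def settle_spread(favorite_score, underdog_score, spread):
    """Settle a spread bet. `spread` is the number of points the favorite is laying."""
    margin = favorite_score - underdog_score
    if margin > spread:
        return "favorite covers"   # favorite won by more than the spread
    if margin < spread:
        return "underdog covers"   # underdog lost by fewer points or won outright
    return "push"                  # margin landed exactly on the spread


def settle_total(score_a, score_b, line):
    """Settle an over/under bet against the sportsbook's total line."""
    total = score_a + score_b
    if total > line:
        return "over"
    if total < line:
        return "under"
    return "push"


# Hypothetical game: the favorite wins 27-24, laying 6.5 points, with a total of 47.5.
print(settle_spread(27, 24, 6.5))
print(settle_total(27, 24, 47.5))
```

Note that the favorite can win the game and still fail to cover, which is what makes the spread a different problem from picking winners.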

I picked NFL point spread betting as a challenge because point spreads...

Are notoriously difficult to beat as they are almost like a coin toss
Have tons of historic data that is easy to access to train a predictive model
Are easy to monitor success - because you either win or you don't
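The "coin toss" point matters because a coin toss is not quite good enough. As a rough sketch, assuming the common -110 pricing on spread bets (risk 110 to win 100; that price is my assumption, not a figure from this project), the break-even win rate works out like this:

```python
def break_even_win_rate(american_odds):
    """Win rate at which bets at the given negative American odds net to zero."""
    if american_odds >= 0:
        raise ValueError("this sketch only handles negative American odds")
    risk = abs(american_odds)      # amount risked to win 100
    return risk / (risk + 100)


def expected_profit_per_bet(win_rate, american_odds=-110, stake=110.0):
    """Average profit per bet for a given long-run win rate."""
    win_amount = stake * 100 / abs(american_odds)
    return win_rate * win_amount - (1 - win_rate) * stake


print(f"break-even at -110: {break_even_win_rate(-110):.4f}")
print(f"profit per 110 staked at a 50% win rate: {expected_profit_per_bet(0.50):.2f}")
```

Under that assumption, a model has to win a little more than 52% of its picks before it makes money, so a true coin toss loses slowly.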

Understand the Domain and Data

I started with research to better understand the NFL domain and the data available. I certainly wasn't going to be the first or last to try and do this - so what did others find? There are lots of great articles and previous experiments from others to build upon.

Through research, I determined that an Elo rating (a system for ranking teams and players) would not be predictive, so I was able to move on quickly.

Additionally through research, I found that Thursday night NFL games are particularly difficult to predict - so I decided to avoid those games.

I spent the majority of my time doing lots of basic analytics and some advanced analytics to better understand the domain and data. I cannot stress enough the importance of basic analytics to help guide understanding - I know AI is exciting - but let's not overdo it. Basic math is often the most insightful path.

One example of this better understanding was an analysis I completed of over/under averages per week from 2014 to 2023. The chart shows the average over/under betting line and the average actual total score for each week. Through visual observation you can see that (1) the over/under closely follows the trend of the actual total score, (2) total scores tend to decrease as the season goes on, and (3) when the playoffs come, scores skyrocket as better teams enter the playoffs and lesser teams stop playing.
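This kind of basic analysis needs very little code. Here is a minimal sketch using only the Python standard library; the row layout and every number in it are made up for illustration and are not the data behind the chart.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (season, week, over_under_line, actual_total_points).
games = [
    (2022, 1, 45.5, 41),
    (2022, 1, 48.0, 52),
    (2022, 2, 44.0, 38),
    (2023, 1, 46.5, 47),
    (2023, 2, 43.5, 45),
]


def weekly_averages(rows):
    """Average the betting line and the actual total per week, across all seasons."""
    by_week = defaultdict(lambda: {"line": [], "actual": []})
    for _season, week, line, actual in rows:
        by_week[week]["line"].append(line)
        by_week[week]["actual"].append(actual)
    return {
        week: (mean(values["line"]), mean(values["actual"]))
        for week, values in sorted(by_week.items())
    }


for week, (avg_line, avg_actual) in weekly_averages(games).items():
    print(f"week {week}: avg line {avg_line:.2f}, avg actual {avg_actual:.2f}")
```

Plotting the two averages per week side by side is what makes trends like the late-season drop easy to see.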

I decided to leave playoff games out of my predictive model. Interestingly, the point spread for the favorite follows a similar trend, with the spread getting smaller as the playoffs begin. See the point spread chart below.

Have Theories - Avoid Emotional Attachment

I had a theory that good plays cause wins and that teams that more recently executed good plays would be more likely to execute good plays in the future. I was convinced. I downloaded 10 years of every play NFL teams executed. Through causal AI and basic analytics - I found a strong correlation: the team with more "good" plays than its opponent was much more likely to win the game.

What I couldn't do was develop a predictive indicator from play data. I spent a lot of time down that rabbit hole. Eventually, I had to admit defeat on my favorite theory and let it go. I was a little dejected that after so much work - I didn't have any indicators that were good at predicting the point spread or the over/under.

I put the project down and decided to take a two-week break.

While my head cleared - a conversation with a friend provided some clarity. He had found an article saying that directly predicting whether a team covers the point spread or the over/under is very difficult... the article recommended predicting the final score instead. Additionally, a contact in the Data Science community reminded me to keep it simple, start small and build from there.

After the two-week break, I started over... based on past learnings and research - it only took another 10 hours to build a "moderately good" AI predictive model to beat the NFL point spread and over/under.

The resulting predictive model was actually two different models. Both started as Lasso regressions and were later improved by converting them to Polynomial regressions and tuning them. The metrics I used to measure the strength of my model on the test data came out "moderately good". Not good, not great - but good enough to get started.
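For readers new to these terms, here is a generic, dependency-free sketch of what a Lasso regression does, followed by one common reading of "polynomial regression" (a linear model fit on powers of the inputs). This is not my model: the features, the data, the penalty value `alpha`, and the helper names are all made up for illustration, and in practice a library would do this work.

```python
def soft_threshold(value, threshold):
    """Shrink a value toward zero, returning exactly 0.0 inside the threshold band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso(X, y, alpha, n_iter=100):
    """Coordinate descent for (1/2n) * sum((y - b - Xw)^2) + alpha * sum(|w|)."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Center features and target so the intercept can be recovered at the end.
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [target - y_mean for target in y]
    weights = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0
            z = 0.0
            for row, target in zip(Xc, yc):
                # Residual with feature j's own contribution left out.
                partial = target - sum(weights[k] * row[k] for k in range(p) if k != j)
                rho += row[j] * partial
                z += row[j] ** 2
            weights[j] = soft_threshold(rho / n, alpha) / (z / n) if z > 0 else 0.0
    intercept = y_mean - sum(w * m for w, m in zip(weights, x_means))
    return intercept, weights


def add_squares(X):
    """Degree-2 expansion without cross terms: append the square of each feature."""
    return [row + [x * x for x in row] for row in X]


# Made-up data: the target depends only on the first feature (y = 3 + 2 * x1).
X = [[1, 5], [2, 3], [3, 8], [4, 1], [5, 6], [6, 2]]
y = [5, 7, 9, 11, 13, 15]

intercept, weights = fit_lasso(X, y, alpha=0.1)
print("lasso:", intercept, weights)

poly_intercept, poly_weights = fit_lasso(add_squares(X), y, alpha=0.1)
print("lasso on squared features:", poly_intercept, poly_weights)
```

The useful property of Lasso is that the penalty pushes the weights of unhelpful features to exactly zero, so it doubles as a way to find out which indicators matter before trying a more flexible model.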

Tools I used were MySQL, R, Python, Google Cloud, STATA, and ChatGPT. Ultimately, I was able to reduce the tools to a combination of ChatGPT and R.

To account for the "moderate" nature of my model - I decided to only pick games where the AI model had a certain level of certainty. I came up with a basic math equation to determine "confidence", which will likely lead the AI model to make recommendations on only 25% to 50% of the games each week.
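I am not sharing my actual confidence equation, but the general idea can be sketched. The version below is a stand-in, not my formula: it assumes the model predicts each team's final score, measures how far the predicted margin and total sit from the sportsbook's lines, and only makes a pick when that gap clears a threshold. The `Prediction` fields, the `min_edge` value of 3 points, and the games are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    game: str
    pred_favorite: float   # model's predicted points for the favorite
    pred_underdog: float   # model's predicted points for the underdog
    spread: float          # points the favorite is laying
    total_line: float      # sportsbook over/under line


def spread_edge(p):
    """Predicted margin minus the spread; positive favors the favorite."""
    return (p.pred_favorite - p.pred_underdog) - p.spread


def total_edge(p):
    """Predicted total minus the over/under line; positive favors the over."""
    return (p.pred_favorite + p.pred_underdog) - p.total_line


def confident_picks(predictions, min_edge=3.0):
    """Keep only the bets where the model disagrees with the line by at least min_edge."""
    picks = []
    for p in predictions:
        s = spread_edge(p)
        if abs(s) >= min_edge:
            picks.append((p.game, "favorite" if s > 0 else "underdog", s))
        t = total_edge(p)
        if abs(t) >= min_edge:
            picks.append((p.game, "over" if t > 0 else "under", t))
    return picks


slate = [
    Prediction("Game 1", 27.0, 20.0, spread=3.5, total_line=44.5),
    Prediction("Game 2", 23.0, 21.0, spread=2.5, total_line=48.0),
]
for pick in confident_picks(slate):
    print(pick)
```

Raising `min_edge` trades volume for selectivity, which is how a filter like this ends up recommending only a fraction of the games each week.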

I'm running out of time. There are many new ideas and new indicators I want to test, and the first Sunday of the NFL season is just a couple of days away. I feel like I could have tweaked my AI model endlessly. I'm a big believer in failing fast and iterating - so it was time to put my Beta Version 1.0 in the real world. I plan to bet very low amounts while uncertainty is high - so risk is mitigated.

Dozens of other experiments will have to wait for future iterations. My model is simple and counter-intuitive to many of my own biases. The next step is to test it with real 2024 NFL season data and see how it does. I'll watch it for 30-40 games/picks and make adjustments after that. I'll report back after roughly 30 games.
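One caution on reading a sample that small: a pure coin flip can look good over 30 picks. A minimal sketch of that check, where the 18-of-30 record is a made-up example rather than a result:

```python
from math import comb


def prob_at_least(wins, picks, win_rate=0.5):
    """Probability of winning at least `wins` of `picks` bets at a fixed win rate."""
    return sum(
        comb(picks, k) * win_rate**k * (1 - win_rate) ** (picks - k)
        for k in range(wins, picks + 1)
    )


# Chance that coin-flip picking goes 18-12 or better over 30 picks.
print(f"{prob_at_least(18, 30):.3f}")
```

If that probability is not small, the record alone does not say much yet, which is one more reason to keep the bets low early on.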

Wish me luck. I have my doubts - but I am excited to see the results and adapt.

By Greg Godbout from Flamelit

CEO of Flamelit - a start-up Data Science and AI/ML consultancy. Formerly the Chief Technology Officer (CTO) and U.S. Digital Services Lead at the EPA. Greg was the first Executive Director and Co-Founder of 18F, a 2013 Presidential Innovation Fellow, a Day One Accelerator Fellow, a GSA Administrator's Award recipient, and a Federal 100 and FedScoop 50 award recipient. He received a degree in Economics with a concentration in Business from St. Mary’s College of Maryland and a Masters in Management of IT from the University of Virginia, and he is currently working on a Masters in Business Analytics and AI from NYU.
