
76.5% Accuracy - AI Predictive Model to Beat NFL Point Spreads - Part 2

I set out to build an AI predictive model to beat NFL point spreads and over/unders. The model is performing really well - likely the best NFL game picker (handicapper) right now. I will not share the actual model I develop, only hints, because if I give away the model, the betting advantage will disappear. While this is not about Public Sector AI, I believe the story offers good insight into an AI predictive modeling journey, and the advice would be useful in public sector AI initiatives. This is Part 2 in a series (Part 1).


Before the 2024 season, I built version 1.0 of a predictive model (see Part 1) and created a small group of friends and family who were interested in playing along and seeing if AI can beat the point spreads. It has been a lot of fun. After 5 weeks of applying the AI model to 2024 NFL games, we are doing extremely well.


For fun, my family and friends helped select a name for my AI product. Meet “Side Bet Sally.” Her photo was generated by ChatGPT.



Sally picks anywhere from 4 to 9 NFL games each week. She recommends which team will cover the spread, and only for games in which she is “confident.” Through week 5 she has successfully picked 26 out of 34 games, giving her an astounding 76.5% accuracy rate!


Sally Is Likely the Top NFL Handicapper After Week 5


After some early success, I decided to compare Sally against human NFL handicappers. We use the popular NFL betting site Capper Review to help track Sally’s success (see the ranking below). Currently, Sally is beating all the other NFL handicappers for 2024: her record of 26-8 works out, by Capper Review’s calculations, to $1,538 in profit and an ROI of 45.23%. My family and friends group’s modest betting has actually profited $2,239, an average of $88 per bet. Our plan is to gradually increase bets from here as we grow more confident.
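For the curious, here is roughly how figures like these are computed. A minimal sketch in Python, assuming a flat $100 stake per pick at the standard -110 odds; Capper Review’s exact formula may differ, which is why the output lands near, but not exactly on, their numbers:

```python
# Sketch of a spread-betting profit/ROI calculation.
# Assumption (not Capper Review's documented method): flat $100 stake
# per pick at -110 odds, so a winning bet profits $100 * (100/110).

def betting_summary(wins: int, losses: int, stake: float = 100.0, juice: int = 110) -> dict:
    win_payout = stake * 100 / juice          # profit on one winning -110 bet
    profit = wins * win_payout - losses * stake
    total_risked = (wins + losses) * stake
    return {
        "record": f"{wins}-{losses}",
        "profit": round(profit, 2),
        "roi_pct": round(100 * profit / total_risked, 2),
    }

print(betting_summary(26, 8))
# {'record': '26-8', 'profit': 1563.64, 'roi_pct': 45.99}
# Close to Capper Review's $1,538 / 45.23%; their juice assumption likely differs.
```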



Side Bet Sally Is Evolving into an AI Product


An AI model is a mathematical or computational representation that has been trained on data to recognize patterns, make decisions, or predict outcomes. Examples include machine learning models, neural networks, and deep learning architectures. An AI product is a fully developed application or service that incorporates one or more AI models to deliver specific functionality or solve a problem.


Sally started out in beta as two AI predictive models and has developed into a 1.0 version that is in the early stages of becoming a full AI product. The graphic below outlines the process and components of a typical AI product. As you can see, the AI product is much more than just a model: it encompasses 9 raw data sources, 2 learning data sets, 2 AI predictive models, 4 decision models, and governance and risk mitigation throughout. The process is constantly evolving, with regular feedback loops. AI is not a simple model that acts blindly; it is an interactive flow of learning, testing, and action, iterated and repeated. Governance and risk management are, and should be, an expected and natural part of AI because they improve the AI product.
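To make that flow concrete, here is a hypothetical skeleton of such a pipeline in Python. Every name in it is an illustrative assumption rather than Sally’s actual design; the point is the shape of the flow, not the modeling:

```python
# Hypothetical skeleton of an AI product pipeline like the one in the
# graphic: raw data sources feed features, predictive models score each
# game, decision models turn scores into picks, and governance checks
# gate every pick. None of these are Sally's actual components.

def run_weekly_pipeline(raw_sources, predictive_models, decision_models,
                        governance_checks, games):
    picks = []
    for game in games:
        # Pull features from each raw feed (scores, spreads, injuries, ...)
        features = {name: feed(game) for name, feed in raw_sources.items()}

        # Each predictive model produces its own score for the game
        scores = [model(features) for model in predictive_models]

        # Decision models refine the pick in sequence
        # (e.g. confidence threshold, side selection, bet sizing)
        pick = None
        for decide in decision_models:
            pick = decide(scores, pick)

        # Governance: every check must pass, or the pick is dropped
        if pick and all(check(game, pick) for check in governance_checks):
            picks.append(pick)
    return picks
```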



The Power of Lift - Put it in Production


In the context of AI models, particularly in machine learning, "lift" refers to a measure of how much better a predictive model is at identifying true positive outcomes compared to a random guess or a baseline expectation. Lift quantifies the improvement a model provides over random predictions. NFL spread betting is generally considered a 50/50 proposition, a coin toss. So with Sally picking at 76.5%, her lift is 1.53 (76.5% / 50%): she performs 1.53 times better than the random baseline. When you have a model that shows lift during testing, put it in production while applying governance and mitigating risks. Don’t let perfection get in the way of improving the current state. Any lift provided by AI is an improvement.
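The arithmetic is simple enough to verify yourself:

```python
# Lift = model hit rate / baseline hit rate.
# NFL spread betting is roughly a coin flip, so the baseline is 0.50.

def lift(wins: int, losses: int, baseline: float = 0.5) -> float:
    accuracy = wins / (wins + losses)
    return accuracy / baseline

print(round(lift(26, 8), 2))  # 26/34 = 76.5% accuracy -> lift of 1.53
```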


At the core of AI’s benefit to society, businesses, and government is the lift it gives to our everyday lives, our work processes, and the value of our existing services and products. These AI improvements come with additional flaws and risks that should be taken into consideration with every AI implementation, or you risk losing the benefit of the “lift.”


Side Bet Sally, Like All AI, Is Flawed


“All models are wrong, but some are useful,” as the statistician George Box observed. As Box points out, we should focus on applying useful AI with the understanding that it is flawed, rather than debating endlessly whether an answer is correct in all cases. The flaws are something we can govern and mitigate.


As I apply Sally to real-world NFL data, it becomes clearer that her strength is the patterns she discovered through machine learning, and her weakness is not understanding the “common sense” of specific contexts. For example, Sally does not know when a team has “quit on themselves,” as teams sometimes do after a coach is fired or once they have already clinched a playoff spot before the end of the regular season.

Baked into Sally’s patterns at the aggregate level is the assumption that teams are always trying. But occasionally teams stop trying. If we can identify that moment, we can prevent Sally from making that error. Currently I act as the Human in the Loop (me), skipping bets when a team has clearly quit, as sketched below.
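In code terms, that veto is just a filter applied in front of Sally’s picks. A minimal sketch, where the pick fields and team names are illustrative and the “has quit” judgment comes from a human, not the model:

```python
# Human-in-the-loop veto: Sally proposes, a human disposes.
# `human_flags` is a set of teams a person has judged to have quit
# (fired coach, playoff seed locked up, etc.) -- a human judgment call,
# not something the current model computes.

def apply_human_veto(picks: list[dict], human_flags: set[str]) -> list[dict]:
    return [
        pick for pick in picks
        if pick["home"] not in human_flags and pick["away"] not in human_flags
    ]

picks = [{"home": "Browns", "away": "Eagles", "side": "Eagles to cover"}]
print(apply_human_veto(picks, human_flags={"Browns"}))  # [] -- bet skipped
```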


ChatGPT has proven helpful in that analysis. This is a useful strategy: using AI to help govern AI while keeping a Human in the Loop. The upcoming NFL week 6 in particular has a few teams of concern that are showing signs of quitting. The summary below is the advice ChatGPT provided; it was also able to supply additional references and likely causes of teams quitting on themselves.


ChatGPT Advice: “If you're looking at which teams are still fighting despite their records, the Patriots might be the best bet for showing some signs of life as they rebuild. The Browns appear to be struggling the most mentally, and the Panthers' injury issues might limit their ability to improve significantly in the near term.”


In future versions of Sally, I’d like to create an automated AI governance model that identifies when a team has quit. 
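One plausible shape for that governance model is an LLM-based screen run before each week’s picks. A minimal sketch using the OpenAI Python client, where the model name, prompt, and yes/no framing are all my assumptions; a production version would need structured evidence and human review rather than a bare verdict:

```python
# Hypothetical automated "has this team quit?" governance check.
# Assumptions: OPENAI_API_KEY is set in the environment; "gpt-4o" stands
# in for any capable chat model; the notes are gathered separately.

from openai import OpenAI

client = OpenAI()

def team_has_quit(team: str, notes: str) -> bool:
    """Ask an LLM to judge whether a team shows signs of quitting."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "You assess NFL team morale. Answer only YES or NO."},
            {"role": "user",
             "content": f"Based on these notes, has the {team} quit on their season?\n{notes}"},
        ],
    )
    return response.choices[0].message.content.strip().upper().startswith("YES")
```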


Remember the advice of Dr. Box: “all models are wrong, but some are useful.” This means it is best to always apply governance and mitigate risks for every AI product. Any discussion of AI should include a discussion of model weaknesses and how to govern and mitigate them. Sally is no different.


For my next blog (in another 15-20 games), I’ll cover the importance of a Human in the Loop, Governance, and the Decision Model. Applying these concepts has likely improved Sally’s performance. I’ll also let you know whether Sally is continuing to pick well or has started performing worse.


Wish me luck; my doubts have turned to excitement. Winning money has a tendency of doing that.

 
Greg is the CEO of Flamelit, a start-up Data Science and AI/ML consultancy, and formerly the Chief Technology Officer (CTO) and U.S. Digital Services Lead at the EPA. He was the first Executive Director and Co-Founder of 18F, a 2013 Presidential Innovation Fellow, a Day One Accelerator Fellow, a GSA Administrator's Award recipient, and a Federal 100 and FedScoop 50 award recipient. He received a degree in Economics with a concentration in Business from St. Mary’s College of Maryland and a Master’s in Management of IT from the University of Virginia, and is currently working on a Master’s in Business Analytics and AI from NYU.


