I recently published a post about in-play tennis prices on Betfair Exchange and how, given the mechanical structure of tennis scoring, one could identify ahead of time the likely levels player prices might reach in the event of a pivotal moment (such as breakpoints and sets won). Entering a trade at these stretched prices offers low downside risk entry points to potentially capture swings in prices. This is very valuable data when trading tennis in-play.
But not everyone wants to sit there trading a tennis match. It requires time and focus to capture and lock in profit opportunities. So, I’ve been looking at ways to implement a different approach for people who just want to make systematic bets on tennis matches and make some extra money in the process. So, is it possible to develop a statistical-based tennis betting model that would allow for making simple picks rather than trading in-play? Let’s find out 🙂
What I’m about to outline actually has the most unlikely source of inspiration 😉
Some years ago, when I started to get really deep into in-play tennis trading on Betfair Exchange, I would often scour the internet for more knowledge and insights as to how one might model a tennis match from a math and probability perspective, and how this might relate to in-play prices and odds.
There are countless academic papers and studies on this topic, some of which I list at the end of this article. I warn you though, these are not light reading 🙂
In amongst the academic gems, of course, were the inevitable and countless garbage get-rich-quick ‘systems’ offered for sale. These are a dime-a-dozen not just in tennis betting, but in any form of sports betting, forex or crypto trading and the like. You can spot them a mile off with their outlandish claims, sales funnel landing pages and ‘one time only discounts’.
“I usually charge £5k for this course, but if you order in the next 5 mins, I will give this to you for just £20″…. You know the drill 🙂
One such tennis betting system was marketed so brazenly it actually caught my attention because it appeared to be highly transparent in its methodology, so much so that anyone could quite easily back-test the claims for accuracy.
Like the nerd I am, it became a game for me to see how quickly I could unravel the spin and spot the critical or fatal error in the sales pitch.
I won’t name the offending strategy (amazingly it’s still out there) and, I imagine, some poor suckers continue to part with their cash for access to these top-secret strategies. Amusingly, if you just search for the offending strategy and add the term ‘pdf’ into Google you can find it all there for free… it must be good then 🙂
The basic premise of this get rich quick scheme was something like this….
Favourites win tennis matches most of the time. This is actually true by the way. So, if something is likely to win 65-70% of the time, you’d back it time and time again. Reasonable logic you might think, but the problem is the risk:reward is awful. That being said, the author provided quite detailed history over several years as to how he regularly pulled in a claimed £5-£6k a month from this relatively simple strategy.
So, I went and got hold of some comparable data and replicated the strategy for myself in Excel, rule-for-rule. What I found was almost the total opposite of what was claimed – in fact, it was almost the perfect way to lose money consistently, season after season.
Having proven the marketed strategy as total garbage, I sat on the data I had gathered for a while. I kept asking myself whether there was actually something I could do with this data. Could I reverse engineer something out of nothing?
The advertised system involved backing tennis players via a traditional bookmaker. Nowadays, of course, one can also bet against or ‘lay’ players via the betting exchanges.
You see, betting on favourites in tennis means betting at odds below evens (2.0 or lower in decimal terms) and that means that your potential reward is always going to be less than the risk you assume.
For example, betting £10 on a short-priced favourite like Serena Williams at a price of 1.10 means that you are risking £10 to make £1. Sure, it’s likely she’ll win at that price, but risking £10 to make just £1 is hardly the key to long-term success. You might win 9 of those bets in a row (making £9 profit in the process), but if the 10th bet fails, you’d lose £10, wiping out everything you’ve built up and more. This was essentially the problem with the advertised strategy – the losing bets devastated your bankroll.
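To put some numbers on that, here is a minimal Python sketch of the arithmetic. The function names are just my own labels for illustration, and it ignores any bookmaker margin.

```python
def back_profit(stake, decimal_odds):
    """Profit on a winning back bet at the given decimal odds."""
    return stake * (decimal_odds - 1)


def break_even_strike_rate(decimal_odds):
    """Share of back bets that must win just to break even at these odds."""
    return 1 / decimal_odds


stake = 10.0
odds = 1.10

# Nine winners followed by one loser, all at a flat £10 stake
net_after_ten = 9 * back_profit(stake, odds) - stake

print(f"Profit per winning bet: £{back_profit(stake, odds):.2f}")
print(f"Break-even strike rate: {break_even_strike_rate(odds):.1%}")
print(f"Net after 9 winners and 1 loser: £{net_after_ten:.2f}")
```

The point of the break-even figure is that a favourite priced at 1.10 has to win roughly nine times in ten before backing it even washes its face.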
So then I thought if backing these favourites was the perfect way to lose money, how would things look if you bet against these favourites instead? In a variation of this strategy, I would bet against pre-match favourites using the exchanges rather than a traditional bookmaker.
This would flip the risk:reward profile on its head. Yes, favourites don’t lose too often, but when they do, you could profit handsomely. A bit like each way betting on horses, you’d only need to win some of the time to more than compensate for losing bets.
Of course, a straight flip of the advertised strategy failed too – it was too simplistic and so, to find an edge, more data and statistics were called for.
A Statistical Approach
So, I began with testing my ideas on the most recent full season’s worth of data. A few hours later and hmmm… things looked interesting! But….. perhaps that year was unique, an outlier in a pool of bigger data.
So I pulled another season’s worth, and another until I had not one or two seasons, but ten full seasons worth of tennis data stretching back to 2009 covering the main tours of both the men’s (ATP) and women’s (WTA) game.
This data-set is comprehensive and includes everything from the tournament name; tournament grade; surface; indoor vs. outdoor; player name; ranking; full score and set-by-set results… and most importantly, the player betting odds across several soft and sharp bookmakers, including Bet365 and Pinnacle.
From there, I began running my pricing theories across multiple seasons worth of data, testing varying scenarios and assumptions in trying to find an angle for betting.
So What Did I Find?
My work here continues to be an active side-project but the results I can share with you thus far are encouraging. Broadly speaking, I’ve observed persistence in specific subsets of data which is remarkably consistent season-to-season and therefore, a good predictor of future potential results.
“If my calculations are correct when this baby hits eighty-eight miles per hour… you’re gonna see some serious shit.” – Doc
Through lots of statistical back-testing, I’ve developed a set of qualifying criteria that I look for in screening tennis matches where the pre-match favourite might lose. These would represent winning bets. The starting point for these criteria is each player’s starting price, as determined by the main bookmaker, and most importantly, the relationship between these prices that I have observed.
Now, before any naysayers out there start to talk about sample size etc, this sample size is …… huge! I’ve run this over more than 46,000 historic tennis matches on the WTA and ATP Tour, covering full season data from 2009 to 2018 inclusive, plus 2019 year-to-date. I’d argue that’s a meaningful enough sample size to work with.
Let me hit you with some numbers 🙂
The Opportunity Set is Vast
The professional tennis calendar is well established and, season-on-season, the main tours of the ATP and WTA combined have an average of 4,600 3-set matches every year. I’m only focused on this highest tier of tennis and ignore the lower grade tours such as the Challenger and ITF circuits. The liquidity for these events is sub-optimal.
I also only focus on 3-set matches in my analysis as the vast majority of the season is played in the 3-set format with the exception of the four Men’s Grand Slam events (Australian Open, French Open, Wimbledon and the US Open). I’m not ruling out looking at 5-set matches, but they obviously introduce more variables and thus should be treated as a different dataset than 3-set matches. To make solid conclusions from the data set, I’m just focused on 3-set matches for now, given 95% of the season is played in that format.
Of these main tour 3-set matches across the ATP and WTA, around 1,970 per year, on average, meet my ‘qualifying criteria’ for a selection. So a pretty healthy 43% (on average) of all matches in a given season provide the opportunity to bet. This is also spread relatively equally across both the men’s and women’s tours, with slightly more on the women’s tour because their Grand Slam matches are 3-set affairs and are not excluded as they are for the men.
The tennis season pretty much runs for 11 months of the year from January to November, with a small number of matches spilling into early December, or late December (pre-season matches just prior to the main season starting in January each year). But really, the vast majority of activity takes place from January to October – that’s 10 solid months of the year to take advantage of tennis betting.
Regardless of the variation towards the end of the season, if we take it all in, that works out at around an average of 195 qualifying selections per month; 45 per week; or 6-7 per day. This is certainly a very manageable number to implement into a consistent trading plan and would require very little time to implement.
In reality, the number of selections per day will be higher at the start of the given week as more matches are being played as tournaments nearly always begin on a Monday. As the field gets whittled down through the advancement of rounds, the number of qualifying bets will then tail off towards the business end of the week. I think this is also quite a nice feature of tennis betting in terms of accommodating betting around the working week/family time etc. Contrast this to another betting side hustle of mine (each way betting), where there may be anywhere from 15-50 bets available per day, requiring time and focus to capture them all.
This chart shows the number of qualifying bets that were identified, based on the day of the week each season. Note how the number of selections peaks early in the week (Tuesdays), before trailing off towards the end of the week. This is a very common distribution.
So, now we have an expectation for the likely number of qualifying bets we might expect on a typical day, week, month etc, but what about the overall strike rate of success? How many of the qualifying bets were shown to have been winning bets over time?
Like with most of the data observed, things are remarkably consistent. Over the full ten-year period between 2009 and 2018, the strike rate of successful selections averaged around 30% with very little variation season-to-season.
Year-to-date in 2019, it’s running at 29% currently – bang on trend with the long-term history.
Now you might think that a 30% strike rate is far too low. However, remember that we are betting against favourites who are typically priced significantly below evens (2.0). This means that our risk:reward is tilted heavily in our favour, offering a greater upside potential return to downside risk.
For example, if I bet against (lay) a player priced at 1.25 on Betfair, in points terms, I’m risking 0.25 points to potentially make 1 full point. So, if we made that exact bet with a £10 maximum liability (the maximum amount we could lose), we’d be risking £10 to make £40 should the bet be a winning bet. We’d have to pay 5% commission to Betfair on a winning bet (in this case £2), but you can see, in this example, the risk:reward is roughly 1:4 in our favour.
So, a winning bet can potentially offset a handful of losing bets. In other words, we don’t require a particularly high strike rate to be profitable.
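The lay arithmetic above can be sketched in Python as follows. This is only an illustration: the function names are my own, and the 5% commission is applied to winnings only, as described above.

```python
def lay_bet(price, liability, commission=0.05):
    """Return (gross win, net win, loss) for a lay bet sized by liability.

    gross win is the backer's stake we collect if the selection loses;
    net win is that amount after commission; loss is what we pay out
    if the selection wins.
    """
    if price <= 1.0:
        raise ValueError("decimal price must be above 1.0")
    gross_win = liability / (price - 1)
    net_win = gross_win * (1 - commission)  # commission is charged on winnings only
    return gross_win, net_win, -liability


def lay_break_even_strike_rate(price, commission=0.05):
    """Share of lay bets that must win for the strategy to break even."""
    reward_per_unit_risk = (1 - commission) / (price - 1)
    return 1 / (1 + reward_per_unit_risk)


gross, net, loss = lay_bet(1.25, 10.0)
print(f"Gross win £{gross:.2f}, net win £{net:.2f}, loss £{loss:.2f}")
print(f"Break-even strike rate at 1.25: {lay_break_even_strike_rate(1.25):.1%}")
```

The break-even figure is the useful one here: the shorter the favourite's price, the lower the strike rate a layer needs in order to come out ahead.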
For those of you that also do Each Way Betting on the horses, you’ll know that our strike rate of winning horses runs at about 10% over the long-term, and that of horses that place runs at around 25-30%. In much the same way, with a strike rate of around 30% we can expect to lose around 70% of our bets here but still produce incredible gains.
Net Points: Laying To Win 1 Point
Before introducing monetary variables into a model, it’s important to first give a sense of return from a pure points perspective. This is because how any one individual chooses to implement a strategy from a risk management and staking perspective will vary from one person to another. So let’s first look at the purest form of return – points risked versus points won.
Here we are betting on our qualifying selections (i.e. laying pre-match favourites) in order to win 1 full point. Let me explain:
I’ll repeat the same principle from the earlier example. If we are laying a favourite at the desired price of 1.25 on Betfair Exchange, in doing so, we are assuming 0.25 points of risk in order to win 1 full point. If the bet goes our way, our net points won will be +1.0. If we lose the bet, our net points will be -0.25. Pretty simple right 😉
You then simply tally up net points won or lost relative to the total points risked and you arrive at the purest form of return on investment (ROI) calculation. With a small sample size, the ROI is somewhat meaningless, but as the sample size grows (to 10 seasons’ worth of bets, for example) it becomes a more dependable measure.
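As a rough sketch of that tally in Python (the function names are mine, and the sample results are made up purely for illustration; they are not taken from my back-test):

```python
def lay_to_win_one_point(price, favourite_won):
    """Return (points risked, net points) for one lay sized to win 1 point."""
    risked = price - 1
    net = -risked if favourite_won else 1.0
    return risked, net


def points_roi(bets):
    """bets: iterable of (lay price, favourite_won) pairs.

    Returns (net points, points risked, ROI = net points / points risked).
    """
    total_risked = 0.0
    total_net = 0.0
    for price, favourite_won in bets:
        risked, net = lay_to_win_one_point(price, favourite_won)
        total_risked += risked
        total_net += net
    return total_net, total_risked, total_net / total_risked


# Hypothetical sample: four lays at 1.25, one of which wins
sample = [(1.25, True), (1.25, True), (1.25, False), (1.25, True)]
net_points, points_risked, roi = points_roi(sample)
print(f"Net points {net_points:+.2f} from {points_risked:.2f} risked, ROI {roi:+.1%}")
```

Note that commission is deliberately left out at this stage, since the points view is meant to be independent of any exchange or staking plan.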
So, let’s see how many net points were won or lost over 10 full seasons worth of bets that met my qualifying criteria. This bar chart provides the total net points per season and the ROI based on total points won divided by total points risked.
So, we can see that the strategy has yielded healthy net points for ten consecutive seasons with an average of 118 points won from an average of 710 points risked, yielding an average ROI of +16.7%. The next table adds a bit more detail on the number of qualifying bets per season, winners versus losers, and a repeat of the net points and ROI figures above.
Taking this a step further, the following chart shows the cumulative net points achieved per season. This helps us look beyond the final, end of year numbers and consider the variance of results throughout a season. As you’d expect, there is some variation each year (the lowest net points achieved was still a very good 85.6 in 2010, while the highest net points achieved was 136.1 in 2015), but it is undeniably a consistent pattern year-on-year, and most importantly a profitable one at that.
Finally, it is worthwhile to examine results month-by-month. The following table groups the net points achieved every month over the ten season period. You’ll observe that losing months, from a net points perspective, are generally few and far between.
Looking just at net points, of course, will not account for an individual’s staking plan or commissions charged by the exchange for winning bets. We’ll look at the potential impacts of both below.
Laying to a Fixed Liability
Okay, so far so good. We can prove, conceptually at least, that from a pure points gained relative to points risked standpoint, the strategy appears to be consistently profitable and that those profits are observable over more than 10 seasons and a universe of 46,000 completed tennis matches. Let’s then run the same analysis with some money factored in.
In this example, I’m showing the results for adopting a flat staking plan season after season. Let’s assume an individual is willing to risk a flat £10 for every selection and chooses not to adjust this up or down as his or her bank changes based on overall profitability. We’ll also now introduce the 5% commission charge on winning bets. This is the current going rate charged by Betfair Exchange. Other exchanges are cheaper and might, therefore, result in less chargeable commission which would increase profits.
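For anyone who wants to run this kind of flat-liability calculation on their own records, a minimal Python sketch might look like this. The function name and the sample bets are hypothetical; only the £10 liability and 5% commission come from the description above.

```python
def flat_liability_results(bets, liability=10.0, commission=0.05):
    """bets: iterable of (lay price, favourite_won) pairs.

    Every bet risks the same fixed liability. Returns
    (net profit, total liability risked, ROI on liability risked).
    """
    profit = 0.0
    risked = 0.0
    for price, favourite_won in bets:
        risked += liability
        if favourite_won:
            profit -= liability  # lay loses: we pay out the full liability
        else:
            # lay wins: collect the backer's stake, less commission
            profit += liability / (price - 1) * (1 - commission)
    return profit, risked, profit / risked


# Hypothetical sample of four bets, one winner
sample_bets = [(1.25, True), (1.50, True), (1.25, False), (1.50, True)]
profit, risked, roi = flat_liability_results(sample_bets)
print(f"Net profit £{profit:.2f} on £{risked:.2f} risked, ROI {roi:+.1%}")
```

Passing a lower `commission` value is a quick way to see what a cheaper exchange would do to the bottom line.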
So laying to a relatively conservative £10 fixed risk has yielded consistent profits each season, averaging around £2,357 a year with a 12.9% ROI. That’s around £25,000 in net profits over the entire period, just placing fixed £10 bets. Not bad at all. Again, observe the remarkable consistency across the number of qualifying bets, winners versus losers; strike rate; liability risked; profits earned and ROI.
As before, let’s consider this from a cumulative perspective, season over season. In similar fashion to the cumulative net points charts, things here are also quite correlated season-by-season with profits ranging from £1,700 to £3,000 per season from fixed £10 stakes.
And finally, the month-by-month picture gives a sense of what a typical month might return using a flat staking plan such as £10 fixed. Ignoring the quieter months of November and December, the average monthly net profit is roughly £250.
Dynamic, Rolling Bank With Fixed % Risk
Lastly, let’s look at how things would be if we adopted a rolling bank strategy. This is probably the most likely scenario people would choose when betting (myself included). In this scenario, we start each season with a fresh bank of £1,000 and risk 1% per bet. The 1% risk is fixed as a percentage of the bank value, and the amount that is actually risked in nominal terms is adjusted at the start of each day as the bank grows or declines based on the result of the prior day’s selections.
So, as the bank increases over time, so too does the nominal risk per bet, but always staying at 1% overall risk. This does introduce more volatility but also offers potentially more meaningful returns, especially if you rolled your profits from one season to the next.
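The mechanics of the rolling bank can be sketched in Python along these lines. Treat it as an illustration under my stated rules (1% of the bank, re-set at the start of each day, 5% commission on winners); the function name and the sample days are hypothetical.

```python
def rolling_bank(days, start_bank=1000.0, risk_pct=0.01, commission=0.05):
    """days: list of days, each a list of (lay price, favourite_won) bets.

    The liability per bet is fixed at the start of each day as risk_pct
    of the bank, and the bank is updated once that day's bets settle.
    Returns (final bank, lowest end-of-day bank, drawdown from start bank).
    """
    bank = start_bank
    low = start_bank
    for bets in days:
        liability = bank * risk_pct
        day_pnl = 0.0
        for price, favourite_won in bets:
            if favourite_won:
                day_pnl -= liability
            else:
                day_pnl += liability / (price - 1) * (1 - commission)
        bank += day_pnl
        low = min(low, bank)
    drawdown = (start_bank - low) / start_bank
    return bank, low, drawdown


# Hypothetical two-day sample: two losers on day one, one winner on day two
sample_days = [[(1.25, True), (1.25, True)], [(1.25, False)]]
final_bank, low_bank, drawdown = rolling_bank(sample_days)
print(f"Final bank £{final_bank:.2f}, low £{low_bank:.2f}, drawdown {drawdown:.1%}")
```

Because the liability is a percentage of the bank, a run of losers shrinks the stakes and a run of winners grows them, which is where the extra volatility comes from.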
As with the other measures, the strategy has yielded profits each season, but with noticeably more variation in returns. At its worst, a £1,000 starting bank would have grown to £4,222 (+£3,222 in net profits) in a year such as 2010. Meanwhile, in a standout year such as 2017, a £1,000 starting bank would have grown to £15,112 (+£14,112 in net profits). The ROI ranges from +7% to +18%, averaging +11.5% per season.
As before, the table below adds more granularity to these returns. I also show the percentage growth in the £1,000 starting bank each season together with how low the bank value would have dropped from that £1,000 starting point, and a measure of that in terms of percentage drawdown. You can see that, at its worst, a £1,000 starting bank in 2016 would have dropped as low as £711 – the equivalent of a 29% drawdown from the starting balance. Despite this, that year still produced net profits of £3,222 with an ROI of 7.3%, net of all commissions.
This greater variation in returns is more evident when we look at the cumulative profits chart for this staking strategy. All seasons were highly profitable but clearly some were better than others in a relative sense.
Lastly, we can view that net profit on a month-by-month basis. The worst individual month was a -£1,283 back in September 2009, whilst the best month on record was +£5,461 in July 2014. The overall totals by year and month can be seen at the foot and to the far right. It looks like profits consistently pick up the pace in the second half of the year which is an interesting trend, of course, partly explained by the compounding of the bank during the early part of each season.
So, across a number of different measures, the strategy appears to be very profitable and consistent. But there are some considerations in trying to implement this strategy real-time.
In thinking about how this strategy can be applied in a real-world, forward-test situation, one major factor became very apparent to me early on. So much so that, once I identified this hurdle, I actually binned the whole idea, deeming it inapplicable in a real-world scenario.
However, I soon began to test some workaround assumptions. You see, my qualifying selections are anchored to the player starting prices as recorded by Bet365 shortly before a match starts. These prices are, of course, the prices available to back either player to win the match. While I use these (and their implied probabilities) to screen for qualifying bets, in acting upon them I’m looking to lay the favourites on the betting exchanges.
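For readers unfamiliar with implied probabilities, here is a small Python sketch of how a pair of back prices converts to probabilities. To be clear, this is not my screening method: the function name is made up, the prices are hypothetical, and dividing by the overround is just one common way of stripping out the bookmaker's margin.

```python
def implied_probabilities(price_a, price_b):
    """Convert two decimal back prices into implied win probabilities.

    Returns (prob_a, prob_b, overround), where the probabilities are
    normalised to sum to 1 and overround is the raw total before
    normalising (anything above 1.0 is the bookmaker's margin).
    """
    raw_a = 1 / price_a
    raw_b = 1 / price_b
    overround = raw_a + raw_b
    return raw_a / overround, raw_b / overround, overround


# Hypothetical match: favourite at 1.25, underdog at 4.00
prob_fav, prob_dog, overround = implied_probabilities(1.25, 4.00)
print(f"Favourite {prob_fav:.1%}, underdog {prob_dog:.1%}, overround {overround:.3f}")
```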
My first iteration of the model simply took the quoted Bet365 starting prices and applied the lay stakes and liabilities to work out the returns. However, in reality, the back prices at a bookmaker are always lower (at least at a given snapshot in time) than the price available on the exchange to lay that same player.
My historical data and live spot checks suggest that, on average, the lay prices on the exchange are typically anywhere from 4-7 ticks higher than the starting back prices as reported by Bet365 at the start of the match. This might not sound significant, but every tick matters in this scenario.
Below is one such example. Simona Halep is the clear favourite in this upcoming quarterfinals match at Roland Garros and is available to back at a price of 1.16 on Bet365. However, in jumping straight over to Betfair Exchange, we can see that her price to lay was slightly higher at 1.20 – in this case a 4 tick difference.
If we laid Halep at 1.16 for a £10 liability, we’d stand to win £62.50 on the exchange if she lost the match (£59.38 after commission). If we took the higher lay price on Betfair of 1.20, we’d win £50 on the exchange if Halep lost (£47.50 after commission).
So, in this example, just 4 ticks difference would lower our net profit on that bet by a meaningful 20%. If you can imagine extrapolating that dynamic out across thousands of bets, you can quickly see how that would serve to severely limit overall profitability.
*** Update – Simona Halep lost this match – she was beaten easily 🙂 ***
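The Halep numbers can be reproduced with a few lines of Python. The function name is my own, and the tick count relies on the exchange's price ladder moving in 0.01 steps for prices below 2.0.

```python
def net_lay_win(price, liability=10.0, commission=0.05):
    """Net profit on a winning lay bet sized to a fixed liability."""
    return liability / (price - 1) * (1 - commission)


bookmaker_price = 1.16  # back price used for screening
exchange_price = 1.20   # lay price actually available on the exchange

win_at_bookmaker_price = net_lay_win(bookmaker_price)
win_at_exchange_price = net_lay_win(exchange_price)
reduction = 1 - win_at_exchange_price / win_at_bookmaker_price

# Tick size is assumed to be 0.01, which only holds for prices below 2.0
ticks = round((exchange_price - bookmaker_price) / 0.01)

print(f"Net win at {bookmaker_price:.2f}: £{win_at_bookmaker_price:.2f}")
print(f"Net win at {exchange_price:.2f}: £{win_at_exchange_price:.2f}")
print(f"{ticks} ticks cost {reduction:.0%} of the net profit")
```

The shorter the favourite, the more each tick hurts, because the price minus 1 sits in the denominator of the payout.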
Keep In-Play Orders
So what can be done to get as close to, or even better than, Bet365 starting prices in a lay scenario on the exchange?
Thankfully, the mechanical nature of tennis pricing that I explored in my prior post is such that a player’s price will comfortably shorten 5-10 ticks upon winning their service game. So, orders to lay can be placed in the exchange at the equivalent Bet365 starting price (or even a tick or two lower) and these will almost always get hit shortly after the match begins.
Given favourites are expected to win their matches most of the time, it is therefore very likely that they will at least hold their service game against an inferior opponent at some point early in the match. That’s all we need in order for our price to get filled.
The only scenario in which this is unlikely to happen would be a favourite getting blitzed off the court (perhaps losing 2 or more service games from the start). This is quite unlikely. I’d also add that the criteria I use do not include narrow favourites, only moderate to strongly priced favourites, which only adds to the likelihood that they will hold their service game at some point early in the match.
To test this, I placed 50 qualifying bets over a period of 2 weeks a tick or two lower than the advertised starting prices on Bet365 and every single one was matched either pre-match or shortly after play started. More testing is clearly needed to ensure this is an observable trend over many hundreds of bets, but I’m quite confident it will be the case.
The downside of not getting filled is, of course, not the end of the world. Yes, we may have missed out on a potential winning bet, but no entry also means no money lost. I’d rather miss the trade than chase an entry point a few ticks higher than the starting price, given the headwinds this would create in terms of reduced profits and higher liability.
Another challenge all betting activity faces is variance and the possibility of a sustained downturn in results. Looking across the full data set of qualifying bets over the 10 seasons, you can see the typical ratio of consecutive winning bets versus consecutive losing bets below. As the chart shows, there have been periods where 20 or more consecutive bets lost. That is a mental challenge to overcome, but equally knowing this can happen ahead of time (and that profits were still achieved if you stayed the course) should help manage those emotions.
Just imagine for a moment, laying to a £30 or £40 liability and losing 25 consecutive bets – that would translate into a loss of between £750 and £1,000, most likely in just a matter of days (given our average number of selections is around 6-7 per day). Not everyone would be able to handle that, even if these stakes still represented just 1% of your bank.
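To get a feel for how likely long losing runs are, here is a rough Python sketch. It assumes every bet is independent with the same strike rate, which is a simplification of real results, and the function name is just my own label.

```python
def prob_losing_streak(n_bets, streak, strike_rate):
    """Probability of at least one run of `streak` straight losers in n_bets.

    Assumes independent bets that each win with probability strike_rate.
    """
    lose = 1 - strike_rate
    # state[i] = probability that the current losing run has length i and
    # no run of the target length has happened so far
    state = [1.0] + [0.0] * (streak - 1)
    for _ in range(n_bets):
        new_state = [0.0] * streak
        new_state[0] = sum(state) * strike_rate  # a winner resets the run
        for i in range(streak - 1):
            new_state[i + 1] = state[i] * lose   # a loser extends the run
        state = new_state  # runs that reached the target length drop out
    return 1 - sum(state)


strike_rate = 0.30       # long-run strike rate quoted above
bets_per_season = 1970   # average qualifying bets per season quoted above

for run_length in (10, 15, 20, 25):
    p = prob_losing_streak(bets_per_season, run_length, strike_rate)
    print(f"Chance of {run_length}+ straight losers in a season: {p:.1%}")

# Chance that the next 25 bets in a row all lose
print(f"Next 25 bets all losing: {(1 - strike_rate) ** 25:.4%}")
```

The gap between the two kinds of figure is the thing to notice: any specific run of 25 losers is very unlikely, but over thousands of bets the chance of hitting a long run somewhere is far higher.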
So, there it is (in a nutshell) my phase 1 tennis betting model. The early results are very compelling and there are more refinements that can be explored to identify further persistencies and trends in the data that may eliminate areas of weakness.
For example, it will be useful to consider results in the early part of the season versus the latter part. Think player tiredness; more injuries; players going through the motions etc. The higher ranked players have the motivation to finish strongly and make the lucrative tour finals whilst the lower ranked players’ seasons are effectively over (having likely played many more matches over a season, qualifying rounds etc).
The data set is also fully consistent across time, so analysis can be done by any number of variables, such as surface, the round of the tournament, player ranks and ranking differentials etc. One must be mindful of not over-fitting the data to find a desirable outcome, which is why these initial results (that take just a handful of high-level variables) are so encouraging.
The next steps are to forward-test the data with real money in a dedicated exchange account, away from other betting/trading activity. While Betfair is clearly the best for liquidity, pricing etc, Smarkets and others have lower commissions (all my results are based on net returns, after the 5% winning commission from Betfair Exchange). A few years ago Matchbook even offered zero per cent commissions for an entire tennis season!
The nature of these bets is to find moderate volume selections that can be set and forget bets each day, requiring little time. It is very quick to identify qualifying matches each morning and then it’s just a case of placing orders into the exchange at our required prices. There is no watching of matches, trading in-play etc.
There will be long periods of losing returns, as there are with any betting strategy, but despite those we have observed, the data supports season-over-season ROIs of typically 10-15%, which is very compelling. Just as is the case with each way betting, if you can handle the variance, it might be something for you.
If any willing volunteers want to join me in a 3-month experiment putting this into practice please let me know and we’ll set something up. It will be important that you accept these selections in good faith, only put minimal money towards them and stick with the method, even when variance runs against you. Like each way betting, this is not something one can dip in or out of. You’d also have to record your bets diligently to allow for proper analysis.
You’ll observe that I’ve chosen not to reveal my underlying methodology. I feel that is completely my right, given I have spent a great number of hours testing data and hypotheses in reaching this point. I consider this my intellectual property. Of course, one might argue it’s worthless until proven in a forward test and I’m inclined to agree. But that said, I don’t feel I should just give it away either. Who knows, perhaps it’s something I could monetize down the line and finally offer a tennis betting product that is built on solid statistical fact rather than fantasy get-rich-quick aspirations.
I will document the journey periodically from here by way of progress reports. It will be interesting to see what comes of it. 🙂
I’d be keen to hear any thoughts you might have about what I’m doing here.
Until the next time 🙂
- Machine Learning for the Prediction of Professional Tennis Matches
- Predicting the Outcome of Tennis Matches From Point-by-Point Data
- Using Microsoft Excel to Model a Tennis Match
- Combining player statistics to predict outcomes of tennis matches