The Model takes on the World

 

Hi all. This is just a quick post following on from last weekend’s post, in which we detailed a statistically driven model for Brownlow prediction. In case you have missed it, the vote count is tomorrow and we hope everyone is ready to roll. In our previous post we included one iteration of results from our 2016 prediction, which was run using the Random Forest machine learning method, and we got quite a few questions about how useful it was for betting and what its limitations were. The aim of this blog post is to highlight some of the significant differences between our model’s probabilities and the bookies’ (in this case Sportsbet’s), and perhaps how these differences could be used to find some value. Like last time, this is just our own number crunching and interpretative drivel, so if you are going to use it to bet you do so at your own risk! For all you know I am drunk right now as I write this.

One of the key findings from our earlier blog was that the model’s reliability decreased the higher up the rankings you went. As fun as it is to bet on the top 10, and while it’s still possible some value could be derived from the model there, to use it properly we need to heed our own advice and concentrate on the less interesting results lower down. With that in mind I have chosen several of the “Team Voting” betting markets and also some “Will they poll more than last year?” markets to have a closer look at.

Note: All odds are Sportsbet unless otherwise stated and were correct at the time of writing (Sunday evening when I should be doing almost anything but this).

Essendon legend James Gwilt has a small chance of getting a vote in round 4. Get around him.

Hawthorn Most Votes (without Lewis and Mitchell):

Our model, all the bookies, my Nan, and the AFL and Phantom prediction sites give almost no chance of any combination other than Mitchell followed by Lewis for most votes at Hawthorn this season. For that reason, Sportsbet has a market without them in it. Cyril is favourite at $1.66 and Isaac Smith has odds of $3.25. Our algorithm predicts Cyril to get 5 votes and Isaac 8, which obviously offers some potential value. We decided to drill down into the results and see if anything weird was going on. Sorry for the low quality of the images, I didn’t have time to teach myself the HTML required to generate nice ones (the Brownlow is tomorrow after all). We have coloured relevant rounds with green for agreement, yellow for votes predicted by us and not the AFL or Phantom, and red for games where votes look probable but the model hasn’t predicted them.

The “Model” row is what the algorithm actually generates: the percentage chance of the player acquiring 3 votes. It’s important to note that this isn’t relative to other players in the game; it’s relative to the global population and the statistical signature of a “3 vote game”. Three players could get 100% in a game, or a player could be the most likely to get 3 votes with a 40% modeled likelihood.
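To make that point concrete, here is a minimal sketch of what a single game’s “Model” row might look like. The player names and percentages are invented for illustration and are not output from our model; the only thing being shown is that each player is scored on their own, so the numbers in one game don’t have to add up to 100%.

```python
# Hypothetical per-player chances of a "3 vote game" for one match.
# These numbers are invented for illustration; they are not model output.
game_probabilities = {
    "Player A": 0.62,
    "Player B": 0.51,
    "Player C": 0.40,
    "Player D": 0.05,
}


def rank_players(probabilities):
    """Return (player, probability) pairs sorted from most to least likely."""
    return sorted(probabilities.items(), key=lambda item: item[1], reverse=True)


ranked = rank_players(game_probabilities)
total = sum(game_probabilities.values())

# Each probability is measured against the global "3 vote game" signature,
# so the total across one match is not constrained to equal 1.
print(f"Sum of probabilities in this game: {total:.2f}")
for player, probability in ranked:
    print(f"{player}: {probability:.0%}")
```

Ranking within the game is still useful for guessing who is most likely to get the 3, but the raw percentage is the thing the model actually produces.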

Cyril ($1.66):

  • Round 12: Hawthorn beat Essendon by over 100 points, so there are a lot of players to fit in, but Cyril’s 20 disposals and 3 goals must have a chance here. He is special after all (Verdict: more likely 1 vote, possibly none).
  • Round 20: Cyril was the best Hawk, but in a loss to Melbourne where Viney and Gawn starred it seems unlikely he will get any more than 1 vote (Verdict: 1 at most).
  • Round 23: The AFL predictor has him at 3 votes; no one else on earth does. Our model gives him a 5% chance of 3 votes, and Phantom gives him no love at all (Verdict: Bruce apparently did the voting for the AFL website – likely no votes).

Isaac ($3.25):

  • Round 11: This round is key if Smith is going to outvote Cyril, as Rioli is in with a chance as well. The model gives Smith a 62% chance, which is strong but far from definite. Hawthorn beat Melbourne, but the general consensus is that Dom Tyson was best on ground. Smith is “in with a chance” after 29 disposals and 108 DT points (Verdict: Mitchell and Tyson more likely, but Smith a chance for 1 or 2).

Betting summary: According to our model, the chances of Cyril scoring more than 5 votes are slim. It is Cyril though, and he doesn’t need a lot of stats to look “special”. $3.25 seems pretty good odds for Isaac Smith to score 6-8 votes and outscore him.

St Kilda Most Votes:

Jack Steven is essentially a lock to get the most votes for the Saints this year, with a projected 20-21, including 15 guarantees from 5 games. Our model however strips him of a 3 and suggests Nick Riewoldt is a good chance to poll higher than projected. Will it be enough?

Nick’s likely vote count in round 20. 

Jack ($1.04):

  • Round 14: This round is KEY if Riewoldt is going to have any chance of getting more than Steven. The model is in almost total agreement with Phantom and AFL, with 5 clear best on grounds and a couple of 2 vote games. The big difference is round 14, a big upset in which St Kilda got over Geelong by 3 points. Seb Ross was clear best on ground according to everyone but the AFL; however, who gets the 1 and 2 votes seems a lot more contentious, and it could be a raffle between Steven, Henderson and Riewoldt. Steven was relatively quiet statistically by his lofty 2016 standards, getting 25 disposals and a couple of tackles, which accounts for his relatively low modeled chance (25%). Riewoldt got his trademark 26 touches and 10 marks but no goals. Verdict: Anyone’s guess, but critical.

Nick ($8.00):

  • Round 2: The Saints got smashed by the Bulldogs in this game, but Riewoldt still managed to get 23 disposals, 13 marks and 2 goals. Verdict: If he is to get votes in this game, umpires will need to be giving a charity vote for his 300th, as he wasn’t his normal efficient best.
  • Round 20: Riewoldt had 26 disposals and 16 marks; however, he was not named in the best on the AFL website, and was only given a 51% chance by our model. Verdict: Seems unlikely, but maybe commentators are just used to 26 and 16 from him?

Betting summary: Riewoldt is $8.00 for most votes at the Saints, which means Sportsbet has given him roughly a 12.5% chance of victory (1 divided by 8, ignoring the bookie’s margin). Our model gives him up to a 40% chance of victory, with a significant chance of a tie. For this to happen, round 14 is critical, and he will need to get well clear of the 15 votes Steven is all but guaranteed. It seems unlikely, but if you are looking for some risk this one will be fun. Can we all give a round of applause for Saint Nick at age 34 as well?
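For anyone wondering where the 12.5% comes from, here is a rough sketch of the arithmetic. It assumes decimal odds (the $8.00 price), ignores the bookmaker’s margin, and takes the top end of our model’s estimate (40%) at face value, which is generous. The function names are ours, made up for this example.

```python
def implied_probability(decimal_odds):
    """Probability implied by decimal odds, ignoring the bookmaker's margin."""
    return 1.0 / decimal_odds


def expected_profit_per_dollar(model_probability, decimal_odds):
    """Expected profit on a $1 stake if the model's probability were correct."""
    return model_probability * decimal_odds - 1.0


riewoldt_odds = 8.00   # Sportsbet price quoted for Riewoldt
model_chance = 0.40    # upper end of the model's estimate (an optimistic input)

bookie_view = implied_probability(riewoldt_odds)
edge = expected_profit_per_dollar(model_chance, riewoldt_odds)

print(f"Bookie implied chance: {bookie_view:.1%}")
print(f"Expected profit per $1 if the model is right: ${edge:.2f}")
```

A positive number here only means the bet looks like value if the model is right. It says nothing about whether the model is right, and it doesn’t account for the tie scenario.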

Port Adelaide Most Votes:

Our model gives both Robbie Gray and Ollie Wines 17 votes – startlingly and worryingly different to the AFL, Phantom and the bookies (Sportsbet has Wines at $11). To quote Seth Eisenburg:

“If it looks like shit, smells like shit, and feels like shit, you don’t have to actually eat it to know it’s shit.”

Regardless, we are going to eat it!

Robbie Gray ($1.01):

  • Round 2: Our model actually gives Robbie Gray more votes than the other prediction methods, which makes the Wines prediction even more strange and unlikely. The main anomalous round is Port Adelaide’s loss to cross-town rival Adelaide. Robbie Gray had 37 disposals and kicked a goal, but the Power went down by 60. Verdict: a chance of a vote, but Lynch, Betts, Jenkins and Laird were good in a big win. This drops Gray to a likely 15-16 votes, but with significant upside.

Ollie Wines ($11.00):

  • Round 11: Ollie Wines is not mentioned in the best in the Power’s big win over Collingwood, but he did have 24 disposals, 14 of them contested (hence the 51% chance). Verdict: Maybe a sneaky chance for 1 vote, but unlikely for many more.
  • Round 15: Travis Boak seems a lock for the 3, although the model has Wines as more likely. Verdict: A likely 2 votes for Wines.

Betting summary: If you take into account the potential increase in Robbie Gray’s score on the AFL and Phantom predictions, and the slight decrease in Wines’ score after looking at other predictions, it actually does seem possible both players could score around the 15-16 vote mark. Robbie Gray still has to be the favourite; he has the runs on the board with votes and has fewer question marks, but at $11 it may be worth a small flutter on the thunderous thighs of Ollie Wines.

Ollie likes those odds.

More than last year bets:

Tom Mitchell:

Mitchell is $2.50 to beat his total from last year of 12 votes. I didn’t realise until I went through his stats how much of a monster this guy is. If Hawthorn get him and J’OM, I quit.

  • Round 16: 33 disposals and 6 tackles – definitely a chance of votes.
  • Round 7: A small chance of 1 vote but unlikely with Heeney and Lance Franklin obtaining 11 goals between them in a domination against Essendon. He did rack up 37 disposals though.
  • Round 20: 39 disposals, 9 marks and 7 tackles still might not be enough in this game.

Bet summary: Tom Mitchell is a statistical beast and the algorithm has rewarded him as such. However, when you dig down into the statistics, it appears that his teammates may damage his chances significantly. It’s possible, but the odds aren’t good enough for us. Avoid.

Probably how he gets so many possessions. 

Scott Pendlebury:

Scott Pendlebury is one of the elite midfielders of the competition and has consistently picked up votes over his illustrious career. With injury and a switch to the HB flank, 2016 wasn’t his finest year and our gut feeling was that he would be unlikely to finish above his 15 votes from last year. Our model strongly agrees, predicting him to get 9, which is significantly different from the AFL and Phantom predictions of 13 and 14 respectively. Sportsbet also agrees, giving odds of $1.65 for him not to make it. That being said, never turn your back on a champion, and especially don’t turn your back on someone from Collingwood.

  • Round 9: Round 9 is the issue here, with the model giving him a 56% chance of votes, but not putting him in the top 3. It seems the model may have underestimated Pendlebury’s influence as his 26 disposals and 3 goals seems likely to be enough to get him at least one vote (and the match reports agree).

Bet Summary: Like Cyril, Pendlebury’s statistics sometimes don’t reflect the influence he has on a game. That being said, the model backs up Sportsbet and suggests that the $1.65 on offer for him not to make 15 votes is fairly safe money. Maybe throw it into a funner multi for some extra value.

Dayne Zorko:

It’s fair to say Brisbane had a shit-house year, and as an Essendon supporter I pity yet still loathe them. This guy isn’t to blame though, and it seemed sometimes he was the only player who cared. Sportsbet has him at $3 to not get his 2015 total of 5 votes; however, our model suggests he may be hard done by and there may be some value.

  • Round 4: AFLPA gave him 2 votes in Brisbane’s win over Gold Coast, although he wasn’t mentioned in the other game ratings. Verdict: Possible for 1, but 2 unlikely.
  • Round 18: Zorko played well in a domination against Essendon, and was listed in the best, but seems unlikely to get the votes over Rockliff, Rich and Martin.

Bet Summary: The model says 6, but it doesn’t seem to be a solid 6. Avoid, or go for the tie (which at $4.50 may actually be a solid bet).

summary

Summary of votes for mentioned players. These results were taken over 100 iterations. 

Running out of time, so that is all for now. Good luck tomorrow! If anyone uses these in their decision making and it works, let us know. If all of these predictions are wildly wrong, I will be deleting this blog, so good luck finding us.

Notes:

Sportsbet BYO: We came across this late, but Sportsbet offers custom-made bets. You simply send them the details of your bet and they work it up for you. It’s probably too late, but using a system like this with a model like ours may have some benefit.

3 Votes: J…..Gwilt (Ess)

 

Predicting the 2016 Brownlow

 

Guess who!

Named after Charles Brownlow, the Brownlow medal is awarded to the player judged the “best and fairest” over the duration of an AFL football home and away season.  The Brownlow is presented on the Monday evening before the grand final and is the crowning social event on the AFL calendar. Betting on the Brownlow is a great Australian pastime, perhaps not of the same magnitude as the Melbourne Cup, but definitely with a similarly high percentage of punters losing money on frivolous betting.

Brendon Fevola has a long history at the Brownlow Medal count and currently hands out gambling advice for OddsChecker.

The aim of this experiment was to try and use statistics to predict the Brownlow medal. In the process we would be taking out the human component in betting – the part of you which always wants to have a flutter on Sam Mitchell or Kepler Bradley – just in case. We will try and keep it relatively maths free, with most of the theory provided with links to well-explained pages. If you are really impatient, you can skip all the way to the bottom and look at the predictions.

Human bias is almost impossible to avoid, in particular in relation to gambling. There is a reason why running a book is an ancient and successful way of making money.  See here for a good summary of why humans typically suck at gambling. The Brownlow medal is no exception, in fact, it’s likely worse due to factors such as:

  • The wide field of contenders (Roughly 300 players per week, 9 games, 23 rounds = A lot of data)
  • The long period of time between round 1 and Brownlow night: people need to summarize half a year into what is essentially a complex probability equation. It will invariably be skewed towards the second half of the season and even finals (which obviously don’t count).
  • The inherent reliance on the rationality of umpires. We don’t need to go far past the James Hird debacle of 2004 for evidence that the system can be flawed. Hird had publicly criticized umpire Scott McLaren during the week, leading to fines and public shaming. He went on to have one of the best games of his career against the West Coast Eagles that weekend, picking up 33 disposals, including 14 in the last quarter, along with three goals (you probably remember this). He received no votes for the game and umpires confirmed to no one that they hold grudges.

 

How the Brownlow Voting works:

The process for distributing Brownlow votes is simple and time-honoured. Voting is carried out by the field umpires immediately after games, with 3, 2 and 1 votes distributed after consensus is reached. No statistics are used in the process, which has been carried out in this way since 1930 (with a small hiatus in 1976-1978 where two field umpires both voted).

Shane Woewodin was the recipient of the 2000 Brownlow medal.

To the ire of some media commentators, the Brownlow medal has been historically dominated by midfielders. Glenn Mitchell for The Roar concisely summarized and discussed the breakdown here. The medal has been presented in 85 seasons and, accounting for ties, a total of 98 medals have been awarded. Of these, 61 have been won by midfielders (centremen, wingmen, ruck-rovers and rovers), and another 19 by ruckmen. This leaves 18 medals to be shared by forwards and defenders (who make up 66% of the players on the field at any given time). In recent years, midfielders’ domination of the medal has increased, with only one non-midfielder winning the award since 1996 (Adam Goodes, playing predominantly in the ruck in 2003. Goodes, however, went on to prove he was no ordinary ruckman when he won the award again in 2006 playing mostly as a wingman). In 2015, only 2 non-centre players bucked the trend to finish in the top 20: Todd Goldstein at 10 and Aaron Sandilands at 15. The highest ranked forward was Jeremy Cameron from the GWS Giants with 12 votes (19 behind medal winner Nat Fyfe).

While the playing position of the Brownlow winner is relatively predictable, accurate prediction of the eventual vote count is a much more difficult proposition. There is little publicly available literature on prediction of Australian sports in general, let alone the Brownlow medal. The official AFL website runs a Brownlow Predictor, updated weekly throughout the year, and the guys at Phantom run a great blog dedicated to this very question. The AFL website doesn’t reveal its prediction mechanisms, and Phantom uses the average results of “Expert Voting” for theirs. Only two researchers (that we can find) have publicly investigated the prediction of the Brownlow using statistical methods – Michael Bailey in 2005, and Robert Nguyen in 2014 (unpublished, but links to his work in the media here and here).

bailey-prediction

A snapshot of Michael Bailey’s predictions from 2000 and 2001. (He got Shane Woewodin drastically wrong, but we can’t hold that against him)

As part of his thesis for Swinburne University in 2005, Michael investigated the potential for predicting the Brownlow using an ordinal logistic regression model and the standard suite of statistics that Champion Data doesn’t hog (and, in fairness, collects itself). An ordinal logistic regression is a relatively simple prediction technique, summarised very elegantly here. Bailey essentially identified the optimal combination of statistical variables for predicting the likelihood of a player receiving 3, 2 or 1 votes in a match.
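For the curious, here is a bare-bones sketch of how an ordinal logistic (proportional odds) model turns a player’s stats into chances of 0, 1, 2 or 3 votes. This is not Bailey’s fitted model: the two variables, their coefficients and the cut points are all invented for illustration, and a real model would estimate them from several seasons of data.

```python
import math


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def vote_probabilities(score, thresholds):
    """Proportional-odds model: P(votes <= k) = sigmoid(threshold_k - score).

    Returns [P(0 votes), P(1 vote), P(2 votes), P(3 votes)].
    """
    cumulative = [sigmoid(t - score) for t in thresholds] + [1.0]
    probabilities = []
    previous = 0.0
    for value in cumulative:
        probabilities.append(value - previous)
        previous = value
    return probabilities


# Hypothetical coefficients and cut points, chosen only for illustration.
COEFFICIENTS = {"disposals": 0.15, "goals": 0.5}
THRESHOLDS = [5.0, 5.8, 6.6]  # must be increasing


def linear_score(stats):
    """Weighted sum of a player's match statistics."""
    return sum(weight * stats.get(name, 0) for name, weight in COEFFICIENTS.items())


quiet_game = {"disposals": 15, "goals": 0}
big_game = {"disposals": 35, "goals": 3}

quiet_probs = vote_probabilities(linear_score(quiet_game), THRESHOLDS)
big_probs = vote_probabilities(linear_score(big_game), THRESHOLDS)

print("Quiet game:", [round(p, 3) for p in quiet_probs])
print("Big game:  ", [round(p, 3) for p in big_probs])
```

The key idea is that one weighted score per player slides along a single scale, and the cut points decide where 0 votes ends and 1, 2 and 3 votes begin.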

There was a lot of work that went into Michael’s model and anyone interested in this field should definitely read the entire thesis. Three important and interesting points to come out of it were:

  • Players with distinctive appearances were twice as likely to poll votes as a “non-descript” player (0.24 votes per game instead of 0.12 votes per game). For his thesis, Michael described distinctive appearance as any player with red or blond hair, or with significantly darker or lighter skin. Had he done his thesis more recently, he would have definitely included shaved heads, heavy tattooing and what I call “Mitch Robinson Head”.
  • It is statistically difficult to predict the top ten’s vote count within a reasonable accuracy. However, for the remainder of players, the problem becomes much more reliable. To quote Michael: “By accurately predicting 66% of players to within one vote of their actual total, and 90% of players to within three votes of their total, the modelling process provides an objective assignment of probabilities that has many benefits.”
  • The prior 3-4 years is the optimal training period, providing the most accurate prediction in any given year. If trained for longer, the performance of the model actually decreased. This is likely due to numerous factors:  changes in game style, media attention or even implicit directions by the AFL to punish certain football clubs for at best circumstantial offenses*.
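The “within one vote” and “within three votes” figures in that quote are easy to compute for any set of predictions. A minimal sketch, using invented season totals for five made-up players (not Bailey’s data or ours):

```python
def share_within(predicted, actual, tolerance):
    """Fraction of players whose predicted total is within `tolerance` votes of actual."""
    hits = sum(
        1 for name in actual if abs(predicted[name] - actual[name]) <= tolerance
    )
    return hits / len(actual)


# Hypothetical season totals, invented for illustration.
predicted_votes = {"A": 20, "B": 9, "C": 4, "D": 0, "E": 1}
actual_votes = {"A": 26, "B": 11, "C": 4, "D": 1, "E": 0}

within_one = share_within(predicted_votes, actual_votes, 1)
within_three = share_within(predicted_votes, actual_votes, 3)

print(f"Within 1 vote:  {within_one:.0%}")
print(f"Within 3 votes: {within_three:.0%}")
```

Note how the one big miss in this toy example is the highest vote-getter, which is the same pattern the thesis describes: the top of the table is where the error lives.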

In general, the approach we used was similar to Michael Bailey’s previous methods. An analysis of variable importance was carried out, as well as experiments into the ideal length of training data (how many seasons to use). We did not use the distinctive features variable as it seems that it might add some subjective bias, and was also beyond the limits of our lazy meter. We incorporated the use of Dream Team scores as a variable, which applied a scaling of importance to particular variables (6 points for a goal, 4 for a tackle and so on). Dream Team score actually ended up being the most important variable, which would come as no surprise to those who are avid fantasy football coaches.
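As a rough illustration of what a Dream Team score is, here is a sketch of the weighted sum. Only the goal (6) and tackle (4) weights come from the text above; the remaining weights are the values we understand to be the commonly published fantasy scoring and should be treated as an assumption, as should the stat names and the example stat line.

```python
# Points per statistic. Goal and tackle weights are the ones quoted in the text;
# the others are assumed here and should be checked against the official rules.
DREAM_TEAM_WEIGHTS = {
    "goals": 6,
    "tackles": 4,
    "kicks": 3,
    "marks": 3,
    "handballs": 2,
    "hitouts": 1,
    "behinds": 1,
    "frees_for": 1,
    "frees_against": -3,
}


def dream_team_score(stats):
    """Weighted sum of a player's match statistics (missing stats count as 0)."""
    return sum(
        weight * stats.get(name, 0) for name, weight in DREAM_TEAM_WEIGHTS.items()
    )


# Hypothetical stat line, invented for illustration.
example_game = {
    "kicks": 15,
    "handballs": 10,
    "marks": 5,
    "tackles": 4,
    "goals": 2,
    "behinds": 1,
    "frees_for": 1,
    "frees_against": 1,
}

print(dream_team_score(example_game))  # 107
```

Because the score already bundles most of the box-score stats into one number, it is not too surprising that it ends up carrying a lot of the predictive weight.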

dreamteam_probability_histogram

An equalised histogram showing the likelihood of scoring a Brownlow vote relative to Dream Team score. The blue bars represent a count of players who scored votes, and the red bars players who did not. The points with error bars show the likelihood of a vote or a non-vote occurring. You will note that at around 125 Dream Team points the chance of scoring votes goes above 50%, i.e. a player is more likely than not to poll.
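The likelihood points in a chart like this can be sketched as a simple “share of player-games that polled votes” within each score bin. The records and the 25-point bin width below are invented for illustration, and this sketch skips the equalisation and error bars shown in the figure.

```python
from collections import defaultdict


def vote_rate_by_bin(records, bin_width=25):
    """Share of player-games that polled votes, grouped by Dream Team score bin.

    `records` is an iterable of (dream_team_score, polled_votes) pairs.
    Returns a dict mapping each bin's lower edge to its vote rate.
    """
    polled = defaultdict(int)
    total = defaultdict(int)
    for score, got_votes in records:
        lower_edge = (score // bin_width) * bin_width
        total[lower_edge] += 1
        polled[lower_edge] += int(got_votes)
    return {edge: polled[edge] / total[edge] for edge in sorted(total)}


# Hypothetical (score, polled votes?) pairs, invented for illustration.
records = [
    (60, False), (70, False), (85, False), (95, True),
    (105, False), (110, True), (120, False), (130, True),
    (135, True), (140, False), (155, True), (160, True),
]

rates = vote_rate_by_bin(records)
for edge, rate in rates.items():
    print(f"{edge}-{edge + 24}: {rate:.0%}")
```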

Variable_Importance.png

The above image shows the relative importance of each input variable for predicting Brownlow votes. Unsurprisingly, Dream Team score is the strongest predictor, and the number of free kicks against a player has no influence. Interestingly, the number of free kicks a player receives is more important than contested marks – Hello Joel Selwood.

While a logistic regression is a fairly simple technique, Michael Bailey still managed to predict Brownlow votes to a reasonable accuracy, particularly for lower vote-getters. We decided to trial some more complex prediction algorithms to determine whether we could at least emulate Bailey’s work, and potentially improve on it. Using the common machine learning technique Random Forest and publicly available statistics data (1,2), we attempted to predict the 2016 Brownlow Medal.

A very brief summary of the theory: A Random Forest is basically a more complex, iterative version of the easier to understand decision tree. A decision tree works out a set of “questions” to ask the data, to determine the optimal way to split the training data into categories – in this case, “No Votes” or “Votes”. The example below is an extremely simple version of a decision tree – the Random Forest would do thousands of these using different subsets of training data and variables to calculate the most robust prediction model possible. Once a set of “rules” has been established, test data (i.e. 2016 data) can be fed into the algorithm. See here and here for more detailed and excellent descriptions of how a Random Forest works.

decision-tree-simple-1

An example (and potentially over-simplified) decision tree showing the underlying logic behind the Random Forest Algorithm.
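To show the “many simple trees, each on a different slice of the data” idea in code, here is a toy sketch in plain Python. It is deliberately stripped down: every tree is a single question (a “stump”) on one randomly chosen variable, and the training rows are invented. It is not the code behind our predictions; in practice a library implementation with full-depth trees would be used.

```python
import random

# Hypothetical player-games: invented numbers, not real statistics.
# "votes" is 1 if the player polled votes in that game, else 0.
ROWS = [
    {"dream_team": 60, "disposals": 14, "votes": 0},
    {"dream_team": 75, "disposals": 18, "votes": 0},
    {"dream_team": 88, "disposals": 22, "votes": 0},
    {"dream_team": 97, "disposals": 20, "votes": 0},
    {"dream_team": 104, "disposals": 27, "votes": 0},
    {"dream_team": 118, "disposals": 25, "votes": 1},
    {"dream_team": 126, "disposals": 31, "votes": 1},
    {"dream_team": 133, "disposals": 29, "votes": 0},
    {"dream_team": 141, "disposals": 33, "votes": 1},
    {"dream_team": 152, "disposals": 36, "votes": 1},
]


def best_stump(rows, feature):
    """One-question tree: the threshold on `feature` with the fewest errors."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        below = [row["votes"] for row in rows if row[feature] < threshold]
        above = [row["votes"] for row in rows if row[feature] >= threshold]
        if not below or not above:
            continue
        # Errors made by predicting the majority outcome on each side.
        errors = (min(sum(below), len(below) - sum(below))
                  + min(sum(above), len(above) - sum(above)))
        if best is None or errors < best[0]:
            best = (errors, threshold,
                    sum(below) / len(below), sum(above) / len(above))
    return best


def train_forest(rows, features, n_trees=200, seed=0):
    """Fit many stumps, each on a bootstrap sample and one random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # sample rows with replacement
        feature = rng.choice(features)             # random variable for this tree
        stump = best_stump(sample, feature)
        if stump is not None:  # skip samples with no usable split
            _, threshold, below_rate, above_rate = stump
            forest.append((feature, threshold, below_rate, above_rate))
    return forest


def predict(forest, row):
    """Average the vote rate that each stump assigns to this player-game."""
    rates = [above if row[feature] >= threshold else below
             for feature, threshold, below, above in forest]
    return sum(rates) / len(rates)


forest = train_forest(ROWS, ["dream_team", "disposals"])
quiet_game = {"dream_team": 65, "disposals": 15}
big_game = {"dream_team": 155, "disposals": 38}
print(round(predict(forest, quiet_game), 2), round(predict(forest, big_game), 2))
```

Averaging over many trees is what turns a set of yes/no rules into a percentage chance for each player-game.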

The algorithm was tested over several years to ensure it was predicting within an acceptable range. In general, our findings were consistent with Bailey in that the algorithm was much more accurate in the lower ranked players than the top 10-20. We have included the results of that in a link at the bottom, but for interest’s sake, here is the top 20 from last year:

2015_Prediction.png

2015 Modeled Vote Distributions versus Actual Results (Top 20 Players). For an example of how to read a Box Plot, click here.

 Comments on 2015:

  • The model picked Nat Fyfe and Matt Priddis to be going head to head at the end, which turned out to be correct. Fyfe had a tighter range of possible scores than Priddis, but Priddis had a higher average.
  • Lachie Neale is apparently statistically a Brownlow gun, but the umpires haven’t realised yet. His differential of 13 votes is the biggest gap between predicted and actual votes in the entire prediction.
  • Most bookies had Scott Pendlebury to poll more votes than Dane Swan. If the model had been followed, there was money to be made there. In addition, the model nailed Pendlebury’s predicted votes.
  • The Model correctly identified that Zac Dawson would not poll a vote.

We were fairly happy with this result, so we included last year’s data in the training algorithm and ran the numbers on 2016 statistics. The expected number one vote-getter should surprise no one: Patrick Dangerfield finished on top in no less than 100% of the iterations, by a minimum of 7 votes. For this model to be correct, Dangerfield would need to obtain the most votes ever by a player in a season, so there is room for skepticism; what is clear, however, is that he has had a statistically superior year to all his peers.
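Statements like “on top in 100% of the iterations, by a minimum of 7 votes” come from summarising the repeated runs. A minimal sketch of that summary, using three invented runs and made-up player names rather than our actual output:

```python
def top_share_and_min_margin(iterations, player):
    """Share of runs in which `player` leads outright, and the smallest
    margin over the best rival across all runs (negative if ever beaten)."""
    wins = 0
    margins = []
    for totals in iterations:
        best_rival = max(votes for name, votes in totals.items() if name != player)
        margin = totals[player] - best_rival
        margins.append(margin)
        if margin > 0:
            wins += 1
    return wins / len(iterations), min(margins)


# Hypothetical predicted season totals from three runs, invented for illustration.
iterations = [
    {"Player A": 38, "Player B": 29, "Player C": 27},
    {"Player A": 36, "Player B": 28, "Player C": 29},
    {"Player A": 40, "Player B": 31, "Player C": 26},
]

share, min_margin = top_share_and_min_margin(iterations, "Player A")
print(f"On top in {share:.0%} of runs, by at least {min_margin} votes")
```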

Without further ado, we humbly put our 2016 Brownlow Medal Top 20 predictions in the public domain:

2016.png

2016 Modeled Vote Distributions (Top 20 Players).

Comments on 2016 (More to come as the day approaches):

  • Patrick Dangerfield has had an amazing year. Put your house on it. Put your neighbour’s house on it. You might only make 30 bucks profit, but it’s safe money**.
  • Sam Mitchell and Joel Selwood were (at the time of writing) both $11 to win the Brownlow in the Danger-free market at Sportsbet. Sam Mitchell seems to have the most upside of the two, particularly when you take into account the Danger effect.
  • It is worth reiterating that the value in using an algorithm like this is in the players who score between 10 and 20 votes. The error in the model drops significantly after the top 10 to 15 players. With this in mind, markets such as “Most at the club” or “Will this player improve on their total from last year” are likely to provide the best value.

We will report back shortly after the Brownlow to either bathe in glory, or eat humble pie. If anyone picks up any errors, has any suggestions or questions, wants different data or just to chat about methods please hit us up. The main reason behind creating this blog was to engage different people in the community and develop a dialogue in this niche area of interest. Also, none of our friends will listen to us anymore and we need a new audience.

Detailed results can be found here:

*Allegedly

** Do NOT make any bets using this information that you aren’t prepared to lose. This is a summary of an algorithm and the results – not gambling advice. D.Y.O.R.

Notes:

  • We use oddschecker.com.au for our up to date betting odds over multiple agencies.
  • Interestingly, it’s very difficult to find historical betting odds. If anyone has a source, that would be excellent.
  • The results shown in the box plots are over 100 iterations of the entire process – the Random Forest algorithm itself has its own iterative improvement process.
  • We also applied measures to limit the stratification (class imbalance) problem of the input data containing many 0 vote games and significantly fewer vote-scoring games. The models were run iteratively to give a potential range of votes a player might receive, which at a practical level gives a confidence measure of the player’s likelihood of scoring a particular number of votes. We also switched our model to only predict the likelihood of a player scoring a three vote game (3-2-1 was not included). This removes the diluting effect of 1 and 2 vote games and gave a statistically more robust method in cross-validation.
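One common way to handle that kind of imbalance is to undersample the 0 vote games so each training run sees equal numbers of both outcomes, then repeat with a fresh random sample each time. We haven’t spelled out exactly which measures we applied above, so treat the sketch below as one plausible approach under that assumption rather than a description of our pipeline. The `three_votes` label and the toy rows are made up.

```python
import random


def balanced_sample(rows, label_key="three_votes", rng=None):
    """Undersample the majority class so both classes are equally represented."""
    rng = rng or random.Random()
    positives = [row for row in rows if row[label_key]]
    negatives = [row for row in rows if not row[label_key]]
    minority, majority = sorted((positives, negatives), key=len)
    return minority + rng.sample(majority, len(minority))


# Hypothetical player-games: 5 three-vote games among 100, invented for illustration.
rows = [{"id": i, "three_votes": i < 5} for i in range(100)]

# One balanced training set per iteration; a model would be fitted to each,
# and the spread of its predictions gives the range shown in the box plots.
rng = random.Random(42)
samples = [balanced_sample(rows, rng=rng) for _ in range(100)]

print(len(samples), "balanced samples of", len(samples[0]), "rows each")
```

Because each iteration draws a different set of 0 vote games, the predictions vary from run to run, which is where a range of possible vote totals comes from.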