Head 2 Head Bets (Sportsbet)

We used AutoZerrett v1.4 to calculate some probabilities and compare them against the Sportsbet H2H markets. DYOR, but there appears to be some value in some of the battles. As usual, beware of ruckmen, small forwards, defenders and Lachie Neale.

H2H_1

H2H1

Gamble responsibly, this is just a model, don’t blame us, yah di yah da.

New AutoZerrett Version Released

Hi all. It came to our attention over Friday beers yesterday that players in crappy teams still seemed to be polling quite well. When we investigated, we found a bug in the code that was stuffing up the score differences. Essentially, if Essendon had beaten Carlton by 100, the score difference column was +100 for every player in the game, instead of +100 for Essendon players and -100 for Carlton players. We fixed this and re-ran the model, and for some players it has made a significant difference. For the big three, not so much, but several players in the top 25 have shifted significantly. Apologies for not picking up on this earlier – the last model wasn’t “wrong”, as the score difference variable did nothing, it just wasn’t as good as it should have been. We think this one looks better, and hopefully people hadn’t made any bets on Patrick Cripps and/or Dayne Beams, as they probably look shaky now.
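For anyone curious what that fix looks like in practice, here is a minimal Python sketch (our own processing was done in R, so this is an illustration and not the actual AutoZerrett code). The function name, the field names and the scores are all made up for the example; the only thing taken from the post is the idea that the margin must be signed from each player's own team's point of view.

```python
# Hypothetical match record: the field names are illustrative, not the real schema.
def signed_margin(player_team, home_team, away_team, home_score, away_score):
    """Return the score difference from the point of view of the player's own team."""
    if player_team == home_team:
        return home_score - away_score
    if player_team == away_team:
        return away_score - home_score
    raise ValueError(f"{player_team} did not play in this match")


# Essendon beat Carlton by 100 points (scores invented for illustration).
match = dict(home_team="Essendon", away_team="Carlton", home_score=150, away_score=50)
essendon_margin = signed_margin("Essendon", **match)
carlton_margin = signed_margin("Carlton", **match)
print(essendon_margin, carlton_margin)
```

The buggy behaviour corresponds to always returning the winner's margin regardless of which team the player was on.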

variation_between_models.png

The above graph shows players whose predicted average votes shifted by 3 or more. Red numbers are the earlier version of the model, and blue is the new one. As you can see, the hardest hit were players from the bottom of the ladder. Seb Ross has made a mockery of our earlier blog post and is now predicted to get an average of 19 votes. Bryce Gibbs seems too low, but we will investigate that. All in all, it’s a significant change, so have a good look!

The new pdf is available here: FatStats AutoZerrett 2017 Predictions V1.4

If this is your first time, the original 2017 predictions post is here and last year’s more thorough description of the process is here.
Please let us know if you spot any more bugs; it’s more than likely they exist.

“There is no way Seb Ross is going to get 25 votes” – Most People.

The PDF has been updated to fix a small error in the round by rounds, download here: FatStats AutoZerrett 2017 Predictions V1.3

800px-Sebastian_Ross_2017.2

 

Seb Ross has been the subject of a lot of online debate since the AutoZerrett results were released on Monday. The St Kilda player has had a breakout year, with many tipping the midfielder to take home the Trevor Barker award ahead of last year’s winner Jack Steven. In their summary of St Kilda’s year, Zero Hanger had this to say about Ross:

 

“The 24-year-old broke out in 2016 and continued his excellent form this season, playing all 22 games for the second year in a row.

Ross averaged 30 disposals per game this season along with four marks, four tackles and four inside 50s. Impressively, Ross ranked 9th in total effective disposals which speaks volumes of how good his season really was.

He also ranked 9th in the league for total disposals with 657, the second-most of anyone that finished outside the top eight. Should he be able to continue his strong season into 2018, it’ll hold the Saints in good stead going forward”

Not only is Ross an effective and prolific player with ball in hand, he gets his own ball as well, averaging just under 10 contested disposals and 5 clearances per game – traits that are diagnostic of players who poll well on Brownlow day. He played in every game, and St Kilda won 11 games, which should be enough to generate a significant number of votes. Why then is the common consensus that he will struggle to get over 15 votes (Sportsbet has his line at 14.5 votes, for reference)? We dig into the AZ results and compare them to the AFL match reports and SEN’s Inside Footy ratings and comments to have a look.

Seb Ross

To summarise the above table, it’s possible for Seb Ross to get 25 votes, but unlikely. There appear to be two key games where AZ may have over-predicted – round 15 v Fremantle and round 19 v Port Adelaide (the infamous Ryder to Gray match winning goal game, which will surely stick in the umpires’ minds). That being said, there are several games where it’s possible AZ underestimated Seb Ross as well – particularly rounds 7 and 22. Seb has had a great and consistent year, and a vote range of around 20-23 seems not to be beyond him.

Seb Ross Probability.PNG

Seb Ross had an amazingly consistent season with only 7 down games – he is a chance in the other 15.

Based on the above, we have included a few possible Sportsbet markets where this information may come in handy. As usual, gamble responsibly, and it’s just a model – we take no blame if it’s wrong:

Simple over and unders: 

SebRoss OU

Seb Ross gets more than Trent Cotchin in every model we ran:

SebRoss_v_Cotchin

Also, some risky risky single game bets (see above table for details):

Round 9 v Sydney:

SebRoss_Round9

Round 12 v Adelaide:

SebRoss_Round12

Round 19 v Port Adelaide: 

SebRoss_Round19

Happy punting!

2017 Brownlow (AutoZerrett)

Update: Version 1.4 (all earlier versions are redundant): FatStats AutoZerrett 2017 Predictions V1.4

It’s Brownlow time of year again, and the newly re-badged AutoZerrett Brownlow tipping algorithm we published last year is back, and it is new and improved (hopefully!). To save going through the basics of how it works again, there is a fair bit of detail in the post here. AutoZerrett is powered by a machine learning algorithm called a Random Forest, which uses players’ statistics for each game to give them a probability from 0-100% of getting 3 Brownlow votes from the umpires. The three highest probabilities in each game are awarded the synthetic votes. Six years of data is used to train the algorithm, and only 3 vote games are used as the target variable (1 and 2 vote games are discarded due to the mixed statistical signal they can give, i.e. umpires often give two votes to the best player on the ground from the losing team). There are pros and cons to discarding 1 and 2 vote games, but our testing showed that the predictions worked better without them (although Tom Mitchell, or “Two Vote Tom” as we call him, may prove us wrong this year).
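To make the vote allocation step concrete, here is a small Python sketch of how per-player probabilities for one game could be turned into synthetic votes. It assumes the three highest probabilities receive 3, 2 and 1 votes in descending order, which is our reading of the step above; the function name, the player labels and the probabilities are invented for the example.

```python
def allocate_votes(probabilities):
    """Give 3, 2 and 1 synthetic votes to the three highest probabilities in one game."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return {player: votes for (player, _), votes in zip(ranked[:3], (3, 2, 1))}


# Invented probabilities of a "3 vote game" for four players in a single match.
game = {"Player A": 0.81, "Player B": 0.40, "Player C": 0.62, "Player D": 0.05}
votes = allocate_votes(game)
print(votes)
```

Summing these per-game dictionaries over a season would give each player's predicted total for one model run.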

Lord Zach

Lord Zerrett and his blind assistant ponder another top 10 finish under age 22

AutoZerrett performed reasonably well last year, but there were several areas we wanted to improve on. The main one was the lack of ability to recognise a 3 vote game from the game’s more enigmatic players, that is, Cyril Rioli, Buddy Franklin, ruckmen and Alex Rance. There are two main reasons behind this. The first is that the readily available statistics do not capture everything, in particular things like pressure acts, spoils, high skill efforts, eye catching marks and other intangible events that make AFL what it is. The second, arguably the more important, is that there just aren’t that many of them – the majority of 3 vote games (~60%) go to midfielders, and in the previous 6 years those votes have been distributed to inside midfielders more often than not.

Position | Actual | Actual (%) | After SMOTE | After SMOTE (%)
Defender | 88 | 7% | 356 | 12%
Forward | 254 | 19% | 653 | 23%
General | 157 | 12% | 426 | 15%
Inside Mid | 518 | 38% | 611 | 21%
Outside Mid | 267 | 20% | 312 | 11%
Ruck | 68 | 5% | 508 | 18%
Total | 1352 | 100% | 2866 | 100%

What this means practically in machine learning is that we have class imbalance, or rare events, which we are trying to model. The first and most obvious imbalance is that we are trying to predict the 3 vote game, which in any given game is awarded to only one of the 44 players (about 2.3%). The second is that even though a ruckman, or a low possession forward with 15 pressure acts, is not the “normal” way to get three votes, it doesn’t mean the umpires aren’t going to reward it. We dealt with the first issue last year using a cross-fold sampling scheme that used a different random sample of non 3-vote games each iteration. Essentially, we just changed the background data to cover a number of different statistical scenarios. This year we attempted to go one step further and improve the accuracy by creating artificial 3 vote games for low likelihood classes, essentially defenders, forwards and ruckmen, using a process called SMOTE. As the above table shows, we created approximately 1500 data points using the existing statistical signatures to try and level the playing field for minority positions. To get an idea of whether this worked, we re-ran last year’s predictions the traditional way and the new way and compared the errors.
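For readers who have not met SMOTE before, the core idea fits in a few lines. The Python sketch below is a bare-bones illustration of it and not the implementation we used (our processing was in R): each synthetic row is placed at a random point on the line between a real minority-class row and one of its nearest minority-class neighbours. The function name, the choice of `k`, the fixed seed and the three made-up statistics per game are all assumptions for the example.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of the base row within the minority class (excluding itself)
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment between base and neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Invented (hit-outs, disposals, marks) rows standing in for ruck 3 vote games.
ruck_games = [(45, 18, 6), (52, 15, 4), (38, 22, 7), (60, 12, 5)]
new_rows = smote(ruck_games, n_new=5)
print(len(new_rows))
```

Because every synthetic row sits between two real ones, the new data stays inside the range of statistics already seen for that position, which is why it can only reinforce existing signatures and cannot invent new ones.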

Position | Average votes error per player (traditional) | Average votes error per player (SMOTE) | Change in model (votes) | Change in accuracy due to SMOTE | Actual vote percentage
Defender | -1.28 | -1.07 | 0.21 | 16% | 9%
Forward | -1.38 | -0.83 | 0.55 | 40% | 24%
General | -0.57 | -0.54 | 0.02 | 4% | 3%
Inside Midfield | 1.67 | 0.90 | -0.77 | 46% | 49%
Outside Midfield | 0.29 | -0.13 | -0.41 | 56% | 10%
Ruck | -3.10 | -1.69 | 1.41 | 46% | 4%

SMOTE worked remarkably well, reducing the error for every single position. The above table shows, from left to right for each position type, the average error per player using the traditional method and the new SMOTE method, followed by the change between the two. As you can see, last year’s model was overestimating inside midfielders by 1.67 votes per player and underestimating rucks by about 3 votes. The SMOTE predictions improved for all of the positions, albeit with only a small improvement for the bucket group of “general” (not just Jon Patton). There is still work to be done on defenders, but as the last column shows they only make up 9% of the final votes, so it doesn’t make too much difference unless you’re a Ranceophile.
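As a sanity check on the table, the sketch below recomputes the “change in accuracy” column in Python, assuming it is the relative reduction in absolute error (that definition is our reading of the numbers, not something spelled out above). Because the inputs here are the rounded values from the table, a few results land a percentage point away from the published figures.

```python
# (traditional error, SMOTE error) in votes per player, copied from the table above.
errors = {
    "Defender": (-1.28, -1.07),
    "Forward": (-1.38, -0.83),
    "General": (-0.57, -0.54),
    "Inside Midfield": (1.67, 0.90),
    "Outside Midfield": (0.29, -0.13),
    "Ruck": (-3.10, -1.69),
}


def improvement(traditional, smote):
    """Relative reduction in absolute error, as a fraction of the traditional error."""
    return (abs(traditional) - abs(smote)) / abs(traditional)


improvements = {position: improvement(*pair) for position, pair in errors.items()}
for position, value in improvements.items():
    print(f"{position}: {value:.0%}")
```

Using absolute values matters for outside midfielders, where the error flips sign from a small overestimate to a smaller underestimate.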

One big difference in the way the results look this year is that last year we released one iteration of the model – the one we deemed the best looking (not very scientific really, but it worked). This year we are releasing the full spread of 100 models. There are likely models in the data that are more conservative, and ones that overestimate, but the hope is that the averaged response gives a good indication of likelihood. Dustin Martin’s modelled votes are included here to show the variation between models:

Pasted image at 2017_09_18 08_20 AM

As you can see, Dustin is projected to get 43-odd votes on average – which is huge – but there are some models where he gets below 40 and even one where he gets 50 (a massive outlier). Our gut feel is that these numbers are overestimating and a conservative approach should be taken; however, we will wait and see.
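If you want to summarise a spread of model runs yourself, something like the Python sketch below would do it. The run totals are randomly generated stand-ins (centred near 43 purely to echo the example above) and are not the real AutoZerrett outputs; the variable names are also just for illustration.

```python
import random
import statistics

rng = random.Random(42)
# Invented season totals for one player across 100 model runs (not real AutoZerrett output).
run_totals = [round(rng.gauss(43, 2.5)) for _ in range(100)]

summary = {
    "mean": statistics.mean(run_totals),
    "low": min(run_totals),
    "high": max(run_totals),
    "runs_under_40": sum(total < 40 for total in run_totals),
}
print(summary)
```

Reporting the low, the high and the number of runs under a betting line alongside the mean gives a better feel for how shaky an average is than the average alone.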

Dusty

Dustin questions the 3 models which had him under 40 votes. 

RESULTS

This year, we have put an emphasis on generating usable outputs and visualisations. Our 2017 Brownlow report breaks down predictions into round by round, team by team and overall ranking which should be a bit easier to use than the table format last year.

So, without further ado, the winner and the remaining 49 of the top 50 predicted vote getters for 2017 are…

 

AZ_Final_1_50_Rankings.PNG

We will comment more on the predictions throughout the week, but a couple of key points that jump straight out:

  • Dustin Martin has had a HUGE YEAR. It’s unlikely he will get 43 votes, but he is every chance to break the record set by Dangerfield last year. Round by round, he was rarely not in the mix and no-one else at Richmond is a vote stealer (this applies to you, Joel Selwood).
  • The model also predicts massive numbers for Patrick Dangerfield and Tom Mitchell. Personally, I can see ways for both of them to be mathematically over-inflated. Dangerfield has a large number of 3 vote games in the training data, so the model loves a “Danger” style game and, like the umpires in the Sydney match, is probably susceptible to bias. Tom Mitchell may be the prime example of a player who, even when he was statistically the best player on the ground, is only going to get 2 votes due to being on the losing team (hence, two-vote Tom). He also isn’t very impressive to anyone but other inside midfielders. It will be interesting to see how that plays out, but it’s worth keeping in mind.
  • The numbers seem high overall. Three players are predicted to score over the Brownlow record set last year. It seems unlikely that the magnitudes predicted here are going to be achieved, but the overall trends seem solid. Further investigation needed.
  • Josh Kelly is further down the rankings than expected. We have watched a lot of Essendon’s next great recruit this year and he is a gun (Editor’s note: we just saw that he signed with GWS and we would like to clarify that he is not that great and AZ is being generous with 19 votes). There could be several reasons for him to be predicted lower in the rankings than most pundits have him (19 votes is not low overall, however). The main statistical reason is that there haven’t been a lot of players like him winning votes for a while; the last one was probably Chris Judd. In the six years used as training data, Brownlow votes have tended to go to midfield bulls of the Danger, Fyfe, Priddis and Watson types, not necessarily the silky Rolls Royces. On the flip side, it’s a common occurrence in Brownlow vote allocation for players to have to “pay their dues”, and their performance is often not recognised by the umpires until the year after.
  • The model has absolutely no idea how to judge Clayton Oliver. He has by far the biggest range of any player. We will investigate this during the week and maybe put out a little post on why that might be.

Again, the results are included in full in a downloadable PDF here: FatStats AutoZerrett 2017 Predictions V1.4. Please hit us up with any questions or observations (especially if we have stuffed anything up). We are definitely interested if people use this data to have a punt; however, as usual, remember that it’s at your own risk and FatStats takes no responsibility. That being said, we will be going through the data ourselves for punting purposes and we plan on releasing bits and pieces during the week, so follow along if you’re interested. Enjoy!!

 

Clayton is as unpredictable on Twitter as he is for the Brownlow Medal.

Notes: All data is sourced from http://www.afltables.com.au. All images are from Wikimedia Commons and Clayton’s tweet is Clayton’s tweet. All processing was carried out using RStudio, which is open source and great.

Round 23 Predictions

So we promised a description of AutoTippa’s bones this week, but we decided to let the season finish first and allow some digital and mental OCD completedness in our analysis. Instead, we are just releasing our predictions for the final round of the year!

Rplot02

The black line is the odds that the bookies have given each team. As an example of how to interpret this, the bookies predict Geelong would win this game roughly 53% of the time, while AutoTippa says GWS would win roughly 58% of the time. If you are that way inclined, these differences between modelled and offered odds are how people attempt to make money off gambling.
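To put a number on that kind of disagreement, the Python sketch below compares the two views of the Geelong v GWS game. It assumes a two-outcome game with no draw and ignores the bookmaker’s margin, so the bookies’ GWS probability is simply one minus their Geelong probability; the helper for converting decimal odds is included for convenience, and all names are our own.

```python
def implied_probability(decimal_odds):
    """Convert decimal odds (e.g. $1.66) into the bookmaker's implied win probability."""
    return 1.0 / decimal_odds


bookie_geelong = 0.53  # bookies: Geelong wins roughly 53% of the time
model_gws = 0.58       # AutoTippa: GWS wins roughly 58% of the time

bookie_gws = 1.0 - bookie_geelong  # assumes no draw and no bookmaker margin
gws_edge = model_gws - bookie_gws  # positive means the model rates GWS higher than the market
print(f"Model rates GWS {gws_edge:.0%} higher than the bookies")
```

In practice the bookmaker’s margin means the implied probabilities for both teams add up to more than 100%, so a real edge is a little smaller than this simple difference suggests.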

Notes on this week:

GWS and Hawthorn differ from the bookies, Hawthorn especially. Both of these tips pass the gut test for me, especially Hawthorn. Whether Hawthorn should be that far favourite is another question; I predict the game will be close.

In three games, AT predicts a closer contest than the bookies suggest – Essendon, Port Adelaide and Sydney. Adelaide, on the other hand, is expected to beat West Coast by a lot more than the bookies think.

It’s going to be a long off season!! We will continue to attempt to predict the finals and, as previously mentioned, release a few longer blog posts over the next month. Go Bombers.

07768c8950719f8d08a09d9a38e94731

Jobe Watson is back for one of his last games in the black and red, and his inclusion amongst others gives Essendon 77% of their best 22 playing as opposed to 66% last week. Fremantle also improve from 54% to 72% with the return of Lachie Neale et al. 

Fat Stats is BACK

 

After our semi-successful Brownlow predictions last year, we promised more regular blog posts for 2017. Turns out we lied to everyone and released nothing between then and now. Unfortunately, and more boringly, our focus has been on starting our new company, which predicts things in the mining world and not the sports world. However, things are starting to settle down now and we actually have two big posts coming in the next month.

mcdonald20tipungwuti20anthony

Firstly, next week, we will be launching our predictive tipping model AutoTippa, short for AutoAnthonyMcDonald-Tipungwuti. AutoTippa has been live and spitting out predictions on the hour (sometimes) since Round 8 to a small beta testing group of unreliable gentlemen. As the year has gone on, its performance has improved to the point where, at the time of writing, it has correctly tipped 120 out of 180 games – a success rate of about 67%. In a normal year this would be relatively poor, however 2017 is no ordinary year – 120 correct tips places AutoTippa in the top 3% on ESPN’s footytips.com.au and it would currently be sitting equal top on the Squiggle leaderboard. (Sidenote: anyone interested in tipping, predictive models or just footy should check out Squiggle. It’s a combination of a heap of different publicly available predictive models and they release some really interesting blog posts. Follow them all!!) Next week’s blog post will break down the machine that is AutoTippa, look into where it has struggled and where it has dominated, and map out some plans for 2018.

Sneak Preview: Round 22 Predictions. The further the bar is away from the line, the more confident that AutoTippa is in that team winning. 

r22

Secondly, our Brownlow medal prediction algorithm, now named AutoZerrett, will be running for its 2nd year. The model has been improved (hopefully) and takes into account the learnings and feedback of 2016 as well as the addition of some now publicly available statistics. Similarly to last year, we will release the results of the model and also drill down into them in Brownlow week to see if there are any discrepancies punters can take advantage of.

If anyone would like to sign up for AutoTippa or AutoZerrett updates over the next month please send an email to fatstatspredictions@gmail.com. Feedback, no matter how critical, is also welcomed. We do prefer compliment sandwiches however.

Disclaimer: If you use these models for gambling or other financial decision making, it’s at your own risk. You shouldn’t take the word of people on the internet.

All data used in models is from the excellent websites http://www.footywire.com.au and http://www.afltables.com.au and all processing is done in RStudio.

The Model takes on the World

 

Hi all. This is just a quick post following on from last weekend’s post, in which we detailed a statistically driven model for Brownlow prediction. In case you have missed it, the vote count is tomorrow and we hope everyone is ready to roll. In our previous post we included one iteration of results from our 2016 prediction, which was run using the Random Forest machine learning method, and we got quite a few questions about how useful it was for betting and what its limitations were. The aim of this blog post is to highlight some of the significant differences between our model and the bookies’ probabilities, in this case Sportsbet’s, and perhaps how these differences could be used to find some value. Like last time, this is just our own number crunching and interpretative drivel, so if you are going to use it to bet you do so at your own risk! For all you know I am drunk right now as I write this.

One of the key findings from our earlier blog was that the reliability of the model decreased the higher up the rankings you went. As fun as it is to bet on the top 10 (and it’s still possible some value could be derived from the model there), to use it properly we need to heed our own advice and concentrate on the less interesting results lower down. With that in mind I have chosen several of the “Team Voting” betting markets and also some “Will they poll more than last year?” markets to have a closer look at.

Note: All odds are Sportsbet unless otherwise stated and were correct at the time of writing (Sunday evening when I should be doing almost anything but this).

gwilt

Essendon legend James Gwilt has a small chance of getting a vote in round 4. Get around him.

Hawthorn most votes w/o Lewis and Mitchell.

Our model, all the bookies, my Nan and the AFL and Phantom prediction sites all give almost no chance for any combination other than Mitchell followed by Lewis for most votes at Hawthorn this season. For that reason, Sportsbet has a market without them in it. Cyril is favourite at $1.66 and Isaac Smith has odds of $3.25. Our algorithm predicts Cyril to get 5 votes and Isaac 8, which obviously offers some potential value. We decided to drill down into the results and see if anything weird was going on. Sorry for the low quality of the images, I didn’t have time to teach myself the HTML required to generate nice ones (the Brownlow is tomorrow after all). We have coloured relevant rounds with green for agreement, yellow for votes predicted by us and not the AFL or Phantom, and red where the model hasn’t predicted votes in games where they seem probable.

The “Model” row is what the algorithm actually generates – the percentage chance of the player acquiring 3 votes. It’s important to note that this isn’t relative to other players in the game; it’s relative to the global population and statistical signature of a “3 vote game”. Three players could get 100% in a game, or a player could be the most likely to get 3 votes with a 40% modelled likelihood.

AFL HAWTHORN COLLINGWOOD

Cyril ($1.66):

cyril2

  • Round 12: Hawthorn beat Essendon by over 100 points, so there are a lot of players to fit in, but Cyril’s 20 disposals and 3 goals must have a chance here. He is special after all (Verdict: more likely 1 vote, possibly none).
  • Round 20: Cyril was the best Hawk, but in a loss to Melbourne where Viney and Gawn starred it seems unlikely he will get any more than 1 vote (Verdict: 1 at most).
  • Round 23: The AFL predictor has him at 3 votes; no one else on earth does. Our model gives him a 5% chance of 3 votes, and Phantom gives him no love at all (Verdict: Bruce apparently did the voting for the AFL website – likely no votes).

Isaac ($3.25):

isaac2

  • Round 11: This round is key if Smith is going to outvote Cyril, as Rioli is in with a chance as well. The model gives him a 62% chance, which is strong but far from definite. Hawthorn beat Melbourne, but general consensus is that Dom Tyson was best on ground. Smith is “with a chance” after 29 disposals and 108 DT points (Verdict: Mitchell and Tyson more likely, but Smith a chance for 1 or 2).

Betting summary: According to our model, the chances of Cyril scoring more than 5 votes are slim. It is Cyril though, and he doesn’t need a lot of stats to look “special”. $3.25 seems pretty good odds for Isaac Smith to score 6-8 votes and outscore him.

St Kilda Most Votes:

Jack Steven is essentially a lock to get the most votes for the Saints this year, with a projected 20-21, including 15 guarantees from 5 games. Our model however strips him of a 3 and suggests Nick Riewoldt is a good chance to poll higher than projected. Will it be enough?

1466925122734

Nick’s likely vote count in round 20. 

Jack ($1.04):

jacksteven

  • Round 14: This round is KEY if Riewoldt is going to have any chance of getting more than Steven. The model is in almost total agreement with Phantom and the AFL, with 5 clear best on grounds and a couple of 2 vote games. The big difference is round 14, a big upset in which St Kilda got over Geelong by 3 points. Seb Ross was clear best on ground according to everyone but the AFL; however, who gets the 1 and 2 votes seems a lot more contentious and it could be a raffle between Steven, Henderson and Riewoldt. Steven was statistically quiet by his lofty 2016 standards, getting 25 disposals and a couple of tackles, which accounts for his relatively low modelled chance (25%). Riewoldt got his trademark 26 touches and 10 marks but no goals. Verdict: Anyone’s guess, but critical.

Nick ($8.00):

riewoldt

  • Round 2: The Saints got smashed by the Bulldogs in this game, but Riewoldt still managed to get 23 disposals, 13 marks and 2 goals. Verdict: If he is to get votes in this game, umpires will need to be giving a charity vote for his 300th, as he wasn’t his normal efficient best.
  • Round 20: Riewoldt had 26 disposals and 16 marks; however, he was not named in the best on the AFL website, and was only given a 51% chance by our model. Verdict: Seems unlikely, but maybe commentators are just used to 26 and 16 from him?

Betting summary: Riewoldt is at $8.00 for most votes at the Saints, which means Sportsbet has given him a 12.5% chance of victory. Our model gives him up to a 40% chance of victory, with a significant chance of a tie. For this to happen, round 14 is critical, and he will need to get well clear of the 15 votes Steven is all but guaranteed. It seems unlikely, but if you are looking for some risk this one will be fun. Can we all give a round of applause for Saint Nick at age 34 as well?
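The arithmetic behind that comparison can be sketched in a few lines of Python. The odds and the 40% figure come from the summary above (remember the 40% is the model’s upper bound, not a point estimate); the function name and the $1 stake are just for illustration, and the calculation ignores the possibility of a tie.

```python
def expected_profit(win_probability, decimal_odds, stake=1.0):
    """Average profit per bet if the win probability is right (payout includes the stake)."""
    return stake * (win_probability * decimal_odds - 1.0)


odds = 8.00
break_even = 1.0 / odds                              # implied probability at $8.00
at_model_ceiling = expected_profit(0.40, odds)       # if the model's upper bound were true
at_bookie_view = expected_profit(break_even, odds)   # if the bookies' view were true
print(f"Break-even probability: {break_even:.1%}")
print(f"Expected profit per $1 at 40%: ${at_model_ceiling:.2f}")
```

Any true probability above the break-even figure makes the bet positive in expectation, which is why the gap between 12.5% and the model’s number is the interesting part, even though the single most likely outcome is still a Steven win.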

Port Adelaide Most Votes:

Our model gives both Robbie Gray and Ollie Wines 17 votes – startlingly and worryingly different to the AFL, Phantom and the bookies (Sportsbet has Wines at $11). To quote Seth Eisenburg:

“If it looks like shit, smells like shit, and feels like shit, you don’t have to actually eat it to know it’s shit.”

Regardless, we are going to eat it!

Robbie Gray ($1.01):

gray

  • Round 2: Our model actually gives Robbie Gray more votes than the other prediction methods, which makes the Wines prediction even more strange and unlikely. The main anomalous round is Port Adelaide’s loss to cross-town rival Adelaide. Robbie Gray had 37 disposals and kicked a goal, but the Power went down by 60. Verdict: a chance of a vote, but Lynch, Betts, Jenkins and Laird were good in a big win. This drops Gray to a likely 15-16 votes, but with significant upside.

Ollie Wines ($11.00):

wines

  • Round 11: Ollie Wines is not mentioned in the best in the Power’s big win over Collingwood, but he did have 24 disposals, 14 of them contested (hence the 51% chance). Verdict: Maybe a sneaky chance for 1 vote, but unlikely for many more.
  • Round 15: Travis Boak seems a lock for the 3, although the model has Wines as more likely. Verdict: A likely 2 votes for Wines.

Betting summary: If you take into account the potential increase in Robbie Gray’s score on the AFL and Phantom predictions, and the slight decrease in Wines after looking at other predictions, it actually does seem possible both players could score around the 15-16 vote mark. Robbie Gray still has to be the favourite; he has the runs on the board with votes and has fewer question marks, but at $11 it may be worth a small flutter on the thunderous thighs of Ollie Wines.

ollie-wines-resize

Ollie likes those odds.

More than last year bets:

Tom Mitchell:

Mitchell is $2.50 to beat his total from last year of 12 votes. I didn’t realise until I went through his stats how much of a monster this guy is. If Hawthorn get him and J’OM, I quit.

mitchell

  • Round 16: 33 disposals and 6 tackles – definitely a chance of votes.
  • Round 7: A small chance of 1 vote but unlikely with Heeney and Lance Franklin obtaining 11 goals between them in a domination against Essendon. He did rack up 37 disposals though.
  • Round 20: 39 disposals, 9 marks and 7 tackles still might not be enough in this game.

Bet summary: Tom Mitchell is a statistical beast and the algorithm has rewarded him as such. However, when you dig down into the statistics, it appears that his team mates may damage his chances significantly. It’s possible, but the odds aren’t good enough for us. Avoid.

850754-4c0d1ae0-df0f-11e3-9096-801fbb4e8d9b

Probably how he gets so many possessions. 

Scott Pendlebury:

Scott Pendlebury is one of the elite midfielders of the competition and has consistently picked up votes over his illustrious career. With injury and a switch to the HB flank, 2016 wasn’t his finest year and our gut feeling was that he would be unlikely to finish above his 15 votes from last year. Our model strongly agrees, predicting him to get 9, which is significantly different from the AFL and Phantom predictions  of 13 and 14 respectively. Sportsbet also agrees, giving odds of $1.65 for him not to make it. That being said, never turn your back on a champion, and especially don’t turn your back on someone from Collingwood.

pendles

  • Round 9: Round 9 is the issue here, with the model giving him a 56% chance of votes, but not putting him in the top 3. It seems the model may have underestimated Pendlebury’s influence as his 26 disposals and 3 goals seems likely to be enough to get him at least one vote (and the match reports agree).

Bet Summary: Like Cyril, Pendlebury’s statistics sometimes don’t reflect the influence he has on a game. That being said, the model backs up Sportsbet and suggests that the $1.65 on offer for him not to make 15 votes is fairly safe money. Maybe throw it into a funner multi for some extra value.

Dayne Zorko:

It’s fair to say Brisbane had a shit-house year, and as an Essendon supporter I pity yet still loathe them. This guy isn’t to blame though and it seemed sometimes he was the only player who cared. Sportsbet has him at $3 to not get his 2015 total of 5 votes, however, our model suggests he may be hard done by  and there may be some value.

zorko

  • Round 4: AFLPA gave him 2 votes in Brisbane’s win over Gold Coast, although he wasn’t mentioned in the other game ratings. Verdict: Possible for 1, but 2 unlikely.
  • Round 18: Zorko played well in a domination against Essendon, and was listed in the best, but seems unlikely to get the votes over Rockliff, Rich and Martin.

Bet Summary: The model says 6, but it doesn’t seem to be a solid 6. Avoid, or go for the tie (which at $4.50 may actually be a solid bet).

summary

Summary of votes for mentioned players. These results were taken over 100 iterations. 

Running out of time, so that is all for now. Good luck tomorrow, if anyone uses these in their decision making and it works, let us know! If all of these predictions are wildly wrong, I will be deleting this blog so good luck finding us.

Notes:

Sportsbet BYO: We came across this late, but Sportsbet offer custom made bets. You simply send them details of your bet and they work it up for you. It’s probably too late, but using a system like this for a model like ours may have some benefit.

3 Votes: J…..Gwilt (Ess)

 

Predicting the 2016 Brownlow

 


Guess who!

Named after Charles Brownlow, the Brownlow medal is awarded to the player judged the “best and fairest” over the duration of an AFL football home and away season.  The Brownlow is presented on the Monday evening before the grand final and is the crowning social event on the AFL calendar. Betting on the Brownlow is a great Australian pastime, perhaps not of the same magnitude as the Melbourne Cup, but definitely with a similarly high percentage of punters losing money on frivolous betting.

fevola

Brendon Fevola has a long history at the Brownlow Medal count, and currently hands out gambling advice for OddsChecker

The aim of this experiment was to try and use statistics to predict the Brownlow medal. In the process we would be taking out the human component in betting – the part of you which always wants to have a flutter on Sam Mitchell or Kepler Bradley – just in case. We will try and keep it relatively maths free, with most of the theory provided with links to well-explained pages. If you are really impatient, you can skip all the way to the bottom and look at the predictions.

Human bias is almost impossible to avoid, in particular in relation to gambling. There is a reason why running a book is an ancient and successful way of making money.  See here for a good summary of why humans typically suck at gambling. The Brownlow medal is no exception, in fact, it’s likely worse due to factors such as:

  • The wide field of contenders (Roughly 300 players per week, 9 games, 23 rounds = A lot of data)
  • The long period of time between round 1 and Brownlow night: people need to summarize half a year into what is essentially a complex probability equation. It will invariably be skewed towards the second half of the season and even finals (which obviously don’t count).
  • The inherent reliance on the rationality of umpires. We don’t need to go far past the James Hird debacle of 2004 for evidence that the system can be flawed. Hird had publicly criticized umpire Scott McLaren during the week, leading to fines and public shaming. He went on to have one of the best games of his career against the West Coast Eagles that weekend, picking up 33 disposals, including 14 in the last quarter, along with three goals (you probably remember this). He received no votes for the game and umpires confirmed to no one that they hold grudges.

 

How the Brownlow Voting works:

The process for distributing Brownlow votes is simple and time-honoured. Voting is carried out by the field umpires immediately after each game, with 3, 2 and 1 votes distributed once consensus is reached. No statistics are used in the process, which has been carried out in this way since 1930 (with a small hiatus in 1976-1978 where two field umpires both voted).

woewodin

Shane Woewodin was the recipient of the 2000 Brownlow medal.

To the ire of some media commentators, the Brownlow medal has been historically dominated by midfielders. Glenn Mitchell for the Roar concisely summarized and discussed the breakdown here.  The medal has been presented in 85 seasons and, accounting for ties, a total of 98 medals have been awarded. Of these, 61 have been won by midfielders (centremen, wingmen, ruck-rovers and rovers), and another 19 by ruckmen. This leaves 18 medals to be shared by forwards and defenders (who make up 66% of the players on the field at any given time).  In recent years, midfielders’ domination of the medal has increased, with only one non-midfielder winning the award since 1996 (Adam Goodes, playing predominantly in the ruck in 2003. Goodes, however, went on to prove he was no ordinary ruckman when he won the award again in 2006 playing mostly as a wingman).  In 2015, only 2 non-centre players bucked the trend to finish in the top 20: Todd Goldstein at 10 and Aaron Sandilands at 15. The highest ranked forward was Jeremy Cameron from the GWS Giants with 12 votes (19 behind medal winner Nat Fyfe).

While the playing position of the Brownlow winner is relatively predictable, accurate prediction of the eventual vote count is a much more difficult proposition. There is little publicly available literature on prediction of Australian sports in general, much less the Brownlow medal.  The official AFL website runs a Brownlow Predictor, updated weekly throughout the year, and the guys at Phantom run a great blog dedicated to this very question. The AFL website doesn’t reveal their prediction mechanisms, and Phantom uses the average results of “Expert Voting” for theirs. Only two researchers (that we can find) have publicly investigated the prediction of the Brownlow using statistical methods – Michael Bailey in 2005, and Robert Nguyen in 2014 (Unpublished, but links to his work in the media here and here).

bailey-prediction

A snapshot of Michael Bailey’s predictions from 2000 and 2001. (He got Shane Woewodin drastically wrong, but we can’t hold that against him)

As part of his thesis for Swinburne University in 2005, Michael investigated the potential for predicting the Brownlow using an ordinal logistic regression model and the standard suite of statistics that Champion Data doesn’t hog (and obtain themselves, in fairness). An ordinal logistic regression is a relatively simple prediction technique, summarised very elegantly here. Bailey essentially identified the optimal combination of statistical variables for predicting the likelihood of a player receiving 3, 2 or 1 votes in a match.
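For the curious, here is a minimal sketch of how an ordinal logistic (proportional odds) model turns a single score into probabilities of 0, 1, 2 or 3 votes. The cutpoints and scores are made-up numbers for illustration only. They are not Bailey’s fitted values, and the function name is our own.

```python
import math

def ordinal_logit_probs(score, cutpoints):
    """Return P(category) for an ordinal logistic (proportional odds) model.

    score     -- linear predictor for one player-game (weights * stats)
    cutpoints -- increasing thresholds separating 0, 1, 2 and 3 votes
    """
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    # Cumulative probabilities P(votes <= k) for k = 0, 1, 2, then 1.0 for k = 3
    cumulative = [sigmoid(c - score) for c in cutpoints] + [1.0]
    # Differencing the cumulative curve gives each category's probability
    probs = [cumulative[0]]
    for lower, upper in zip(cumulative, cumulative[1:]):
        probs.append(upper - lower)
    return probs  # [P(0 votes), P(1 vote), P(2 votes), P(3 votes)]

# Hypothetical cutpoints and scores, purely for illustration
CUTPOINTS = [2.0, 3.0, 4.0]
quiet_game = ordinal_logit_probs(0.0, CUTPOINTS)
big_game = ordinal_logit_probs(4.5, CUTPOINTS)
```

A higher score slides the player along the same curve, so a big game shifts probability away from 0 votes and towards 3 votes.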

There was a lot of work that went into Michael’s model and anyone interested in this field should definitely read the entire thesis. Three important and interesting points to come out of it were:

  • Players with distinctive appearances were twice as likely to poll votes as a “non-descript” player (0.24 votes per game instead of 0.12 votes per game). For his thesis, Michael described distinctive appearance as any player with red or blond hair, or with significantly darker or lighter skin. Had he done his thesis more recently, he would have definitely included shaved heads, heavy tattooing and what I call “Mitch Robinson Head”.
  • It is statistically difficult to predict the top ten’s vote count within a reasonable accuracy. However, for the remainder of players, the predictions become much more reliable. To quote Michael: “By accurately predicting 66% of players to within one vote of their actual total, and 90% of players to within three votes of their total, the modelling process provides an objective assignment of probabilities that has many benefits.”
  • The prior 3-4 years is the optimal training period, providing the most accurate prediction in any given year. If trained for longer, the performance of the model actually decreased. This is likely due to numerous factors:  changes in game style, media attention or even implicit directions by the AFL to punish certain football clubs for at best circumstantial offenses*.
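The “within one vote” and “within three votes” figures quoted above are easy to compute once you have predicted and actual totals side by side. A minimal sketch, with made-up totals for four hypothetical players (the function name is ours, not from the thesis):

```python
def share_within(predicted, actual, tolerance):
    """Fraction of players whose predicted total is within `tolerance` votes of actual."""
    hits = sum(1 for name in actual if abs(predicted[name] - actual[name]) <= tolerance)
    return hits / len(actual)

# Made-up totals for four hypothetical players
predicted = {"Player A": 2, "Player B": 0, "Player C": 11, "Player D": 25}
actual = {"Player A": 3, "Player B": 0, "Player C": 8, "Player D": 31}

within_one = share_within(predicted, actual, 1)    # Players A and B qualify
within_three = share_within(predicted, actual, 3)  # Players A, B and C qualify
```

In this toy example the high-polling Player D is the one who misses both bands, which mirrors the point that the top of the table is the hardest part to get right.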

In general, the approach we used was similar to Michael Bailey’s previous methods. An analysis of variable importance was carried out, as well as experiments into the ideal length of training data (how many seasons to use). We did not use the distinctive features variable as it seems that it might add some subjective bias, and was also beyond the limits of our lazy meter. We incorporated the use of Dream Team scores as a variable, which applied a scaling of importance to particular variables (6 points for a goal, 4 for a tackle and so on). Dream Team score actually ended up being the most important variable, which would come as no surprise to those who are avid fantasy football coaches.
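As a rough illustration of how a Dream Team score rolls several stats into one number, here is a minimal sketch. Only the goal (6) and tackle (4) weights come from the paragraph above. The remaining weights are the commonly quoted fantasy values and should be treated as assumptions, as should the stat names.

```python
# Goal (6) and tackle (4) come from the post; the other weights are the
# commonly quoted fantasy values and should be treated as assumptions.
DREAM_TEAM_WEIGHTS = {
    "goals": 6, "tackles": 4,
    "kicks": 3, "marks": 3, "handballs": 2,
    "behinds": 1, "hitouts": 1, "frees_for": 1, "frees_against": -3,
}

def dream_team_score(stats):
    """Weighted sum of a player's match stats; missing stats count as zero."""
    return sum(weight * stats.get(name, 0) for name, weight in DREAM_TEAM_WEIGHTS.items())

# Hypothetical stat line for a single game
example_game = {"kicks": 20, "handballs": 12, "marks": 6,
                "tackles": 5, "goals": 2, "frees_against": 1}
example_score = dream_team_score(example_game)  # 60 + 24 + 18 + 20 + 12 - 3 = 131
```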

dreamteam_probability_histogram

An equalised histogram showing the likelihood of scoring a Brownlow Vote relative to Dream Team Score. The blue bars represent a count of players who scored votes, and the red players who did not. The points with error bars show the likelihood of a vote or a non-vote occurring. You will note that at around 125 Dream Team points the chance of you scoring votes goes above 50% – i.e. you are more likely than not. 

Variable_Importance.png

The above image shows the relative importance of each input variable for predicting Brownlow Votes. Unsurprisingly, Dream Team score is the strongest predictor, and the number of free kicks against a player has no influence. Interestingly, the number of free kicks a player receives is more important than contested marks – Hello Joel Selwood.

While a logistic regression is a fairly simple technique, Michael Bailey still managed to predict Brownlow votes to a reasonable accuracy, particularly for lower vote-getters. We decided to trial the use of some more complex prediction algorithms to determine whether we could at least emulate Bailey’s work, and potentially improve it. Using the common machine learning technique, Random Forest, and publicly available statistics data (1,2) we attempted to predict the 2016 Brownlow Medal.

A very brief summary of the theory: A Random Forest is basically a more complex, iterative version of the easier to understand decision tree. A decision tree works out a set of “questions” to ask the data, to determine the optimal way to split the training data into categories – in this case, “No Votes” or “Votes”. The example below is an extremely simple version of a decision tree – the Random Forest would do thousands of these using different subsets of training data and variables to calculate the most robust prediction model possible. Once a set of “rules” has been established, test data (i.e. 2016 data) can be fed into the algorithm. See here and here for more detailed and excellent descriptions of how a Random Forest works.

decision-tree-simple-1

An example (and potentially over-simplified) decision tree showing the underlying logic behind the Random Forest Algorithm.
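To make the “questions” idea concrete, here is a minimal sketch of how a tree could pick a single question on a single variable: try every threshold and keep the one that leaves the two groups as pure as possible (measured here with Gini impurity, a common choice). This is an illustration with made-up numbers and our own function names, not the code behind the model. A Random Forest repeats this kind of search many times over different subsets of rows and variables.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 = no votes, 1 = votes)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(values, labels):
    """Find the threshold on one variable that best separates the two classes."""
    best_threshold, best_impurity = None, float("inf")
    ordered = sorted(set(values))
    # Candidate thresholds sit halfway between neighbouring observed values
    for low, high in zip(ordered, ordered[1:]):
        threshold = (low + high) / 2
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        # Size-weighted average impurity of the two groups after the split
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity

# Made-up Dream Team scores and whether the player polled votes
scores = [60, 75, 90, 105, 120, 130, 145, 160]
polled = [0, 0, 0, 0, 1, 0, 1, 1]
threshold, impurity = best_split(scores, polled)
```

On this toy data the chosen question is “is the Dream Team score above 112.5?”, which puts every no-vote game bar one on the low side.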

The algorithm was tested over several years to ensure it was predicting within an acceptable range. In general, our findings were consistent with Bailey’s in that the algorithm was much more accurate for the lower-ranked players than for the top 10-20. We have included the results of that in a link at the bottom, but for interest’s sake, here is the top 20 from last year:

2015_Prediction.png

2015 Modeled Vote Distributions versus Actual Results (Top 20 Players). For an example of how to read a Box Plot, click here.

 Comments on 2015:

  • The model picked Nat Fyfe and Matt Priddis to be going head to head at the end, which turned out to be correct. Fyfe had a tighter range of possible scores than Priddis, but Priddis had a higher average.
  • Lachie Neale is apparently statistically a Brownlow gun, but the umpires haven’t realised yet. His differential of 13 votes is the biggest difference between predicted and actual votes in the entire prediction.
  • Most bookies had Scott Pendlebury to poll more votes than Dane Swan. If the model had been followed, there was money to be made there. In addition, the model nailed Pendlebury’s predicted votes.
  • The model correctly identified that Zak Dawson would not poll a vote.

We were fairly happy with this result, so we included last year’s data in the training algorithm and ran the numbers on 2016 statistics. The expected number one vote-getter should surprise no one – Patrick Dangerfield finished on top in no less than 100% of the iterations, by a minimum of 7 votes. For this model to be correct, Dangerfield would need to obtain the most votes ever by a player in a season, so there is room for skepticism. What is clear, however, is that he has had a statistically superior year to all his peers.
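For anyone wanting to run the same kind of check on their own numbers, here is a minimal sketch of how “finished on top in X% of iterations, by a minimum of Y votes” can be tallied from repeated model runs. The runs and player names below are made up for illustration and are not our actual output.

```python
def top_finisher_summary(iterations, player):
    """Share of model runs where `player` leads, and the smallest winning margin."""
    margins = []
    for totals in iterations:
        # Best total among everyone except the player of interest
        best_rival = max(votes for name, votes in totals.items() if name != player)
        margins.append(totals[player] - best_rival)
    wins = [m for m in margins if m > 0]
    share_on_top = len(wins) / len(margins)
    smallest_margin = min(wins) if wins else None
    return share_on_top, smallest_margin

# Three made-up runs with hypothetical players
runs = [
    {"Player A": 38, "Player B": 29, "Player C": 27},
    {"Player A": 36, "Player B": 29, "Player C": 25},
    {"Player A": 40, "Player B": 31, "Player C": 30},
]
share, margin = top_finisher_summary(runs, "Player A")
```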

Without further ado, we humbly put our 2016 Brownlow Medal Top 20 predictions in the public domain:

2016.png

2016 Modeled Vote Distributions (Top 20 Players).

Comments on 2016 (More to come as the day approaches):

  • Patrick Dangerfield has had an amazing year. Put your house on it. Put your neighbour’s house on it. You might only make 30 bucks profit, but it’s safe money**.
  • Sam Mitchell and Joel Selwood were (at the time of writing) both 11 dollars to win the Brownlow in the Danger-free market at Sportsbet. Sam Mitchell seems to have the most upside of the two, particularly when you take into account the Danger effect.
  • It is worth reiterating that the value in using an algorithm like this is in the players who score between 10 and 20 votes. The error in the model drops significantly after the top 10 to 15 players. With this in mind, markets such as “Most at the club” or “Will this player improve on their total from last year” are likely to provide the best value.

We will report back shortly after the Brownlow to either bathe in glory, or eat humble pie. If anyone picks up any errors, has any suggestions or questions, wants different data or just to chat about methods please hit us up. The main reason behind creating this blog was to engage different people in the community and develop a dialogue in this niche area of interest. Also, none of our friends will listen to us anymore and we need a new audience.

Detailed results can be found here:

*Allegedly

** Do NOT make any bets using this information that you aren’t prepared to lose. This is a summary of an algorithm and the results – not gambling advice. D.Y.O.R.

Notes:

  • We use oddschecker.com.au for our up to date betting odds over multiple agencies.
  • Interestingly, it’s very difficult to find historical betting odds. If anyone has a source, that would be excellent.
  • The results shown in the box plots are over 100 iterations of the entire process – the Random Forest algorithm itself has its own iterative improvement process.
  • We also applied measures to limit the stratification (class imbalance) problem of the input data containing many 0 vote games and significantly fewer vote-scoring games. The models were run iteratively to give a potential range of votes a player might receive, which at a practical level gives a confidence measure of the player’s likelihood of scoring a particular number of votes. We also switched our model to only predict the likelihood of a player scoring a three vote game (3-2-1 was not included). This removes the diluting effect of 1 and 2 vote games and gave a statistically more robust method in cross-validation.
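A minimal sketch of the two ideas in the last note, under stated assumptions. The post doesn’t say which balancing measure was used, so random downsampling of the 0 vote games is an assumption here (it is one common option), and the data, field names and function names are all made up for illustration.

```python
import random
import statistics

def downsample_majority(rows, label_key, rng):
    """Keep every vote-scoring row and an equal-sized random sample of the rest."""
    positives = [r for r in rows if r[label_key] == 1]
    negatives = [r for r in rows if r[label_key] == 0]
    kept_negatives = rng.sample(negatives, min(len(positives), len(negatives)))
    balanced = positives + kept_negatives
    rng.shuffle(balanced)
    return balanced

def vote_range(totals):
    """Five-number summary of one player's predicted totals across runs."""
    q1, median, q3 = statistics.quantiles(totals, n=4)
    return {"min": min(totals), "q1": q1, "median": median, "q3": q3, "max": max(totals)}

# Made-up player-games: 3 three-vote games among 30
rng = random.Random(2016)
games = [{"three_votes": 1 if i < 3 else 0, "dream_team": 80 + i} for i in range(30)]
balanced = downsample_majority(games, "three_votes", rng)

# Made-up totals for one hypothetical player over several runs
summary = vote_range([18, 20, 21, 19, 22, 20, 23])
```

The five numbers from `vote_range` are the same ones a box plot draws, so a tight box means the runs agree with each other and a wide one means they don’t.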