Introducing a new CricViz model: Wicket Probability.
“It was good enough for the No.11, it probably would have done for many of the top-order batsmen too”. “His bowling figures don’t do him any justice at all”. “He’s very lucky to still be batting after that spell”. These are remarks we often hear commentators make during matches. But just how lucky was the batsman, and how unjust were the figures? With CricViz’s new Wicket Probability model we can begin to break this down.
Our aim is to estimate the probability of a specific delivery resulting in a dismissal of the batsman. Using our database we can look at nearly every possible aspect of each delivery, from both the batsman’s and bowler’s perspective, as inputs for our model. We first consider basic details such as the bowler type, left/right-handedness, and batting position. Perhaps most important are the granular characteristics of the delivery itself which we get from official ball-tracking providers: the release speed, line and length, amount of swing and deviation of the ball off the pitch etc. The averages of, and variation in, speed, line, length, swing and deviation are also included to account for how the pitch has played in recent deliveries.
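As a rough illustration of the "recent deliveries" inputs described above, here is a minimal sketch of how a rolling average and variation could be computed for one tracking attribute. The function name `rolling_pitch_features`, the window size and the sample readings are all assumptions made for this example; the article does not say how many deliveries the real model looks back over.

```python
from collections import deque
from statistics import mean, pstdev


def rolling_pitch_features(values, window=5):
    """For each delivery, summarise the `window` deliveries bowled before it.

    Returns a list of (average, spread) tuples, one per delivery.
    The first delivery has no history, so it gets (None, None).
    """
    recent = deque(maxlen=window)  # oldest readings drop out automatically
    features = []
    for value in values:
        history = list(recent)
        if history:
            features.append((mean(history), pstdev(history)))
        else:
            features.append((None, None))
        # Append only after summarising, so the current ball never
        # contributes to its own "recent conditions" features.
        recent.append(value)
    return features


# Hypothetical deviation-off-the-pitch readings in degrees, in delivery order.
deviation_deg = [0.4, 0.6, 1.2, 0.2, 0.8, 1.0]
features = rolling_pitch_features(deviation_deg, window=3)

for ball, (average, spread) in enumerate(features, start=1):
    print(ball, average, spread)
```

Excluding the current ball from its own history matters: otherwise the feature would leak information about the very delivery being scored.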
Finally, we consider some additional context around the delivery with regard to the match and the specific batsman. For example, how long has the batsman been in? How have they performed so far in terms of their false shot rate? In limited-overs matches we would also look at things like the stage of the innings, the required run rate and the batsman’s dot-ball percentage.
To train our Test match model to learn the complex interdependencies of all these factors, we feed it over 700,000 deliveries from 378 matches over the last 12 years. This is a representative subset of the Tests that have taken place in this period covering all the major teams, and all sorts of match situations and conditions.
Our algorithm of choice is XGBoost, a gradient-boosted decision tree method commonly used in machine learning problems, trained with 1,000 trees. Predicting on a 20% test set, we achieve an AUC of 0.742. In other words, if we pick one wicket-taking delivery and one non-wicket-taking delivery at random, the model gives the higher probability to the wicket-taking one about 74% of the time, so it separates the two kinds of delivery quite well.
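To make the AUC figure concrete, the sketch below computes AUC directly from its pairwise definition: the share of (wicket, non-wicket) pairs in which the wicket-taking delivery gets the higher score. The function name `pairwise_auc` and the six labelled deliveries are invented for illustration and are not taken from our data; on real volumes a library routine would be used instead of this quadratic loop.

```python
def pairwise_auc(labels, scores):
    """AUC as the fraction of (wicket, non-wicket) pairs ranked correctly.

    labels: 1 for a wicket-taking delivery, 0 otherwise.
    scores: the model's estimated wicket probabilities.
    Tied scores count as half a correct ranking.
    """
    wickets = [s for y, s in zip(labels, scores) if y == 1]
    others = [s for y, s in zip(labels, scores) if y == 0]
    if not wickets or not others:
        raise ValueError("need at least one wicket and one non-wicket delivery")

    correct = 0.0
    for w in wickets:
        for o in others:
            if w > o:
                correct += 1.0
            elif w == o:
                correct += 0.5
    return correct / (len(wickets) * len(others))


# Made-up outcomes and scores for six deliveries.
labels = [0, 0, 1, 0, 1, 0]
scores = [0.01, 0.03, 0.02, 0.005, 0.10, 0.015]

auc = pairwise_auc(labels, scores)
print(auc)  # 7 of the 8 pairs are ranked correctly -> 0.875
```

A model that scored deliveries at random would land at around 0.5 on this measure, and a perfect ranking would give 1.0.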
Let’s have a look at the output of the model. The chart below shows the distribution of the estimated wicket probabilities.
The first thing to note is that it is heavily weighted towards quite low values. That is because wickets are pretty rare; the average dismissal rate in Test matches over the last 20 years is about 1.5% or, equivalently, a strike rate of 65 balls per wicket. So a delivery which has a wicket probability of 4.5% is three times as likely to take a wicket as normal. In fact, 90% of all deliveries have a wicket probability of less than 3%. Think about how many times batsmen get away with absolute jaffas: there won’t be many balls where the wicket probability is even close to 100%. The model also considers the current situation. For example, new batsmen are more likely to be dismissed than set batsmen, and batsmen who have been scratchy (having a high false shot rate) and are struggling are more likely to be dismissed regardless of the inherent quality of the delivery.
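The conversions in the paragraph above are simple enough to check by hand, but a small sketch makes the relationships explicit. The helper names `balls_per_wicket` and `relative_likelihood` are our own labels for this example; the numbers are the ones quoted above.

```python
def balls_per_wicket(probability):
    """Convert a per-ball wicket probability into an equivalent strike rate."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return 1.0 / probability


def relative_likelihood(probability, baseline):
    """How many times more likely than the baseline a delivery is to take a wicket."""
    return probability / baseline


# A strike rate of 65 balls per wicket corresponds to roughly 1.5% per ball.
baseline_rate = 1 / 65
print(f"{baseline_rate:.2%}")

# A 4.5% delivery against the rounded 1.5% baseline.
ratio = relative_likelihood(0.045, 0.015)
print(round(ratio, 2))

# The same 4.5% delivery expressed as balls per wicket.
print(round(balls_per_wicket(0.045), 1))
```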
Using this model, we can analyse which features are most influential in determining whether a particular delivery will be a wicket or not. The plot below shows the ten most important features for Test matches. On average, the highest impact feature is the line of the ball when it passes the stumps (as you’d expect). Next comes the deviation off the pitch, which includes both seam movement and spin, followed by the height at the stumps, which is indicative of the length.
Whilst it is useful to look at these results in aggregate, it is more insightful to drill down into each individual delivery. We can analyse which attributes of the delivery made it so likely to get a wicket (or otherwise). This is done by extracting Shapley values from the output of our algorithm. Let’s take a look at a few examples to illustrate this.
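For readers unfamiliar with Shapley values, here is a minimal sketch of the idea on a toy scoring rule. It computes exact Shapley values by brute force over every subset of features, with "missing" features set to a baseline delivery. This is a simplification for illustration only: it is not the procedure we run against the real XGBoost model, and `toy_model`, its coefficients and the feature names are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley values for one prediction, by enumerating feature subsets.

    Features inside a subset take the instance's values; all other
    features are held at the baseline's values.
    """
    names = list(instance)
    n = len(names)

    def value(subset):
        inputs = {k: (instance[k] if k in subset else baseline[k]) for k in names}
        return model(inputs)

    contributions = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for a subset of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                chosen = set(subset)
                total += weight * (value(chosen | {name}) - value(chosen))
        contributions[name] = total
    return contributions


def toy_model(x):
    """Hypothetical wicket-probability rule, NOT the CricViz model."""
    p = 0.01
    if x["hitting_stumps"]:
        p += 0.05
        # Movement only counts here when the ball is on the stumps.
        p += 0.02 * abs(x["deviation_deg"])
    if x["new_batsman"]:
        p += 0.01
    return p


baseline = {"hitting_stumps": False, "deviation_deg": 0.0, "new_batsman": False}
delivery = {"hitting_stumps": True, "deviation_deg": 1.0, "new_batsman": True}

phi = shapley_values(toy_model, delivery, baseline)
for feature, contribution in sorted(phi.items(), key=lambda kv: -kv[1]):
    print(feature, round(contribution, 4))
```

The contributions always add up to the difference between the delivery's score and the baseline's score, which is what lets us say how much each attribute pushed a particular delivery's probability up or down. Brute-force enumeration grows exponentially with the number of features, so it only suits toy cases like this one.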
The highest wicket probability value for any delivery in the recently concluded England vs India series was 43.3%, which was Shami’s wicket of Root in England’s first innings of the second Test. The most important reason the value was so high was that the ball was hitting off stump. A seam movement of 1.28 degrees into the batsman was the second most important factor, followed by the fact the ball kept quite low, projected to hit the stumps at just 58cm above the ground (about three-quarters of the way up the stumps). For additional context, this delivery was in the top 0.005% of all deliveries in terms of wicket probability, meaning we would only expect to see a delivery as good as this once every 20,000 balls, or once every 8 Tests.
On the other end of the scale we have Ravindra Jadeja’s dismissal of Keaton Jennings in the first innings of the final Test. This had a wicket probability value of only 0.61% (a wicket every 164 deliveries). The main reason it was so low was the line of the ball – heading down the leg-side and over the stumps. Jennings had every reason to be disappointed.
As a final example let’s find the highest wicket probability delivery that didn’t result in a wicket. This was bowled by Stokes to Bumrah in the last innings of the fourth Test, and Bumrah managed to play it out. It was adjudged 39.5% likely to take a wicket because it was dead straight, full and deviated in by 1.0 degrees, and because Bumrah came in at No. 11.
The model can also calculate wicket probabilities from the tracking attributes alone, i.e. by stripping away the surrounding context and considering just the inherent quality of the delivery. In this case the tracking-only wicket probability was 10.8%. That is still quite high, but the gap to the full 39.5% emphasises how much more likely a delivery is to take a wicket against a No. 11 who has just come in.
The wicket probabilities generated by this model will allow sophisticated analysis of players and conditions which, when given due context on broadcast or in analysis, can make for a powerful dissection of a player’s career, bowler’s spell or team’s performance. We’re also experimenting with using these to assess how difficult batting is. Is it easy or hard? Is it getting much harder than it was in the last few overs, or levelling off? Keep an eye on our blog for more on this during the Asia Cup and the winter in Australia.
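As one hedged sketch of what such analysis might look like, the snippet below adds up per-ball wicket probabilities to get the number of wickets a spell "deserved", and averages them over by over to show whether batting is getting harder. The function names, the six-ball grouping and the probabilities are invented for this example; it is an assumption about how the outputs could be used, not a description of a finished CricViz metric.

```python
from statistics import mean


def expected_wickets(probabilities):
    """Sum of per-ball wicket probabilities: the wickets a spell would yield on average."""
    return sum(probabilities)


def average_by_over(probabilities, balls_per_over=6):
    """Mean wicket probability for each over (the last over may be incomplete)."""
    return [
        mean(probabilities[start:start + balls_per_over])
        for start in range(0, len(probabilities), balls_per_over)
    ]


# Hypothetical wicket probabilities for a two-over spell.
spell = [0.01, 0.01, 0.01, 0.01, 0.01, 0.01,
         0.02, 0.02, 0.03, 0.03, 0.02, 0.04]

deserved = expected_wickets(spell)
per_over = average_by_over(spell)

print(round(deserved, 2))
print([round(p, 4) for p in per_over])
```

Comparing the expected figure with the wickets actually taken is one way to put a number on "his bowling figures don't do him justice", provided the probabilities are well calibrated.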