Does momentum exist in matches that go to the decisive third set? Justin Stocks-Smith, UTR’s Data Scientist, digs into the data to find the factors that may predict which player will come out on top.
Introduction: Heading into their fourth-round matchup at the 2018 US Open, Serena Williams had a UTR of 13.21 and Kaia Kanepi had a UTR of 13.12. Serena won the first set 6-0 and Kanepi won the second set 6-4. At that point, I found myself wondering which player was favored to win the match. Did winning the most recent set give the edge to Kanepi? What other factors should be considered?
Analysis: My hypothesis was that momentum does exist, so the player who loses the first set and wins the second set will end up winning the match more often than not. I tested this hypothesis using data from 20,000 singles matches where both players had a UTR of 12.00 or higher. The data included three explanatory variables:
Winner_of_Second_Set: A binary (0 or 1) variable for whether a given player won the second set. This variable is equal to 1 for Kanepi and 0 for Serena.
Game_Advantage: The difference in games won through two sets for a given player. This variable is equal to +4 for Serena and -4 for Kanepi.
UTR_Difference: The UTR of a given player minus the UTR of their opponent. This variable is equal to +0.09 for Serena and -0.09 for Kanepi.
Using gradient boosted trees, a machine learning technique that combines many weak models to create a strong one, I was able to estimate how these three variables relate to the likelihood of winning the third set.
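To make the three explanatory variables concrete, here is a minimal Python sketch that derives them from the first two set scores and each player's UTR. The names `PlayerFeatures` and `build_features` are hypothetical and chosen for illustration only; this is not the code used in the analysis, and it covers only the feature construction, not the gradient boosted model itself.

```python
from dataclasses import dataclass


@dataclass
class PlayerFeatures:
    # Hypothetical container mirroring the three variables described above.
    winner_of_second_set: int  # 1 if the player won set 2, else 0
    game_advantage: int        # games won minus games lost through two sets
    utr_difference: float      # player's UTR minus opponent's UTR


def build_features(player_sets, opponent_sets, player_utr, opponent_utr):
    """Build the features for one player.

    player_sets / opponent_sets hold the games each side won in
    set 1 and set 2, in that order.
    """
    won_second = 1 if player_sets[1] > opponent_sets[1] else 0
    game_adv = sum(player_sets) - sum(opponent_sets)
    # Round to two decimals to avoid floating-point noise in the difference.
    utr_diff = round(player_utr - opponent_utr, 2)
    return PlayerFeatures(won_second, game_adv, utr_diff)


# Serena won set 1 by 6-0 and lost set 2 by 4-6.
serena = build_features((6, 4), (0, 6), 13.21, 13.12)
kanepi = build_features((0, 6), (6, 4), 13.12, 13.21)
print(serena)
print(kanepi)
```

Each match yields two mirrored rows, one per player, which is why the values for Serena and Kanepi are equal in size and opposite in sign (or complementary, for the binary variable).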
Results: As shown below, the three variables had varying levels of predictiveness. There did appear to be a slight momentum effect, as the winner of the second set had a 53% chance of winning the match. A game advantage through two sets also led to a higher likelihood of winning. Most important, however, was the UTR difference. This indicates the better player usually prevails in the final set.
Did the model correctly predict the winner of the Serena/Kanepi third set? Relative to an even 50% starting point, Serena did not win the second set (-3%), had a four-game advantage (+5%), and had the better UTR by 0.09 (+3%), so the model gave her a 55% chance of winning. Serena did end up winning the third set 6-3.
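The arithmetic behind the 55% figure can be sketched in a few lines of Python. Two assumptions are worth flagging: the 50% baseline is inferred from the numbers quoted above rather than stated outright, and the three effects are treated as simply additive, which is how the article presents them but is not how gradient boosted trees behave in general. The function name `third_set_win_probability` is hypothetical.

```python
# Assumption: with no information, either player is taken to be a 50/50 pick.
BASELINE = 0.50


def third_set_win_probability(adjustments):
    """Add the per-variable adjustments to the baseline probability."""
    return BASELINE + sum(adjustments.values())


# Adjustments quoted in the article for Serena after two sets.
serena_adjustments = {
    "did_not_win_second_set": -0.03,
    "four_game_advantage": +0.05,
    "utr_edge": +0.03,
}

serena_probability = third_set_win_probability(serena_adjustments)
print(f"{serena_probability:.0%}")
```

Here the game advantage and the UTR edge together outweigh the penalty for dropping the second set, which is why Serena stays the favorite despite Kanepi having won the most recent set.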
Conclusion: The results show that in matches that go to a deciding third set, there may be a small momentum advantage. That effect is dwarfed, however, by overall form, as captured by the UTR difference. In the end, the better player usually comes out on top.