Easier or Harder Tennis Matches?

A Simple Analysis of ATP Scores in 2019

Introduction

Even if you are not a tennis fan, you may have heard that tennis is one of the toughest sports in the world, and men’s singles may be the hardest of all professional tennis events. The longest recorded men’s singles match lasted 11 hours and 5 minutes over three days, with a final score of 6–4, 3–6, 6–7, 7–6, 70–68.

Sometimes, however, lopsided matches occur as well, when one player far outclasses the other. The shortest recorded men’s singles match lasted only 18 minutes, with a score of 6-0, 6-0.

So, exactly how tough are recent matches? Are matches trending harder or easier? These two questions came to mind and led to this simple analysis of the latest ATP scores.

Concepts

Don’t know much about tennis? No worries! Basic tennis concepts are shown below. Just scroll down and feed your curiosity!

Point: The smallest unit of scoring. Points progress from Love (0) to 15, 30, 40, and then game.

Game: A game is won by the first player to win at least 4 points with a margin of at least 2 points.

Set: A set is won by the first player to win at least 6 games with a lead of at least 2 games.

Match: A match is usually played as best of 3 or best of 5 sets.

Tie-break: If a set reaches 6-6 and tie-break set rules are used, the players must play a tie-break game, in which a player/team must reach at least 7 points with a two-point advantage to win. In this case, the set score is shown as 7-6 followed, in parentheses, by the number of points the set loser won in the tie-break game, for example 7-6(8).

Bagel: A set that ends with a score of 6-0.

Reference: http://protennistips.net/tennis-rules/

Grand Slam: The Grand Slam tournaments are the four most important annual tennis events: the Australian Open, the French Open (aka Roland Garros), Wimbledon and the US Open.

ATP: The abbreviation for Association of Tennis Professionals.

WTA: The abbreviation for Women’s Tennis Association.

Data

Because of the Covid-19 pandemic, many tournaments were delayed or cancelled in 2020, so the main dataset used in this analysis is from 2019. For comparison, I also use data from 2016-2018.

Source

I found the datasets I needed in Jeff Sackmann’s collection. Jeff also hosts ATP match data for other years, as well as WTA match data.

License

We are allowed to share the dataset, adapt it and build our work upon it, because the dataset is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. In short:

  • Attribution is required;
  • Non-commercial use only;
  • Adapted work must be shared under the same license.

Size and Location

The main dataset, atp_matches_2019.csv, is 608 KB in size and contains 2781 rows and 49 columns. It can also be viewed online here.

Other datasets used can be viewed here:

For comparison, I merged the four datasets into one.
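A merge like this could be sketched in plain Python as follows. This is a minimal illustration, not necessarily how the merged file was actually built: the helper name merge_seasons is hypothetical, the other seasons' files are assumed to follow the same atp_matches_<year>.csv naming pattern, and the tiny in-memory samples are made up so the sketch runs on its own.

```python
import csv
import io

def merge_seasons(sources):
    """Concatenate per-season match rows, tagging each row with its season year.

    `sources` maps a year (int) to an open text file or any iterable of CSV lines.
    """
    merged = []
    for year, handle in sorted(sources.items()):
        for row in csv.DictReader(handle):
            row["year"] = year  # remember which season file the row came from
            merged.append(row)
    return merged

# Made-up stand-ins for the real season files (hypothetical ids and values).
sample_2018 = io.StringIO("tourney_id,score,minutes\n2018-X001,6-4 6-4,80\n")
sample_2019 = io.StringIO(
    "tourney_id,score,minutes\n2019-X001,7-6(5) 6-0,95\n2019-X001,6-3 6-2,61\n"
)

merged = merge_seasons({2019: sample_2019, 2018: sample_2018})
print(len(merged), [row["year"] for row in merged])
```

With the real files, the same function could be called with handles opened via open(path, newline=""), one per season.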

Column Meaning

In my analysis, I use only the columns listed below with their meanings, and I removed the unused columns from the files. Please browse Jeff’s dictionary if you want to know more about the other columns.

  • surface: court surface, including clay, hard and grass.
  • tourney_level: ‘G’ = Grand Slams, ‘M’ = Masters 1000s, ‘A’ = other tour-level events, ‘F’ = Tour finals and other season-ending events, and ‘D’ = Davis Cup.
  • tourney_date: eight digits, YYYYMMDD, usually the Monday of the tournament week.
  • score
  • minutes: match length, when available.
  • tie-break (coded): the number of sets with a score of 7-6 or 6-7.
  • bagel (coded): the number of sets with a score of 6-0 or 0-6.
  • year (coded): the year of the ATP season the match belongs to.
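The three coded columns can be derived from the raw columns. The sketch below shows one possible way, assuming scores are space-separated sets such as "7-6(8) 6-4" and that the season year is the prefix of tourney_id (as in "2016-M020"). The function names are hypothetical, and this may differ from how the columns were actually coded.

```python
import re

# One set score such as "7-6(8)" or "6-0"; the tie-break points in
# parentheses are optional and are not captured.
SET_PATTERN = re.compile(r"(\d+)-(\d+)(?:\(\d+\))?")

def count_sets(score, targets):
    """Count sets in `score` whose (winner_games, loser_games) pair is in `targets`."""
    pairs = [(int(a), int(b)) for a, b in SET_PATTERN.findall(score)]
    return sum(1 for pair in pairs if pair in targets)

def tie_breaks(score):
    """Number of sets that ended 7-6 or 6-7."""
    return count_sets(score, {(7, 6), (6, 7)})

def bagels(score):
    """Number of sets that ended 6-0 or 0-6."""
    return count_sets(score, {(6, 0), (0, 6)})

def season_year(tourney_id):
    """Season year, assuming ids shaped like "2016-M020"."""
    return int(tourney_id.split("-")[0])

print(tie_breaks("7-6(8) 6-4"), bagels("6-0 6-3"), season_year("2016-M020"))
```

Scores without any set result, such as walkovers, simply yield a count of 0 under this approach.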

A quick look into the Merged Dataset (Trimmed)

| tourney_id | tourney_name | surface | tourney_level | tourney_date | score | round | minutes | winner_rank | loser_rank | tie-break | bagel | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2016-M020 | Brisbane | Hard | A | 20160104 | 6-2 7-5 | R32 | 84 | 65 | 61 | 0 | 0 | 2016 |
| 2016-M020 | Brisbane | Hard | A | 20160104 | 6-0 6-3 | R32 | 67 | 197 | 76 | 0 | 1 | 2016 |
| 2016-M020 | Brisbane | Hard | A | 20160104 | 6-4 6-3 | R32 | 69 | 18 | 71 | 0 | 0 | 2016 |
| 2016-M020 | Brisbane | Hard | A | 20160104 | 6-3 6-2 | R32 | 67 | 87 | 813 | 0 | 0 | 2016 |
| 2016-M020 | Brisbane | Hard | A | 20160104 | 4-6 6-3 7-5 | R32 | 143 | 78 | 117 | 0 | 0 | 2016 |
| 2016-M020 | Brisbane | Hard | A | 20160104 | 6-4 6-4 | R32 | 82 | 16 | 37 | 0 | 0 | 2016 |
| 2016-M020 | Brisbane | Hard | A | 20160104 | 3-6 6-4 6-3 | R32 | 113 | 20 | 120 | 0 | 0 | 2016 |
| 2016-M020 | Brisbane | Hard | A | 20160104 | 4-6 6-3 6-2 | R32 | 103 | 69 | 129 | 0 | 0 | 2016 |
| 2016-M020 | Brisbane | Hard | A | 20160104 | 7-6(8) 6-4 | R32 | 102 | 51 | 60 | 1 | 0 | 2016 |
| 2016-M020 | Brisbane | Hard | A | 20160104 | 6-3 7-6(10) | R32 | 125 | 28 | 15 | 1 | 0 | 2016 |

Analysis

Average match length

An effective way to assess whether matches in 2019 were easier or harder than before is to compare the average match length across years.

Match length varies from match to match and is affected by the court type. However, because the level, frequency, location and court type of tournaments are fairly stable from year to year, the average match length still tells the story.

Overall, a longer average match length indicates harder matches, while a shorter one implies easier ones.

The bar chart of average match length by year shows that matches in 2019 were longer on average than those in 2016-2018.

By this measure, matches in 2019 were harder.
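The per-year averages behind such a chart could be computed along these lines. This is a rough sketch that assumes each row is a dict with a year and a minutes value, and that matches with no recorded length are skipped. The function name is hypothetical and the sample rows are made up, not real ATP figures.

```python
from collections import defaultdict
from statistics import mean

def average_minutes_by_year(rows):
    """Mean match length in minutes per year, ignoring rows with no length."""
    by_year = defaultdict(list)
    for row in rows:
        minutes = row.get("minutes")
        if minutes in (None, ""):
            continue  # match length is only recorded "when available"
        by_year[row["year"]].append(float(minutes))
    return {year: mean(values) for year, values in sorted(by_year.items())}

# Made-up rows for illustration only.
rows = [
    {"year": 2018, "minutes": "80"},
    {"year": 2018, "minutes": "100"},
    {"year": 2019, "minutes": "120"},
    {"year": 2019, "minutes": ""},
    {"year": 2019, "minutes": "90"},
]
averages = average_minutes_by_year(rows)
print(averages)
```

Skipping missing values, rather than treating them as zero, keeps unrecorded matches from dragging a year's average down.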

Tie-breaks and Bagels

The other two criteria for assessing the overall difficulty of matches in 2019 are the number of tie-breaks (7-6/6-7) and the number of bagels (6-0/0-6).

Although the numbers of matches in different years are not exactly the same, the difference is small in recent years. Thus, the comparison of tie-break counts and bagel counts can still tell us something.

Generally, more tie-breaks imply a higher level of match difficulty; more bagels imply a lower difficulty.

In 2019, the tie-break count was above the average of the four most recent years, while the bagel count was below that average. This agrees with the result from average match length: 2019 did indeed see tougher matches.
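A comparison of one year against the multi-year average can be sketched as below, assuming each row carries the coded tie-break and bagel counts for one match. The function names are hypothetical and the sample rows are invented purely to make the sketch runnable.

```python
from collections import Counter

def totals_by_year(rows, column):
    """Sum a coded per-match column (e.g. "tie-break" or "bagel") for each year."""
    totals = Counter()
    for row in rows:
        totals[row["year"]] += int(row[column])
    return dict(totals)

def difference_from_average(totals, year):
    """How far one year's total sits above (+) or below (-) the mean of all years."""
    average = sum(totals.values()) / len(totals)
    return totals[year] - average

# Made-up rows for illustration only; these are not the real ATP numbers.
rows = [
    {"year": 2018, "tie-break": 1, "bagel": 1},
    {"year": 2018, "tie-break": 0, "bagel": 1},
    {"year": 2019, "tie-break": 2, "bagel": 0},
    {"year": 2019, "tie-break": 1, "bagel": 0},
]
tie_break_totals = totals_by_year(rows, "tie-break")
bagel_totals = totals_by_year(rows, "bagel")
print(difference_from_average(tie_break_totals, 2019))
print(difference_from_average(bagel_totals, 2019))
```

If the number of matches differed a lot between years, dividing each total by that year's match count would give a fairer comparison than raw totals.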

Grand Slam Tie-breaks and Bagels

In any sport, contests between top players are always the focus of the audience’s attention. When it comes to tennis, of course we will ask: What about the Grand Slam matches? Do they show the same trend?

Find the answer below!

In 2019, matches at the Australian Open and the US Open, the two Grand Slams played on hard courts, seemed to be harder than in previous years.

On the other hand, matches at Roland Garros (clay court) and Wimbledon (grass court) seemed to be easier than those in previous years.
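The Grand Slam breakdown can be sketched by keeping only rows with tourney_level equal to ‘G’ and grouping them by tournament and year. This is an illustrative sketch only: the function name is hypothetical, the sample rows are made up, and the exact tourney_name spellings in the real files are an assumption.

```python
from collections import defaultdict

def grand_slam_totals(rows, column):
    """Sum a coded column per (tournament, year), using Grand Slam matches only."""
    totals = defaultdict(int)
    for row in rows:
        if row["tourney_level"] != "G":
            continue  # 'G' marks Grand Slams in the tourney_level column
        totals[(row["tourney_name"], row["year"])] += int(row[column])
    return dict(totals)

# Made-up rows for illustration only.
rows = [
    {"tourney_name": "Wimbledon", "tourney_level": "G", "year": 2019, "tie-break": 1, "bagel": 0},
    {"tourney_name": "Wimbledon", "tourney_level": "G", "year": 2019, "tie-break": 2, "bagel": 1},
    {"tourney_name": "Brisbane", "tourney_level": "A", "year": 2019, "tie-break": 1, "bagel": 0},
    {"tourney_name": "US Open", "tourney_level": "G", "year": 2018, "tie-break": 0, "bagel": 1},
]
slam_tie_breaks = grand_slam_totals(rows, "tie-break")
slam_bagels = grand_slam_totals(rows, "bagel")
print(slam_tie_breaks)
print(slam_bagels)
```

Comparing the 2019 entry of each tournament with its 2016-2018 entries then gives the per-Slam trend.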

Conclusion

Based on the visualizations above, my answer to the question

ATP Matches in 2019: Easier or Harder?

is:

Harder,

although on some specific court surfaces the answer may be different.

What do you think?