
fbitennis · 7 years ago


Popcorn Score

A couple of days ago I posted something about toy stats, encouraging fans to come up with some of their own. I can hardly ask others to do that without trying one myself.

As a reminder, here are the criteria I set for a good toy stat:

Be relatively easy to calculate, which means only addition, subtraction, multiplication and division, assuming use of a basic calculator (real or computer-based) with a memory function

Use stats that are readily available to anyone with an Internet connection

Be scaled so that someone seeing the toy stat will have a sense of how good, bad or average the measured performance is

Tell us something that we would not otherwise know using an already-existing stat

Bonus: Have a clever name that is descriptive and easy to promote

The purpose of the Popcorn Score is to flag good matches to watch that you wouldn’t necessarily recognize on your own. Of course, we don’t need a toy stat to tell us we should watch Roger play Rafa. But once you set aside your personal favorites, it isn’t always so obvious. Take this weekend’s grass-court semifinals in Antalya and Eastbourne, for example.

Here is Saturday’s lineup:

1. Monfils vs. Mannarino

2. M. Zverev vs. Kukushkin

3. Lacko vs. Cecchinato

4. Vesely vs. Dzumhur

There’s no Fed vs. Rafa in there. I put those in the order of my own personal preference, for subjective reasons. Monfils is entertaining and has a style that contrasts with Mannarino’s. Zverev might serve and volley, which would make that match a little more interesting. Cecchinato is having a great summer, but is basically new to grass. Not sure what might be appealing about the last match: Dzumhur is fast and might slide under the net?

Notice that none of my reasons are related to how well the players play tennis, or how close the match is expected to be. That doesn’t seem right.

Popcorn Score Concepts

Enter the Popcorn Score to quantify which matches maybe I should be watching that I otherwise wouldn’t. Maybe the Popcorn Score will show me a hidden gem, or more appropriately, a hidden kernel.

The Popcorn Score is focused on two concepts: (1) the quality of the players in the match and (2) how competitive the match ought to be. But we have to keep things relatively simple, to stay within my toy stat principles.

We could use the official rankings (readily available) to determine the quality component, but rankings frequently are not an accurate summation of how good a player is, particularly on a given surface. For more accurate portrayals, we have surface-specific ELO. Calculating ELO yourself would violate the ease-of-calculation test, except that Tennis Abstract now publishes surface-specific ELO ratings for about 175 men and 175 women, so they are readily available.

Still, I don’t want to dismiss the rankings entirely, for two reasons: (1) there will be matches for which one or both players’ ELO ratings are not yet published and (2) the public is conditioned to think of rankings as the standard of quality, and we can’t have a toy stat that purports to identify the most watchable matches but ignores public perception.

The measure of competitiveness is a bit easier. You can use any forecast you want, whether it’s your own gut feel, a published forecast from your favorite website, or even just the win probabilities implied by the decimal odds at your favorite bookmaker (but remember to take out the juice).
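If you go the bookmaker route, here is a minimal sketch of one way to take out the juice, assuming two-way decimal odds and simple proportional normalization. Other de-vig methods exist, and the post doesn’t say which one it uses. The function name and the 1.80/2.00 odds are made up for illustration.

```python
def implied_win_probabilities(odds_a: float, odds_b: float) -> tuple[float, float]:
    """Turn two-way decimal odds into win probabilities with the bookmaker margin removed.

    Assumption: proportional normalization, one of several possible de-vig methods.
    """
    raw_a = 1.0 / odds_a  # raw implied probability, still includes the juice
    raw_b = 1.0 / odds_b
    overround = raw_a + raw_b  # exceeds 1.0 when the bookmaker builds in a margin
    return raw_a / overround, raw_b / overround


# Hypothetical odds: player A at 1.80, player B at 2.00
p_a, p_b = implied_win_probabilities(1.80, 2.00)
print(f"{p_a:.3f} {p_b:.3f}")  # 0.526 0.474
```

The raw implied probabilities here add up to about 1.056; dividing each by that total makes them sum to 1.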

Calculating the Popcorn Score

So let’s get to the calculation. First I decided that the score ought to be half-defined by the quality of the players and half-defined by the competitiveness of the match. The weights could be different, of course, but 50/50 seems to work pretty well. However, since I’m using two measures for the quality of player (ranking and ELO), I need to have separate weights for that half of the equation. After experimenting a bit, I decided it works pretty well if player ranks count 15%, player ELO counts 35% and competitiveness counts 50%.

We’ve got to get those all on the same scale, though. Lower (i.e., better) rank numbers should increase Popcorn Scores, yet higher ELO ratings should produce higher Popcorn Scores. And bigger margins between the players’ forecasted win percentages should produce lower Popcorn Scores. So you can’t just add this stuff together.

Ultimately, I want to get a Popcorn Score on a scale of 0 - 100, with 0 being “I’d rather mow the lawn,” 50 being “maybe if I don’t have something else to do” and 100 being “I will not hear anything my wife says for the next two-point-five hours, and I will have a popcorn stomach ache this evening.”

It is possible to have a negative Popcorn Score, but I’d be shocked if that manifests in any main draw match. It’s also possible to exceed 100, which will happen occasionally, but that’s not something to worry about with a toy stat. In fact, it kind of makes it sexier.

The Ranking Portion (15%)

(150 - Average Rank of the two players)/150 * 100

Why average the ranks? Because watching a highly ranked opponent play a very low-ranked opponent does not make me want popcorn. If you believe watching a great player play a horrible player still is a popcorn match, then your version of this stat might say to use the highest ranked player in the match. That’s not my idea of popcorn, but I do sorta make an allowance for that situation at the very end.

Where did the 150s come from? The first 150 is a very round approximation of the back-end player ranks that will be subjected to the calculation (e.g., decent qualifiers at a Grand Slam). The second 150 represents the difference between the best player we could evaluate (#1) and the baseline player at #150, rounded up a bit.

I am confident the 150s should not be higher, as higher baselines and divisors overinflate the quality of the players in the Popcorn Score. The 150s may be too low, but in tinkering with it, it seemed to work best at 150, because any lower and you get too many negative scores.

And yes, you can get a negative score for this component. If #175 is playing #176, that’s a -17, as well it should be. If #90 plays #98, you get 37. Suppose we used 100 instead of 150 in the equation. The #175/#176 match is -75. The #90/#98 match is only a 6, and that’s before weighting it at 15%. I don’t think the player quality is that low.
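A minimal Python sketch of the ranking portion that reproduces the numbers above. The function name is hypothetical, and the single `baseline` parameter stands in for both 150s, since the formula uses the same value as the baseline and the divisor.

```python
def rank_component(rank_a: int, rank_b: int, baseline: float = 150.0) -> float:
    """Ranking portion of the Popcorn Score (0-100 scale, can go negative).

    `baseline` serves as both the back-end rank and the divisor.
    """
    avg_rank = (rank_a + rank_b) / 2  # averaged, so a weak opponent drags the match down
    return (baseline - avg_rank) / baseline * 100


print(f"{rank_component(175, 176):.1f}")       # -17.0
print(f"{rank_component(90, 98):.1f}")         # 37.3
print(f"{rank_component(175, 176, 100):.1f}")  # -75.5
print(f"{rank_component(90, 98, 100):.1f}")    # 6.0
```

The text quotes these as whole numbers (-17, 37, -75 and 6).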

The ELO Portion (35%)

(Average ELO of the two players - 1200)/800 *100

I’m averaging the two players’ surface-specific ELOs for the same reason I averaged the ranks.

I fiddled around with the 1200 number as the baseline. Intuitively it seems too low to me, because I don’t expect Popcorn Scores to be calculated for many players with an ELO of 1200. I originally conceived this with 1300 as the baseline, but Jeff Sackmann recently redid his ELO formula at Tennis Abstract (to account for player absences), and all the ELO ratings dropped, so I lowered my baseline to 1200 to account for that.

Why not raise it so it makes intuitive sense? There is a lot of ELO clustering in the 1500s. The ATP/WTA rankings are linear, but ELO ratings are not. So if you raise the 1200 baseline to 1400, for example, you have a ton of matches between decent players that are treated as having very low quality players for the Popcorn Score, and that violates my sense of justice.

I originally used a divisor of 1100, the spread between the 1200 baseline and the best players that will be in the equation, with ELOs around 2300; I have since dropped it to 800 for use with TA’s ELO ratings. The greatest players have occasional peaks above 2300, but that’s not going to happen often. (As an aside, my own Popcorn Score calculations use a baseline of 1300 and a divisor of 1000, because my own ELO ratings are higher in the middle than TA’s, although it doesn’t make much practical difference in the forecasting context.)
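The ELO portion follows the same pattern. This sketch assumes you already have both players’ surface-specific ratings; the function name and the 1900/1700 ratings are hypothetical. The defaults match the formula above (baseline 1200, divisor 800), and the second call shows the 1300/1000 variant mentioned in the aside.

```python
def elo_component(elo_a: float, elo_b: float,
                  baseline: float = 1200.0, divisor: float = 800.0) -> float:
    """ELO portion of the Popcorn Score (0-100 scale; can dip below 0 or exceed 100)."""
    avg_elo = (elo_a + elo_b) / 2  # averaged for the same reason as the ranks
    return (avg_elo - baseline) / divisor * 100


print(elo_component(1900, 1700))              # 75.0
print(elo_component(1900, 1700, 1300, 1000))  # 50.0
```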

One more note about the ELO component. If you don’t have a reliable (and by that, I mean surface-specific) ELO score for a player, then you should learn to calculate it yourself. I’m kidding. In that instance, just weight the ranking portion at 50% instead of 15%, and ignore the ELO portion.

The Competitiveness Portion (50%)

(.99 - Difference between the two players’ expected win percentages)/.99 * 100

This one’s a lot easier. No one is going to have a 100% or 0% win percentage, so the difference between the two forecasts will never reach 1, and .99 works as a practical ceiling. And the narrower the difference between their forecasted win percentages, the more competitive the match is expected to be.
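A sketch of the competitiveness portion, assuming you pass in both players’ expected win probabilities as decimals (for example, de-juiced bookmaker figures). The function name and the sample forecasts are hypothetical.

```python
def competitiveness_component(p_a: float, p_b: float, ceiling: float = 0.99) -> float:
    """Competitiveness portion: 100 for a coin flip, close to 0 for an extreme mismatch."""
    margin = abs(p_a - p_b)  # gap between the two forecasts; player order doesn't matter
    return (ceiling - margin) / ceiling * 100


print(f"{competitiveness_component(0.50, 0.50):.1f}")  # 100.0
print(f"{competitiveness_component(0.60, 0.40):.1f}")  # 79.8
print(f"{competitiveness_component(0.90, 0.10):.1f}")  # 19.2
```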

Putting it all together

We have the three components, so let’s put them all together. Note that I multiplied each component by 100 so that we wouldn’t have any numbers below 1 when working out the individual components, but since we are adding the weighted components together, we can skip that, add them up, and multiply by 100 once at the end. So we’ve got:

Step One: (150-Avg Rank)/150 * .15, plus

Step Two: (Avg ELO - 1200)/800 * .35, plus

Step Three: (.99 - Difference in expected win percentage)/.99 * .50

Step Four: Multiply the sum by 100 to get it on the 0-100 scale.
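Putting the four steps into one function makes the weights explicit. This is a self-contained sketch that takes ranks, forecasts and (optionally) surface-specific ELO ratings; `popcorn_score` and the sample match (#20 vs. #40, ELOs of 1900 and 1700, a 55/45 forecast) are hypothetical. When either ELO is missing, it applies the fallback from the ELO section: ranking weighted at 50% and no ELO portion.

```python
from typing import Optional


def popcorn_score(rank_a: int, rank_b: int, p_a: float, p_b: float,
                  elo_a: Optional[float] = None, elo_b: Optional[float] = None) -> float:
    """Popcorn Score before any star bonus, on a roughly 0-100 scale."""
    rank_part = (150 - (rank_a + rank_b) / 2) / 150  # Step One, before weighting
    comp_part = (0.99 - abs(p_a - p_b)) / 0.99       # Step Three, before weighting
    if elo_a is None or elo_b is None:
        # No reliable surface-specific ELO: ranking counts 50%, ELO is ignored.
        total = 0.50 * rank_part + 0.50 * comp_part
    else:
        elo_part = ((elo_a + elo_b) / 2 - 1200) / 800  # Step Two, before weighting
        total = 0.15 * rank_part + 0.35 * elo_part + 0.50 * comp_part
    return total * 100  # Step Four


print(f"{popcorn_score(20, 40, 0.55, 0.45, 1900, 1700):.1f}")  # 83.2
print(f"{popcorn_score(20, 40, 0.55, 0.45):.1f}")              # 84.9
```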

Examples

Let’s do each of the Antalya and Eastbourne semifinals. For this I’ll use Tennis Abstract’s version of Grass ELO, and the implied odds from the average decimal odds listed on Oddsportal.com, with the juice extracted. (No one drinks juice with popcorn, do they?)

These are listed in the same order I originally ranked them.

Well, maybe I picked a bad week to do this, because they are all kind of the same, and none have super high Popcorn Scores. But that’s not surprising... look at the lineup. The Popcorn Score isn’t going to turn Lacko vs. Cecchinato into a 100. What they all have going for them is that they are reasonably competitive; what works against them is that there are no stars, although Monfils has some separate entertainment appeal. (Note: The calculations above used a divisor of 1100 on the ELO portion, which I have since dropped to 800 for use with TA’s ELO ratings.)

In any event, maybe I should be more interested in the Vesely-Dzumhur match than I originally was. It features the highest ranked player of the four matches and is forecast to be the most competitive. And apparently I overestimated Zverev-Kukushkin.

Wait, this Popcorn needs some spice

In running this formula on various random combos of players, I noticed all sorts of situations where I had minor disagreements between my subjective idea of popcorn matches and the Popcorn Score. That’s okay, this isn’t science, and data isn’t going to fully account for the appeal of Gael Monfils.

BUT, in some instances, the differences really bothered me, and those had something in common: The match had one superstar, but the Popcorn Score was dragged down too far by a weak opponent. Think of it this way: At some level, a match with Federer, Nadal or Serena always has appeal, even if the opponent is terrible. It doesn’t usually hold all the way through the match, and I don’t think they automatically should get Popcorn Scores of 100 just because of who they are and regardless of opponent, but I do think that certain players should have enough weight to carry a poor opponent to a somewhat higher Popcorn Score.

Yet it won’t work to say “if Federer is playing, boost the Popcorn Score by X amount.” Otherwise you have to constantly change your list of who gets a boost, and that’s too narrowly tailored.

Initially, I decided that if a match has one player ranked #5 or better, you add 15 points to the Popcorn Score (15 points to Hufflepuff. Y’know, Popcorn. Puff. Get it?). You do NOT add that bonus if both players in the match are in the Top 5, because the Popcorn Score already accounts for the high quality of that match.
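The first-draft bonus is a simple rule. A sketch (hypothetical function name, current ranks as inputs) that pays out only when exactly one player is in the Top 5:

```python
def original_top5_bonus(rank_a: int, rank_b: int) -> int:
    """First-draft star bonus: 15 points when exactly one player is ranked #5 or better."""
    a_top5 = rank_a <= 5
    b_top5 = rank_b <= 5
    # Two Top 5 players get nothing extra; the base score already reflects that match.
    return 15 if a_top5 != b_top5 else 0


print(original_top5_bonus(2, 60))   # 15
print(original_top5_bonus(1, 3))    # 0
print(original_top5_bonus(20, 40))  # 0
```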

Nadal and Fed clearly qualify. Sascha Zverev probably should. Del Potro and Cilic are harder sells, but whatever. On the women’s side you’ve got a nice Top 5 group that I’m comfortable with, but Serena matches don’t currently get a bonus, so that seems weird. There are other situations where the Popcorn Score is off because of player layoffs, etc. (e.g., last week’s Murray-Wawrinka got a low Popcorn Score).

Update: Jeff Sackmann from Tennis Abstract sent me the idea of basing the bonus on peak rank, not current rank, so that you can pick up the appeal of someone like Serena or Murray. He also suggested scaling the bonuses up for higher peaks, which nicely differentiates between the star power of a #1 (or former #1) and the “star” power of a #5 (or former #5).

Jeff’s suggestion was to take 6 minus the peak ranking (treat negative numbers as zero), raised to the power of 3, and then scale it appropriately. I did that, and took the natural log of the results for purposes of scaling it to a top bonus of 20 points for peak #1s. (I’m not going to pretend to know what the natural log of anything is, but I know you can sometimes use it for scaling so I tried it). As it turns out, when you do that, you can get pretty close to the same result by taking 6 minus the peak ranking, and multiplying it by 4.

So, a peak #1 would get a 20-point bonus, a peak #2 would get a 16-point bonus, and so on down to 4 points for a peak #5. The log numbers would have been 19, 17, 13, 8 and 0. I’d be fine with this scale and with a 0 for a peak #5, actually, but using logs doesn’t fit with the simplicity principle in the toy stat rules, and maybe using exponents doesn’t either. The exponent is just repeated multiplication, which fits our rule, but it seems scarier to some people, and while a very basic calculator probably has x^2, it may not let you choose a different exponent. Plus, you don’t really need to remember the “6 minus peak, times 4” rule... you can just do 20 points for a peak #1 and work your way down to 4 points for a peak #5 in a linear fashion.
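To see how close the linear shortcut is to the log version, here is a small comparison. The text doesn’t state the exact factor applied to the log; multiplying the natural log by 4 is an assumption here that happens to reproduce the 19, 17, 13, 8 and 0 listed above. Both function names are hypothetical.

```python
import math


def log_bonus(peak_rank: int) -> float:
    """(6 - peak)^3 with negatives treated as zero, then natural log, scaled by 4 (assumed)."""
    base = max(6 - peak_rank, 0) ** 3
    if base == 0:
        return 0.0  # ln(0) is undefined; peaks outside the Top 5 get no bonus
    return 4 * math.log(base)


def linear_bonus(peak_rank: int) -> int:
    """The '6 minus peak, times 4' shortcut, floored at zero."""
    return max(6 - peak_rank, 0) * 4


for peak in range(1, 7):
    print(peak, round(log_bonus(peak)), linear_bonus(peak))
# 1 19 20
# 2 17 16
# 3 13 12
# 4 8 8
# 5 0 4
# 6 0 0
```

The two scales never differ by more than 4 points, and the only real disagreement is whether a peak #5 earns anything.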

I kind of like not giving a bonus for peak #5s, but let’s see who that involves. I tried 50 searches for male players who might have peaked at #5 and had zero hits (who knew Grosjean peaked at #4?). Try it; it’s harder than you think. On the women’s side, I got to Ostapenko in five tries, and she deserves a 4-point bonus, so I’m leaving the #5 bonus in there.

There still will be anomalies with the scaled bonus structure, of course. As Jeff pointed out, Zvonareva would get a good-sized bonus (16 points in my formulation), which is probably okay for her first comeback match at Wimbledon based purely on curiosity, but will oversell her future matches. There are ways to address that (e.g., only take peak ranking in the last X years), but I don’t want to over-complicate it.

Because of this update, you apply the bonus calculation to the better of the two players’ peak rankings, rather than applying it only if one player is currently in the Top 5. So, if Federer (peak #1) plays Nishikori (peak #4), it’s a 20-point bonus.
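Under the updated rule, only the better of the two peak rankings matters. A sketch using the linear shortcut (hypothetical function name; peak rankings as inputs). The final number is simply the base Popcorn Score plus this bonus.

```python
def peak_bonus(peak_a: int, peak_b: int) -> int:
    """Star bonus from the better peak ranking: 20 for a peak #1 down to 4 for a peak #5."""
    best_peak = min(peak_a, peak_b)  # lower number = better peak
    return max(6 - best_peak, 0) * 4  # "6 minus peak, times 4"; 0 outside the Top 5


print(peak_bonus(1, 4))   # 20, e.g. Federer (peak #1) vs. Nishikori (peak #4)
print(peak_bonus(45, 2))  # 16, a peak #2 vs. a hypothetical peak #45 opponent
print(peak_bonus(7, 30))  # 0
```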
