The horizontal axis shows possible performances of team 1, based on a normal distribution centered at 0 (indicating an exactly average performance for team 1 based on its average strength). The vertical axis shows possible performances of team 2, again a normal distribution around 0, the average-strength performance.
Let's say that team 1 is significantly stronger than team 2. For team 2 to win, it must perform much better than its average, and team 1 must perform much worse than its average. In other words, only some of the possible results in quadrant 2 would result in a team 2 win, like so:
The red cases highlight the upsets. They are rare indeed, because team 1 must underperform and team 2 must overperform. As an alternative, consider the scenario in which team 1 and team 2 are evenly matched. In this world, team 2 wins about 50% of the time:
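The two scenarios above can be checked with a quick simulation. This is only a sketch under stated assumptions: each team's performance is drawn from a standard normal distribution around its own average, the strength gap is measured in standard deviations of a single performance, and team 2 wins whenever its overperformance exceeds team 1's by more than the gap. The function name `team2_win_rate` and its parameters are made up for illustration.

```python
import random


def team2_win_rate(strength_gap, trials=100_000, seed=0):
    """Estimate how often team 2 beats team 1.

    strength_gap is team 1's average strength minus team 2's, in
    standard deviations of one performance (an assumed unit).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    team2_wins = 0
    for _ in range(trials):
        perf1 = rng.gauss(0, 1)  # team 1's deviation from its own average
        perf2 = rng.gauss(0, 1)  # team 2's deviation from its own average
        # Assumed win rule: team 2 wins when its edge in performance
        # is larger than team 1's edge in strength.
        if perf2 - perf1 > strength_gap:
            team2_wins += 1
    return team2_wins / trials


print("evenly matched:", team2_win_rate(0.0))
print("team 1 much stronger:", team2_win_rate(2.0))
```

With a gap of 0 the estimate should hover around one half, and it shrinks as the gap grows, which is the pattern the two pictures describe.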
Mathematically, it is simple to model this with a logistic function. If difference = team 1 strength - team 2 strength, then the probability of team 1 winning takes the standard logistic form

$$P(\text{team 1 wins}) = \frac{1}{1 + e^{-k \cdot \text{difference}}}$$
where k depends on the units in which strength is measured and just how variable the teams' performances are. The value of k is an empirical research question that could change from season to season. The logistic function looks like this:
As the difference gets larger, team 1 is stronger and more likely to win, approaching 100%. As the difference turns negative, team 1 is weaker and less likely to win, approaching 0%. And at a difference of 0, the teams are even, and the odds are 50-50.
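To make the shape of the curve concrete, here is a minimal sketch that assumes the standard logistic form, 1 / (1 + e^(-k * difference)). The function name `team1_win_probability` is hypothetical, and the value `k = 1.0` is only a placeholder, not an estimate.

```python
import math


def team1_win_probability(difference, k):
    """Standard logistic curve (assumed form).

    Returns 0.5 at difference == 0, tends toward 1 as the difference
    grows, and tends toward 0 as it becomes very negative.
    """
    return 1.0 / (1.0 + math.exp(-k * difference))


k = 1.0  # placeholder value, chosen only to show the shape
for difference in (-3, -1, 0, 1, 3):
    print(difference, round(team1_win_probability(difference, k), 3))
```

Swapping the two teams flips the sign of the difference, so the two probabilities always add up to 1.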
I analyzed CEDA/NDT data for the 2010-2011 season of open/varsity policy debate. I measured each team's strength using the easy-to-understand measure of weighted wins, expressed as an expected win percentage for a season (so 62% means that a team is expected to win 62% of its rounds over an entire season, adjusted slightly from its actual win percentage by schedule strength). Then I classified every round, based on the difference in the two teams' strengths, as either a win (for the higher rated team) or an upset (for the lower rated team).
I found that about 20% of rounds were upsets. This is close to football's 25% or so. But of course, most of the upsets occur when the teams are fairly close in rating. Here are the results:
So, for example, when the difference in the ratings was greater than 0.5 but less than 0.55, the higher rated team won 97.3% of the time. This is obviously a significant difference in the teams' strengths: a team rated at 82% weighted wins versus a team rated at 30% weighted wins! It is hardly surprising that this is such a lock. At the other extreme, when the difference in the ratings was greater than 0.1 but less than 0.15, the higher rated team won only about 59% of the time. These are close rounds, nearly toss-ups. A difference of 0.2 seems to be the tipping point: above this, there are few upsets.
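The tallying described above can be sketched as follows. This is not the code used for the analysis. It is an illustration that assumes each round is recorded as a tuple `(rating_a, rating_b, a_won)` with ratings as fractions between 0 and 1, and that bins are 0.05 wide with the lower edge included. The function name and the sample rounds are made up.

```python
import math
from collections import defaultdict


def favorite_results_by_bin(rounds, bin_width=0.05):
    """Group rounds by rating gap and count wins for the higher rated team.

    Returns a sorted list of (bin_low, bin_high, favorite_wins, rounds).
    """
    tallies = defaultdict(lambda: [0, 0])  # bin index -> [favorite wins, rounds]
    for rating_a, rating_b, a_won in rounds:
        if rating_a == rating_b:
            continue  # no favorite, so the result is neither a win nor an upset
        favorite_won = a_won if rating_a > rating_b else not a_won
        gap = abs(rating_a - rating_b)
        # The small epsilon keeps gaps that sit on a bin edge from
        # slipping into the lower bin through floating-point error.
        index = math.floor(gap / bin_width + 1e-9)
        tallies[index][0] += int(favorite_won)
        tallies[index][1] += 1
    return [
        (round(i * bin_width, 4), round((i + 1) * bin_width, 4), wins, total)
        for i, (wins, total) in sorted(tallies.items())
    ]


# Made-up rounds, for illustration only (not the CEDA/NDT data).
sample_rounds = [
    (0.82, 0.30, True),   # favorite wins
    (0.30, 0.82, False),  # favorite wins
    (0.62, 0.50, False),  # upset
    (0.50, 0.62, False),  # favorite wins
]
for low, high, wins, total in favorite_results_by_bin(sample_rounds):
    print(f"{low:.2f}-{high:.2f}: favorite won {wins} of {total}")
```

Dividing the favorite's wins by the number of rounds in each bin gives the win percentages of the kind quoted above.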
Here is the same data in graph form:
The graph includes a line of best fit. Fitting the formula above to the data, my best guess is that k is about 6.5.
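For a feel of what k = 6.5 implies, the sketch below evaluates the curve at a few rating gaps and also inverts it to find the gap needed for a given win probability. It assumes the standard logistic form, 1 / (1 + e^(-k * difference)), with differences expressed as fractions (0.2 rather than 20). The function names are hypothetical, and the printed values are the curve's predictions, not observed results.

```python
import math

K = 6.5  # the best-guess value quoted above


def favorite_win_probability(difference, k=K):
    """Predicted chance that the higher rated team wins (assumed logistic form)."""
    return 1.0 / (1.0 + math.exp(-k * difference))


def difference_for_probability(p, k=K):
    """Invert the curve: the rating gap at which the favorite wins with probability p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p)) / k


for gap in (0.05, 0.1, 0.2, 0.3, 0.5):
    print(f"gap {gap:.2f}: favorite wins {favorite_win_probability(gap):.1%}")
print(f"gap needed for a 75% favorite: {difference_for_probability(0.75):.3f}")
```

A fitted curve smooths over all the bins at once, so its value at any one gap need not match the win percentage observed in that bin.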