Monday, March 2, 2009

Debate tournament math

Here's how a high school or college debate tournament works: for the first two rounds of debating, each team is randomly assigned an opponent; for the third round, teams are paired within their records, so teams that won both of the first two rounds debate other 2-0s, 1-1s debate 1-1s, and 0-2s debate 0-2s. This system continues for several preliminary rounds, "power matching" teams against opponents with the same record of wins and losses, until the top "brackets" with winning records (7-0s and 6-1s, for example) move on to elimination rounds. Thus, the preliminary rounds are a type of Swiss system tournament, a format that is also used in chess competition. The number of teams in each final bracket closely follows a binomial distribution (give or take a team or two, because a bracket with an odd number of teams requires a team to be "pulled up" from a lower bracket).
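The binomial shape follows from simple arithmetic in the idealized case: if every debate is between two teams with the same record, each round splits every bracket exactly in half, so after r rounds the bracket with w wins holds N x C(r, w) / 2^r of the N teams. A minimal sketch, assuming a field size divisible by 2^r so that nobody is ever pulled up (the function name is hypothetical):

```python
from math import comb


def expected_bracket_sizes(num_teams, rounds):
    """Idealized number of teams in each bracket after `rounds` power-matched rounds.

    Assumes every debate is between two teams with the same record and that
    num_teams is divisible by 2 ** rounds, so no team is ever pulled up.
    """
    return {
        f"{wins}-{rounds - wins}": num_teams * comb(rounds, wins) // 2 ** rounds
        for wins in range(rounds, -1, -1)
    }


print(expected_bracket_sizes(128, 7))
```

For 128 teams and seven rounds this gives 1, 7, 21, 35, 35, 21, 7, and 1 teams from 7-0 down to 0-7. Real fields are not powers of two and the first two rounds are random, which is why actual bracket counts only approximate these numbers.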

This approach is generally felt to be fair, although it is a recognized problem that a good team that loses the first or second round may face easier opponents all the way through. How often does this happen? A visualization helps:


[Figure: every preliminary-round debate, grouped into rows by final record, with the rounds won by the top nine teams in bold]

These are the preliminary varsity policy debate results at the 2009 Harvard invitational high school tournament. (I took out names because I don't want to seem like I'm ragging on any school; I'm really just interested in the math.) Each row represents a different bracket (the 7-0 at the top, the 6-1s one row down, and so on, with the 0-7 at the bottom), and each row is sorted from best speaker points (left) to worst speaker points (right) within that bracket. Each arrow represents one actual debate between two teams; it points to the winner and is drawn in the color of the loser's row. Every single round is there, but I bolded the rounds that the top nine teams won. You can see how differently the top teams (the 7-0 and 6-1s) earned their records. Some 6-1s, circled, defeated at least three teams that finished 5-2 or better. Other 6-1s, in squares, defeated only one 5-2 or none at all. The 6-1 on the far left did not defeat a single team in the top 20%. Perhaps they could have, but they never even faced one. They made it into elimination rounds on the strength of an easier schedule than any other 6-1.
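The circles and squares come down to a simple count: for each team, how many of its wins came against opponents that finished with a given record or better. A minimal sketch of that count, assuming the results are available as a list of (winner, loser) pairs; the function names, team names, and the tiny four-team, three-round schedule are made up for illustration:

```python
from collections import Counter


def record_table(debates):
    """Final (wins, losses) for every team, from (winner, loser) pairs."""
    wins, losses = Counter(), Counter()
    for winner, loser in debates:
        wins[winner] += 1
        losses[loser] += 1
    return {team: (wins[team], losses[team]) for team in set(wins) | set(losses)}


def quality_wins(debates, min_opponent_wins):
    """Count each team's wins over opponents that finished with at least
    `min_opponent_wins` wins (e.g. 5 for '5-2 or better' after seven rounds)."""
    records = record_table(debates)
    counts = {team: 0 for team in records}
    for winner, loser in debates:
        if records[loser][0] >= min_opponent_wins:
            counts[winner] += 1
    return counts


# Hypothetical four-team, three-round schedule: (winner, loser) for every debate.
debates = [("A", "C"), ("B", "D"), ("A", "B"), ("C", "D"), ("B", "C"), ("D", "A")]
print(record_table(debates)["A"], record_table(debates)["B"])  # (2, 1) (2, 1)
print(sorted(quality_wins(debates, min_opponent_wins=2).items()))
# [('A', 1), ('B', 0), ('C', 0), ('D', 1)]
```

Here A and B both finish 2-1, but only A beat another 2-1 team, while B's two wins both came against 1-2 teams. On a seven-round tournament, a threshold of 5 should correspond to the "5-2 or better" count used for the circled and squared 6-1s.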

Let me make it absolutely clear: I'm not criticizing the folks who run the Harvard tournament. They do a fine job. The problem is not with their execution. I'm sure that at every point, the 6-1s were given proper opponents for their records; the problem is that some of those opponents went on to lose many of their remaining rounds and revealed their weakness. The problem is the method, which is only as good as each team's current record is an accurate reflection of its true strength. Since this information can't be known in advance, the only solution so far has been to repeat the process many, many times to thoroughly test and properly rank each team in preliminary rounds.

Potentially, what you're looking at above is a raw sort that still contains some errors, like ABCEDLFGJIMNP... It's getting better, but it still needs further sorting. Consider it this way: the first round is supposed to determine whether a letter is in the first half of the alphabet or not, by picking up two letters at the same time and determining which comes first. Generally speaking, this works, and A, B, C, etc., are likely to end up in the first-half pile. But what happens if the letters you pick up to compare are T and W? T will be misleadingly placed in the first-half pile. You hope that this doesn't happen two, or three, or seven times in a row, but clearly it can, and it did: a team made it into the top 6% without ever facing an opponent in the top 20%.
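The letter analogy is easy to simulate. The sketch below is a hypothetical model, not any tournament's actual pairing software: each "debate" compares two letters and the earlier letter always wins, the first two rounds are paired at random, and later rounds sort letters by wins (in random order within each bracket) and pair neighbors, which also pulls a letter up whenever a bracket is odd. It assumes an even number of letters and ignores rematch avoidance, sides, and speaker points.

```python
import random
import string


def power_matched_sort(letters, rounds, random_rounds=2, seed=0):
    """Rank letters with a Swiss-style, power-matched 'tournament'.

    Each debate is a comparison that the alphabetically earlier letter
    always wins. Assumes an even number of letters and ignores rematch
    avoidance and speaker points.
    """
    rng = random.Random(seed)
    wins = {letter: 0 for letter in letters}
    for round_number in range(rounds):
        order = list(letters)
        rng.shuffle(order)  # random pairings, and random order inside each bracket
        if round_number >= random_rounds:
            # Power matching: stable sort by record, then pair neighbors;
            # an odd bracket pulls one letter up from the bracket below.
            order.sort(key=lambda letter: -wins[letter])
        for a, b in zip(order[0::2], order[1::2]):
            wins[min(a, b)] += 1
    final = list(letters)
    rng.shuffle(final)  # break ties in the final ranking at random
    final.sort(key=lambda letter: -wins[letter])
    return final, wins


ranking, wins = power_matched_sort(string.ascii_uppercase, rounds=7)
print("".join(ranking))
print([wins[letter] for letter in ranking])
```

The extremes are always pinned down (A finishes 7-0 and Z finishes 0-7), but seven rounds produce at most eight distinct records for 26 letters, so many letters are tied, and a letter in the middle can finish a bracket or more away from where it belongs. Trying different seeds shows the same kind of near-sorted but imperfect order as the ABCEDLF... example.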

Randomness isn't enough. There needs to be an element added to power-matching that controls for strength of schedule. If you need further convincing, here are the 5-2s highlighted:

[Figure: the same results with the 5-2 teams highlighted]

The circled 5-2s defeated at least one other 5-2. (The blue arrows are hard to see, so enlarge the image first.) The 5-2s in squares defeated only one or two teams that finished 4-3 or better; that is, they made it into the top 20% and elimination rounds on the basis of defeating only one or two teams in the top 40%. That's a wide gap in strength of schedule: some 5-2s debated other 5-2s and several 4-3s, while others debated a few 4-3s and then several weaker teams.
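One way to add that strength-of-schedule control is the cross-pairing idea described in the second comment: within a bracket, pair a team with low opponent wins against a high-speaks opponent, and vice versa. The sketch below is one hypothetical way to turn that into an optimization; every detail is an assumption for illustration. Ranks are scaled to [0, 1] within the bracket, a team whose opponent-wins rank is x is taken to be "due" an opponent whose speaker-points rank is 1 - x, the cost of a pairing is the squared miss for both teams, and the search is exhaustive, so it only suits small, even-sized brackets.

```python
def rank_fraction(values):
    """Scale each team's within-bracket rank to [0, 1] (0 = lowest value)."""
    order = sorted(values, key=values.get)
    if len(order) == 1:
        return {order[0]: 0.5}
    return {team: i / (len(order) - 1) for i, team in enumerate(order)}


def pair_cost(a, b, opp_rank, speak_rank):
    """Squared miss for both teams: a team whose opponent-wins rank is x is
    assumed to be 'due' an opponent whose speaker-points rank is 1 - x."""
    miss_a = speak_rank[b] - (1 - opp_rank[a])
    miss_b = speak_rank[a] - (1 - opp_rank[b])
    return miss_a ** 2 + miss_b ** 2


def best_pairing(teams, opp_rank, speak_rank):
    """Try every pairing of an even-sized bracket; return (total cost, pairs)."""
    if not teams:
        return 0.0, []
    first, rest = teams[0], teams[1:]
    best_cost, best_pairs = float("inf"), []
    for i, partner in enumerate(rest):
        sub_cost, sub_pairs = best_pairing(rest[:i] + rest[i + 1:], opp_rank, speak_rank)
        cost = pair_cost(first, partner, opp_rank, speak_rank) + sub_cost
        if cost < best_cost:
            best_cost, best_pairs = cost, [(first, partner)] + sub_pairs
    return best_cost, best_pairs


# Hypothetical 2-0 bracket going into round 3 (made-up opp wins and speaks).
opp_wins = {"T1": 1, "T2": 3, "T3": 2, "T4": 4}
speaks = {"T1": 85.0, "T2": 86.5, "T3": 88.0, "T4": 84.0}
cost, pairs = best_pairing(sorted(opp_wins), rank_fraction(opp_wins), rank_fraction(speaks))
print(pairs, round(cost, 3))  # [('T1', 'T3'), ('T2', 'T4')] 0.667
```

In this made-up bracket, T1 (lowest opponent wins) draws T3 (highest speaks), and T4 (highest opponent wins) draws T2. The number of complete pairings grows as (n-1)!!, so a realistic version would hand the same cost matrix to a minimum-weight perfect matching algorithm (such as Edmonds' blossom algorithm), handle odd brackets with a pull-up first, and still respect rematch and side constraints.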

4 comments:

  1. As someone who does debate tournament administration often, I just wanted to say thanks for exploring this! Very interesting.

    However, given what you've demonstrated, the real problem becomes: what do we replace it with? That's the big question, and the answer would have to be very persuasive to convince a lot of people we should be doing things differently. We ultimately only have two criteria to use: wins and speaker points. Beyond that, all other statistics are just derivatives of these two (high/low speaker points take out the outliers, opp wins just counts wins of opponents, etc.).

    You say briefly that "There needs to be an element added to power-matching that controls for strength of schedule." Do you mean opponents' wins? I've done some preliminary research for graduate school indicating that when you correlate opponent wins (strength of schedule) with elimination round performance, it was one of the last predictive variables. By which I mean, the teams with the MOST opp wins were the least likely to make it to the final round.

    I hope to have that study published soon. An element I hope to add to that study is a suggestion of alternative criteria/more data to collect about each round. I did a preliminary study of another data point to record: strength of win. We'll see how that works out.

    Great post, thanks for sharing your perspective!

  2. Danny, thanks for the comment. Here's my thought on controlling for strength of schedule: within brackets, cross-pair opponent wins against speaker points. For example, let's say a 2-0 has low opp wins. Then for round 3, it is due an opponent with high speaks. Or, vice versa, a 2-0 with high opp wins is due an opponent in the bracket that has low speaks. (This sounds easy, but in practice it requires a computer: it has to score on the order of teams^2 possible match-ups and then search for the pairing of the whole bracket that is best overall. Basically, it's an optimization over a cost matrix.) The idea is to avoid giving good teams the easiest schedules and instead to balance the strength of schedule in each bracket as much as possible for each round.

    I'm fascinated to see the results of your study. So, is there actually a negative correlation? If so, it makes sense: teams that had tough prelims are worn out by elims. If instead you mean that there's no correlation, r=0, then that would make sense, too: since the program currently ignores schedule strength and leaves it to random chance, it's arbitrary.

    If I might suggest, you might consider looking at more than just finals. What about making your y variable (teams at the tournament) / (teams in the elim round in which your team was eliminated)? For example, if there were 56 teams at a tournament, and a team made it to quarters, you'd assign them a y value of 7 points (56 / 8, meaning that they were among the top eight teams).

    Best wishes, please keep me posted on your research! Good luck.

  3. i recently suggested the below as a tie breaker. i think a more sophisticated version of the below could be used to actually seed teams rather than using their win-loss record (though, as danny notes, that would be very controversial).

    a decent number of people are interested in opp records as a tie breaker but it has its problems. which is the better team in this case?
    a team that hit 3 8-4 teams and 3 2-10 teams (opp record 30) and goes 6-6
    OR
    a team that hit 4 5-7 teams, 1 9-3 team, and 1 2-10 team (opp record 31) and goes 6-6?

    i'd expect the first team to go 6-6 (losses to the 8-4 teams and wins against the 2-10 teams). the second team, i'd expect to go 10-2 because they should beat 5-7 and 2-10 teams.

    here's a quick brainstorm that might address the problems with opp records.

    it seems to me that instead of opp record--we could look at who a team beat, split with, and lost to.

    if you lose a ballot to a team that you have a better record than, you lose a tie breaker point for each difference in record. (eg you are 6-6 and lose a ballot to a 4-8 team, so you lose 2 points)

    if you win a ballot against a team that is equal to or better than you, you get a tie breaker point for each difference in record (eg you are 6-6 and win a ballot against a 9-3 team, you get 3 points; also, i'm giving .5 points to win against a team with equal numbers of ballots)

    add them up and that counts as your "win-loss quality" tie breaker.

    example:

    TEAM A went 6-6 and did this:
    2-0 vs a 1-11, expected win
    2-0 vs a 6-6, 1 unexpected ballot +.5 (expect to split with 6-6 team)
    1-1 vs a 7-5, 1 unexpected ballot +1 (expect to lose to a 7-5 team)
    0-2 vs a 7-5, expected loss (expect to lose to a 7-5 team)
    1-1 vs an 8-4, 1 unexpected ballot +2 (expect to lose to an 8-4 team, bump up 2 for being 2 ballots better)
    0-2 vs a 9-3, expected loss

    net +3.5 unexpected ballots--+3.5 win-loss quality tie breaker points

    TEAM B went 6-6 and did this:
    2-0 vs 5-7, expected win
    1-1 vs 5-7, 1 unexpected loss -1
    1-1 vs 5-7, 1 unexpected loss -1
    1-1 vs 5-7, 1 unexpected loss -1
    1-1 vs 10-2, 1 unexpected winning ballot +4 (expect to lose to a 10-2 team, bump up 4 for being 4 ballots better)
    0-2 vs 11-1, expected loss

    net +1 unexpected ballots: +1 win-loss quality tie breaker points

    TEAM A WOULD BE SEEDED HIGHER (3.5 versus 1 win-loss quality tie breaker points)

    and, i think team a should be seeded higher--they beat better teams.

  4. This is an interesting suggestion. An alternative, which I think accomplishes the same goal but more simply, is a "weighted win" system: for each ballot you win against a team, you receive 1 point for every one of their wins; for each ballot you lose to a team, you receive -1 point for every one of their losses.

    I ran the method on the examples you gave: team A had 29 weighted wins and 25 weighted losses, for 4 total points; team B had 35 weighted wins and 25 weighted losses, for 10 total points. Now, this indicates a different outcome from your method, but I might disagree with your outcome: Team B might be the more impressive team, since they took a ballot off a 10-2, while Team A's best performance was taking a ballot off an 8-4. I don't know. I'm sure the weighted win idea could be tweaked to deliver your outcome if you play with the values slightly.

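The arithmetic in the last two comments can be checked mechanically. Below is a minimal sketch that scores the two example teams (Team A and Team B) under opp record, the "win-loss quality" tie breaker, and the "weighted win" system; the class and function names are hypothetical. One assumption is needed because the stated rules don't cover it: against an opponent with an equal record, a split is expected and each ballot beyond a split is worth 0.5, which is how the worked example scores Team A's 2-0 against a 6-6.

```python
from dataclasses import dataclass


@dataclass
class Series:
    """One opponent: their final record and the ballots split against them."""
    opp_wins: int
    opp_losses: int
    won: int   # ballots this team won against that opponent
    lost: int  # ballots this team lost to that opponent


def record(results):
    return sum(s.won for s in results), sum(s.lost for s in results)


def opp_record(results):
    """Sum of opponents' wins, counting each opponent once."""
    return sum(s.opp_wins for s in results)


def win_loss_quality(results):
    """Tie breaker from the third comment; equal-record handling follows its worked example."""
    my_wins, _ = record(results)
    points = 0.0
    for s in results:
        if s.opp_wins > my_wins:
            # Ballots won against a better team earn the gap in wins.
            points += s.won * (s.opp_wins - my_wins)
        elif s.opp_wins < my_wins:
            # Ballots lost to a worse team cost the gap in wins.
            points -= s.lost * (my_wins - s.opp_wins)
        else:
            # Equal record: a split is expected; each ballot beyond a split is worth 0.5.
            points += 0.5 * (s.won - s.lost) / 2
    return points


def weighted_wins(results):
    """'Weighted win' system from the fourth comment, applied per ballot."""
    gained = sum(s.won * s.opp_wins for s in results)
    lost = sum(s.lost * s.opp_losses for s in results)
    return gained - lost


team_a = [Series(1, 11, 2, 0), Series(6, 6, 2, 0), Series(7, 5, 1, 1),
          Series(7, 5, 0, 2), Series(8, 4, 1, 1), Series(9, 3, 0, 2)]
team_b = [Series(5, 7, 2, 0), Series(5, 7, 1, 1), Series(5, 7, 1, 1),
          Series(5, 7, 1, 1), Series(10, 2, 1, 1), Series(11, 1, 0, 2)]

for name, results in (("Team A", team_a), ("Team B", team_b)):
    print(name, record(results), opp_record(results),
          win_loss_quality(results), weighted_wins(results))
# Team A (6, 6) 38 3.5 4
# Team B (6, 6) 41 1.0 10
```

Opp record and weighted wins both rank Team B ahead (41 vs 38, and 10 vs 4), while the win-loss quality tie breaker ranks Team A ahead (3.5 vs 1), matching the totals worked out by hand in the comments.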