I recently saw A Numbers Game's charts of debate tournaments, which look like modified network graphs, and it inspired me to post about some thoughts I'd had a few years ago about graph theory and debate tournaments. (Network) graph theory is a fascinating branch of mathematics, and it is directly applicable to debate tournaments. Basically, a network graph is anything that maps all the pathways (edges) between some nodes (vertices). A network graph of a debate tournament shows the match-ups (edges) between the teams at the tournament (vertices). An edge without an arrow would just show a match-up between two teams; an edge with an arrow would also indicate who won it. A network graph can show the results of all six (or eight or whatever) preliminary rounds simultaneously. You can see a lot of interesting patterns that would be hard to notice any other way; I've used network graphs as a way to illustrate how a tournament played out.
You might think, since graphs are a complete mapping of all the preliminary round results, that they could be used to rank all the teams at a tournament from first seed to last. However, if you tried to use a graph this way, you'd run into three different problems. Graphs are best reserved for illustrative purposes (and for one alternative use I'll suggest at the end).
The first problem with trying to use a graph to seed the teams is that you still need a lot of tiebreakers. Here's a simple example:
As indicated by the graph, team A beat B and C, and teams B and C both beat D. The problem is that B and C can't be ranked against each other from this information alone. Given the results, the final rankings must contain the sequences {A, B, D} and {A, C, D}, but both {A, B, C, D} and {A, C, B, D} satisfy this. The underlying issue is that, except at round robins, there will always be teams that didn't meet at the tournament, and thus pairs of vertices without edges.
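To make the ambiguity concrete, here's a minimal Python sketch (assuming the results are stored as hypothetical (winner, loser) pairs) showing that both orderings pass a consistency check against the graph:

```python
results = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]  # (winner, loser) pairs

def consistent(ranking, results):
    # A ranking is consistent if every winner appears above its loser.
    position = {team: i for i, team in enumerate(ranking)}
    return all(position[winner] < position[loser] for winner, loser in results)

print(consistent(["A", "B", "C", "D"], results))  # True
print(consistent(["A", "C", "B", "D"], results))  # True -- the graph can't choose between them
```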
Ok, so, you still need tiebreakers, but a network graph as a ranking mechanism presents a second, bigger problem: "contradictions." Let's say we run a really small tournament and have to make A and B debate twice. Here's one possible outcome that creates a contradiction:
A (aff.) beat B, but B (aff.) beat A. If we try to use the network graph to rank these two teams, we'd be forced to make two contradictory statements. (It's worth remembering that a team's strength is not invariant: teams can be better on one side than the other, they can have off rounds or lucky rounds, and judges are a factor too.) This is simple enough to resolve: we can use speaker points to rank one team higher than the other. Here's a slightly more complex version of the same problem:
A beat B (rd 1), B beat C (rd 2), and C beat A (rd 3). There's a contradiction here also. It will have to be resolved by "upsetting" one result, that is, a team must be ranked below an opponent whom they beat. For example, a final ranking might be {A, B, C}, which means that C's victory over A was "upset" or "overruled" because A had more overall wins, better speaker points, etc., than B or C. Bear in mind that drawing the graph didn't create this problem; it merely exposed it. We're mostly oblivious to it because cume sheets show wins, points, etc., and don't provide a network graph. Look carefully through the results of nearly any tournament, and I'd be willing to bet you can find at least a few teams who are ranked below (based on overall record, speaks, etc.) an opponent they beat. Here's another example:
Any such contradiction, no matter how many teams it involves, is known as a simple cyclical graph, as long as there's only one cyclical pathway. The good news is that, however long a simple cyclical graph gets, only one result has to be upset: {A, B, C, D} only upsets D's victory over A. (A complex cyclical graph would contain multiple, intersecting cycles and be a much bigger headache, but my hunch is that they are extremely rare.)
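For what it's worth, spotting these contradictions is mechanical: a depth-first search over the results graph will flag any cycle. Here's a rough Python sketch, using the four-team cycle above as hypothetical input:

```python
from collections import defaultdict

results = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]  # a four-team cycle

def has_cycle(results):
    beat = defaultdict(list)
    for winner, loser in results:
        beat[winner].append(loser)

    UNSEEN, ON_PATH, DONE = 0, 1, 2
    state = defaultdict(int)

    def dfs(team):
        state[team] = ON_PATH
        for opponent in beat[team]:
            if state[opponent] == ON_PATH:          # we looped back: a contradiction
                return True
            if state[opponent] == UNSEEN and dfs(opponent):
                return True
        state[team] = DONE
        return False

    return any(state[team] == UNSEEN and dfs(team) for team in list(beat))

print(has_cycle(results))  # True
```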
The third problem with using a network graph as a ranking mechanism is perhaps the biggest, if not mathematically, then in terms of what the debate community will stomach:
Notice the problem? There are no cycles; everything flows in only one direction. The rankings, according to the graph, are unambiguously {A, B, C, D, E, F}. Here it is again:
C beat D, but at any tournament, D would be ranked higher, because C is 1-2 and D is 2-1. In the situation I graphed, C is the better team but had a tougher schedule. (It's also possible to create an example in which D is the better team but just had an off-round or a crazy judge.) There's no cycle in the graph, so there's no need to have an upset, but I believe every debater and coach would say D should be ranked higher. Total wins as the first method of ranking is sacrosanct to the debate community, and with good reason. Total wins is clear and unambiguous; a network graph is abstract and complex. I know I would never want a tournament to start interpreting results with a network graph; it would create too much room for error when deciding breaks.
This last problem is insurmountable. I think the lesson is that we need to make every effort to make schedule strength more equitable, but there's no mathematical way to jigger results to somehow weight or re-interpret an unfair set of pairings. The cat's out of the bag, and we just have to say, "Sorry, tournaments aren't always fair," at that point. The bottom line is that network graphs will never and ought not be used for tournament rankings. But there is an alternative use for graph theory: as a tiebreaker at round robins.
Let's say a six-team round robin has these final results:
Team Record
A 4-1
B 3-2
C 3-2
D 3-2
E 2-3
F 0-5
There's a three-way tie for second place. My suggestion is that round robins can use an idea from graph theory as the first tiebreaker, before resorting to speaker points. I believe that direct results ought to be respected as much as possible, and in round robins, you have direct results for every head-to-head match-up. Who cares that team C spoke prettier than B or D if C lost both of those rounds? Here's the graph of this hypothetical round robin:
Now, you may notice that C beat A, which suggests that C is the best of the three, or you may focus on the fact that D beat both B and C, which suggests that D is the best. Both may catch your eye, but there's a way to quantify an exact result. But first, you need to turn the graph into an adjacency matrix, like so:
A column represents a team's wins; a row, its losses. Hence, reading down column A, you see that team A beat B, D, E, and F; reading across row A, you see that team A lost to C. The adjacency matrix contains all the information that the graph does; it's merely another representation of the same data. You might also notice that there's an easy way to double-check the matrix:
You can add up each column and row to make sure each team has the right number of wins (in its column) and losses (in its row). (The adjacency matrix is easy to build for multi-ballot round robins; just put the number of ballots won into the correct spot for each team.)
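As an illustration, here's a small Python sketch that builds the adjacency matrix with this column-wins/row-losses convention; the full list of results is reconstructed from the records and match-ups described above, so treat it as hypothetical data:

```python
teams = ["A", "B", "C", "D", "E", "F"]
# Every head-to-head result, reconstructed from the records and match-ups
# described in the post, as (winner, loser) pairs.
results = [("A", "B"), ("A", "D"), ("A", "E"), ("A", "F"), ("C", "A"),
           ("B", "C"), ("B", "E"), ("B", "F"), ("D", "B"),
           ("C", "E"), ("C", "F"), ("D", "C"),
           ("D", "F"), ("E", "D"), ("E", "F")]

index = {team: i for i, team in enumerate(teams)}
matrix = [[0] * len(teams) for _ in teams]
for winner, loser in results:
    matrix[index[loser]][index[winner]] = 1  # column = winner, row = loser

# The double-check: column sums are wins, row sums are losses.
wins = [sum(row[c] for row in matrix) for c in range(len(teams))]
losses = [sum(row) for row in matrix]
print(wins)    # [4, 3, 3, 3, 2, 0]
print(losses)  # [1, 2, 2, 2, 3, 5]
```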
Now, we need to test possible rankings. There are six possibilities created by the three-way tie: {A, B, C, D, E, F}, {A, B, D, C, E, F}, {A, C, B, D, E, F}, {A, C, D, B, E, F}, {A, D, B, C, E, F}, and {A, D, C, B, E, F}. We create an adjacency matrix for each possible ranking: a matrix of the hypothetical results that would be perfectly consistent with the ranking order, with no upsets, ties, or ambiguity. For example, for the possible ranking {A, B, C, D, E, F}, the matrix is:
The matrix shows results that would be perfectly consistent with the ranking. Now we can calculate a score for how well this ranking corresponds to the actual tournament results:
Subtract the possible ranking matrix [R] from the actual tournament results [T] and count the -1s. (You can count the +1s also; they are symmetrical.) According to this ranking, there were four "upsets": that C actually beat A, that D actually beat B, that D actually beat C, and that E actually beat D. If a tournament accepts the ranking {A, B, C, D, E, F}, it "overturns" four direct results.
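Here's one way that calculation might look in Python. The matrix T is the actual-results matrix built above, while ideal_matrix and count_upsets are hypothetical helper names for illustration, not anything from a real tab program:

```python
teams = ["A", "B", "C", "D", "E", "F"]
# T: the actual results matrix (row = loser, column = winner).
T = [[0, 0, 1, 0, 0, 0],   # A lost only to C
     [1, 0, 0, 1, 0, 0],   # B lost to A and D
     [0, 1, 0, 1, 0, 0],   # C lost to B and D
     [1, 0, 0, 0, 1, 0],   # D lost to A and E
     [1, 1, 1, 0, 0, 0],   # E lost to A, B, and C
     [1, 1, 1, 1, 1, 0]]   # F lost to everyone

def ideal_matrix(ranking, teams):
    # R: what the results "should" look like if the ranking held perfectly,
    # i.e., a 1 wherever the column team is ranked above the row team.
    rank = {team: i for i, team in enumerate(ranking)}
    return [[1 if rank[col] < rank[row] else 0 for col in teams] for row in teams]

def count_upsets(ranking, T, teams):
    R = ideal_matrix(ranking, teams)
    n = len(teams)
    # Each -1 in T - R is a direct result the ranking would overturn.
    return sum(1 for r in range(n) for c in range(n) if T[r][c] - R[r][c] == -1)

print(count_upsets(["A", "B", "C", "D", "E", "F"], T, teams))  # 4
```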
How do the other rankings compare? Using the same method for each "prediction," the rankings can be compared:
Rank order "Upsets"
{A, B, C, D, E, F} 4
{A, B, D, C, E, F} 3
{A, C, B, D, E, F} 5
{A, C, D, B, E, F} 4
{A, D, B, C, E, F} 2
{A, D, C, B, E, F} 3
The best ranking is obviously {A, D, B, C, E, F}. It upsets the fewest direct results. In fact, the two that are upset make perfect sense:
that C (3-2) actually beat A (4-1) and that E (2-3) actually beat D (3-2). In a sense, these were unavoidable (or you might even say real) upsets, not artifacts created by a poor ranking. Anyway, the point is, the three-way tie could be broken in this case without resorting to speaker points. This algorithm would be relatively easy to program into a tab program for round robins.
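For instance, a tab program might do something like the following sketch: permute only the tied block, score each candidate ranking by how many direct results it overturns, and keep the best. The function name is hypothetical, and counting results where the winner sits below the loser is simply a shortcut equivalent to the matrix subtraction above:

```python
from itertools import permutations

# Every head-to-head result from the hypothetical round robin, as (winner, loser) pairs.
results = [("A", "B"), ("A", "D"), ("A", "E"), ("A", "F"), ("C", "A"),
           ("B", "C"), ("B", "E"), ("B", "F"), ("D", "B"),
           ("C", "E"), ("C", "F"), ("D", "C"),
           ("D", "F"), ("E", "D"), ("E", "F")]

def break_tie(above, tied, below, results):
    """Try every ordering of the tied teams and keep the one with the fewest upsets."""
    def upsets(ranking):
        position = {team: i for i, team in enumerate(ranking)}
        # A result is upset when the winner is ranked below the loser.
        return sum(1 for w, l in results if position[w] > position[l])

    candidates = [list(above) + list(order) + list(below) for order in permutations(tied)]
    return min(candidates, key=upsets)

print(break_tie(["A"], ["B", "C", "D"], ["E", "F"], results))
# ['A', 'D', 'B', 'C', 'E', 'F'] -- two upsets, the fewest of the six candidates
```

Brute-forcing the permutations is fine here because only the tied block gets permuted, and tie groups at a round robin are small.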
Of course, some ties cannot be resolved by this method, namely, when the ties are really contradictions created by cycles. There's still a place for speaker points and other tiebreakers.
It occurred to me that there's an easier way to describe the same procedure that I used for breaking a round robin tie.
Using the same example as before, there's a three-way tie between B, C, and D. Imagine that these three teams were your entire tournament. B's record against C and D is 1-1, C's record against B and D is 0-2, and D's record against B and C is 2-0. Your rankings would go {D, B, C}. That is the same method I described before, just an easier way to conceptualize what it is doing.
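In code, this shortcut is even simpler. A minimal sketch, using only the hypothetical head-to-head results among the tied teams:

```python
# Head-to-head results among just the tied teams, taken from the example above.
head_to_head = [("B", "C"), ("D", "B"), ("D", "C")]  # (winner, loser)

wins = {"B": 0, "C": 0, "D": 0}
for winner, loser in head_to_head:
    wins[winner] += 1

print(sorted(wins, key=wins.get, reverse=True))  # ['D', 'B', 'C']
```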
Hi Russell, Wow! I've been thinking about this problem awhile and I'm glad to see that you and others are attacking it. Great work! A few comments. First, I think it would be helpful to generate a mock tournament and assign strength values to your mock contestants. Then ask your algorithm to generate strength values after your matching. You can then compare the difference between the hidden starting strengths and your algorithm's estimates. This would provide you with a quantitative measure of your algorithms. For example, I can intuitively understand why you would want to normalize strength of schedule by the end of the tournament, but this may not be the most efficient for ranking (or strength estimation). It should be tested, and I'm not sure it is best to rely upon actual tournament results because there is always uncertainty about whether the better team won any specific round.
Also, your virtual tournaments should not be entirely deterministic, meaning that a contestant with a strength of .9 vs. one with .45 should win 2/3 of the time, not 100% of the time.
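If it helps, here's a tiny Python sketch of that kind of non-deterministic round, assuming the win probability is proportional to relative strength (which is how .9 vs. .45 works out to 2/3); the function name is made up for illustration:

```python
import random

def simulate_round(strength_a, strength_b, rng=random):
    # Win probability proportional to relative strength:
    # 0.9 vs. 0.45 gives 0.9 / (0.9 + 0.45) = 2/3.
    p_a_wins = strength_a / (strength_a + strength_b)
    return "A" if rng.random() < p_a_wins else "B"

random.seed(0)
outcomes = [simulate_round(0.9, 0.45) for _ in range(10_000)]
print(outcomes.count("A") / len(outcomes))  # roughly 0.67
```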
My second suggestion would be to look at the Colley matrix: http://www.colleyrankings.com/matrate.pdf This seems like the most straightforward method to assign values to a network based upon a binary output (win or loss). You seem fairly fluent in math and programming, so I hope you might be able to implement this.
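For anyone curious, here's a hedged Python sketch of the Colley system as I understand it from the linked paper: solve C r = b, with 2 plus games played on each diagonal entry, minus the number of head-to-head meetings off the diagonal, and b_i = 1 + (wins - losses)/2. Treat it as an illustration rather than a verified implementation:

```python
import numpy as np

def colley_ratings(teams, results):
    index = {team: i for i, team in enumerate(teams)}
    n = len(teams)
    C = 2.0 * np.eye(n)         # start with 2s on the diagonal
    b = np.ones(n)              # and 1s on the right-hand side
    for winner, loser in results:
        w, l = index[winner], index[loser]
        C[w, w] += 1
        C[l, l] += 1            # each game played adds 1 to both diagonals
        C[w, l] -= 1
        C[l, w] -= 1            # and subtracts 1 from the pairing entries
        b[w] += 0.5
        b[l] -= 0.5             # b_i = 1 + (wins_i - losses_i) / 2
    return dict(zip(teams, np.linalg.solve(C, b)))

# The hypothetical round robin from the post:
results = [("A", "B"), ("A", "D"), ("A", "E"), ("A", "F"), ("C", "A"),
           ("B", "C"), ("B", "E"), ("B", "F"), ("D", "B"),
           ("C", "E"), ("C", "F"), ("D", "C"),
           ("D", "F"), ("E", "D"), ("E", "F")]
ratings = colley_ratings(list("ABCDEF"), results)
print({team: round(float(r), 2) for team, r in ratings.items()})
# roughly {'A': 0.69, 'B': 0.56, 'C': 0.56, 'D': 0.56, 'E': 0.44, 'F': 0.19}
```

Note that on this complete single round robin the ratings end up tracking win totals, so B, C, and D stay tied; the method matters more when teams have faced different schedules.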
If we assume the Colley matrix is the best ranking system, the big question is what is the best way to match the later rounds. (Does anybody know this?!)
Implementing the first suggestion will give you comparable results.
Lastly, this seems like a very academic mathematical exercise. Have you been able to find any good math papers on this topic? Pubget.com is a good place to start.
Also, I'm not very math literate. If you could decode the most relevant math papers, it'd be incredibly appreciated.
Wow! Good luck Russell! I hope you have the energy and enthusiasm (and time!) to see this through. :)
For those who are interested in this topic, I suggest reading A Numbers Game's page: http://code.google.com/p/anumbersgame/wiki/TournamentCharts. There are a lot of different considerations about how best to "resolve" upsets.
I think Anonymous makes a good suggestion; once I have a working program, I need to run some simulations to test it out.
Just to clarify, I think schedule strength should be relatively equalized within a bracket, not across the whole tournament. Teams in the top bracket should have higher opp. wins than teams at the bottom. My whole point is if the opp. wins are fairly equalized within a bracket, then we can be more sure of the reliability of a win-loss record as a good indicator of a team's strength.
Thanks for the Colley matrix paper!