Sunday, March 24, 2019

Scheduled elimination tournament calculator

Single-elimination tournaments have one particular flaw: The tournament can only break powers of 2--i.e., 2, 4, 8, 16, 32, etc., teams can make it into the elimination rounds, unless the tournament decides to do a partial elimination round. For example, let's say the tournament decides to break 20 teams. So, the partial elimination round will involve eight teams debating, with twelve teams sitting around and waiting for two hours. The eight teams debating turns into four teams advancing to the first full elimination round, plus the twelve teams that sat around, making for a perfect bracket of sixteen. It works, but... meh. I don't like that the majority of the elimination-qualified teams did nothing for a whole round--it's kind of unfair that they could scout, plan a new strategy, or go get a nice meal and relax. And I especially don't like it, as a tournament director, that instead of just doing one big elimination round right after preliminary rounds and being done with it, I've got to drag it out into two smaller rounds. Let me explain this one a bit.

Ideally, a tournament breaks exactly one-third of its preliminary teams into elimination rounds. This is the ideal because preliminary rounds use one judge, elimination rounds use three, and well, you get the math. Assuming I have just enough judges for prelims, then breaking one-third of teams will use up all my judges perfectly for the first elimination round. The tournament director can say to every judge, "You must stick around for at least the first elimination round. I need everyone. Then I will start to dismiss judges whose schools have been eliminated." It works out brilliantly if every judge is used in elim round 1, half are needed in elim round 2, a quarter are needed in elim round 3... Smooth and simple.

Now consider the 20 teams breaking to elimination rounds problem. That means I have about 60 teams in prelims, and therefore 30 judges. In the partial elimination round, eight teams debate, so that is four rounds... therefore I need to use twelve of my 30 judges. In the first full elimination round, sixteen teams debate, so eight rounds, meaning I need 24 of my 30 judges. Notice how awkward and weird this has become? Some judges must judge both elim rounds--whom to pick? No one can go home until the first full elimination round is through, so that requires every judge to stay an extra two hours. Many of those judges will have nothing to do for the first two hours--I don't need them for a round. They just have to wait. Is there a better way to break a number of teams that isn't a power of 2?
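The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration: the function name `partial_round` and the constant are my own labels, and the three-judge panel is the assumption stated earlier in the post.

```python
JUDGES_PER_ELIM_DEBATE = 3  # elimination rounds use three-judge panels

def partial_round(breaking_teams):
    """Split a break into a partial round plus a full power-of-2 bracket.

    Returns (teams debating in the partial round, teams sitting out,
    size of the first full bracket).
    """
    bracket = 1
    while bracket * 2 <= breaking_teams:
        bracket *= 2                      # largest power of 2 <= breaking_teams
    debating = 2 * (breaking_teams - bracket)
    sitting = breaking_teams - debating
    return debating, sitting, bracket

debating, sitting, bracket = partial_round(20)
partial_judges = (debating // 2) * JUDGES_PER_ELIM_DEBATE
full_judges = (bracket // 2) * JUDGES_PER_ELIM_DEBATE
print(debating, sitting, bracket)     # 8 12 16
print(partial_judges, full_judges)    # 12 24
```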

Double-elimination tournaments can take on any even number of teams, so the above case of 20 is no particular problem, but they run into a different problem very quickly: the double elimination rule usually produces odd numbers of teams during the tournament for some rounds. If the tournament is run with brackets, then you can see how the math works quite easily. With 20 teams to start, ten will be undefeated and ten once-defeated after round 1. After round 2, five will be undefeated, ten will be once-defeated, and five twice-defeated and eliminated. That leaves fifteen teams and the perennial problem: some team has got to get a bye in round 3. Round 3! This is less fair than a team getting a bye in the partial elimination round in the single-elimination tournament. Is there no other option?
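Here is a minimal sketch of that bracket arithmetic, assuming an idealized round in which exactly half of each bracket loses and nobody gets a bye or pull-up. The function name is hypothetical.

```python
def double_elim_round(undefeated, once_defeated):
    """Advance one idealized bracket round with no byes or pull-ups.

    Half of each bracket loses; once-defeated losers are eliminated.
    Returns (undefeated, once_defeated, newly_eliminated).
    """
    if undefeated % 2 or once_defeated % 2:
        raise ValueError("odd bracket: a bye or pull-up is needed")
    return (undefeated // 2,
            undefeated // 2 + once_defeated // 2,
            once_defeated // 2)

state = (20, 0)
for rnd in (1, 2):
    u, o, out = double_elim_round(*state)
    print(f"after round {rnd}: {u} undefeated, {o} once-defeated, {out} eliminated")
    state = (u, o)
```

Calling the function a third time with the state it ends on (5 undefeated, 10 once-defeated) raises the error, which is the round 3 bye problem in miniature.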

My proposed solution is the scheduled-elimination tournament. The plan is quite simple:
  1. Do not eliminate teams that are undefeated.
  2. You must eliminate teams that are twice-defeated.
  3. Decide which once-defeated teams to keep based on speaker points or preliminary seed.
  4. Always keep an even number of teams.
  5. Undefeated teams must debate undefeated teams; once-defeated teams must debate once-defeated teams; one pull-up is allowed.* (see note below for fun substitution!)
In practice, a scheduled-elimination tournament would look quite similar to a double-elimination tournament. Any (even) number of teams could break. There's an undefeated and once-defeated bracket going on in elimination rounds, just like in a double-elimination tournament.* (not necessarily--see note below!) But in many ways, the scheduled-elimination tournament is more similar to a single-elimination tournament: losing one round makes a team eligible for elimination. The tournament could decide to keep most of the once-defeated teams around, or eliminate most of them. It's up to the tournament. A once-defeated team might stick around to win the tournament, but only if the team had high enough speaker points or preliminary seed in order to never be eliminated. This being a mathematical problem, I made a graph to illustrate.

The single-elimination option (green) and the double-elimination option (red) create a lower and upper boundary on possibilities for the scheduled-elimination tournament (anything in the gray area--and yes, I chose gray for its symbolism). So long as the tournament keeps the remaining number of teams in the gray zone, it has abided by conditions 1 and 2 that I specified above. The gray zone represents all the once-defeated teams in the tournament. If the tournament cuts closer to the green curve, it eliminates most of the once-defeated teams. If the tournament goes closer to the red curve, it keeps most of the once-defeated teams. As you can see, the red curve is flat at first--no one in a double-elimination tournament is eliminated after only one round.

You might wonder what the two marked points, (3.32, 2) and (6.16, 2), are. They show how many rounds each type of tournament will need to have, because two teams remaining leads immediately into the final round. The single-elimination tournament needs to have 3.32 rounds, plus the championship round. About one-third of teams participate in the partial round (eight out of 20), thus the 0.32, then the first full elim round will be sixteen teams, the second round will be eight, the third round will be four, and the fourth round will be two teams--the championship round. Four rounds, plus a partial. Similarly, the double-elimination tournament will need to have 6.16 rounds, plus the championship round. That means you're looking at seven or eight total rounds, including the championship round, depending on how the byes and pull-ups go. The scheduled-elimination tournament would have more than the four plus partial (so really five) elimination rounds of the single-elim tournament but fewer than the seven elimination rounds of the double-elim tournament.
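For anyone who wants to reproduce the two marked points, here is a rough sketch. The curve formulas are my assumption about what the graph plots (they are not spelled out above): n / 2^r teams still undefeated after r rounds, and n (1 + r) / 2^r teams with at most one loss. With n = 20 they do land on the marked points.

```python
def single_elim_remaining(n, r):
    """Green curve (assumed form): only undefeated teams survive."""
    return n / 2 ** r

def double_elim_remaining(n, r):
    """Red curve (assumed form): undefeated plus once-defeated teams."""
    return n * (1 + r) / 2 ** r

def rounds_until_two(curve, n, lo=1.0, hi=64.0):
    """Bisect for the round count r at which curve(n, r) drops to 2 teams."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if curve(n, mid) > 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(rounds_until_two(single_elim_remaining, 20), 2))  # 3.32
print(round(rounds_until_two(double_elim_remaining, 20), 2))  # 6.16
```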

There is considerable choice in that gray zone for how to run a scheduled-elimination tournament. One option would be to run what is almost a single-elimination tournament--but with no partial elimination round to start. Here's an example:

The tournament starts with 20 teams entering: (0, 20). Ten teams remain after round 1: (1, 10). (The round itself is really the line segment from (0, 20) to (1, 10): starting with 20 and ending with 10 is round 1's effect.) After round 2, there are five undefeated teams, but the tournament keeps one once-defeated team for a total of six teams: (2, 6). There could be as many as three undefeated teams after round 3, but the tournament keeps four teams on just in case: (3, 4). After round 4, only two teams remain (4, 2), and then round 5 will be the championship round between those two teams. By keeping on perhaps two teams that are once-defeated (after rounds 2 and 3), this method eliminates having the partial elimination round where twelve teams sit around pointlessly, and it has made managing the judging pool much, much more predictable. The tournament will use: 100% of its judging pool for round 1, 50% for round 2, 30% for round 3, 20% for round 4, and 10%--the final three judges--to decide the championship round.
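The judging-pool percentages follow directly from the schedule. A small sketch, assuming the pool is exactly the number of judges needed for round 1 (the one-third ideal described earlier) and three judges per debate; the function name is my own.

```python
JUDGES_PER_ELIM_DEBATE = 3

def judge_usage(schedule):
    """Percent of the round-1 judging pool needed for each round.

    `schedule` lists how many teams debate in each round.
    """
    pool = (schedule[0] // 2) * JUDGES_PER_ELIM_DEBATE
    return [100 * (teams // 2) * JUDGES_PER_ELIM_DEBATE / pool
            for teams in schedule]

print(judge_usage([20, 10, 6, 4, 2]))  # [100.0, 50.0, 30.0, 20.0, 10.0]
```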

Another option is that a tournament could basically hew as close as possible to running a double-elimination tournament, yet avoid the problem of byes, by using a scheduled-elimination tournament. Here's an example:

As you can see, this tournament eliminates no team after round 1 (all twenty remain), eliminates six teams after round 2 for fourteen remaining, eliminates four teams after round 3 for ten remaining, eliminates four after round 4 for six remaining, eliminates two after round 5 for four remaining, and eliminates two more after round 6 for two remaining. The championship round will be round 7 between the two final teams. (In practice, because of pull-ups, this could potentially violate my second condition in the list above--some twice-defeated teams might stay in. The tournament should probably not cut quite so close to the red curve if it wants to respect this condition. The sequence 20-20-12-8-4-2 might be better in this regard.)

This option lengthened the tournament by two rounds compared to the previous option, but the trade-off is that this tournament eliminated almost no once-defeated teams. In general, it's possible, if somewhat unlikely, that a once-defeated team goes on to win a scheduled-elimination tournament (e.g., they defeat the remaining undefeated team in the final round and beat them on points). It's also possible that a once-defeated team survives the cut after round 3, yet even though it wins round 4, the team doesn't survive the post-round 4 cut because its speaker points aren't high enough. This outcome seems reasonable enough to me. Being eliminated based on one loss and low points seems fine to me, although I would generally want my tournaments to stay closer to the two-losses-and-done side of the gray zone. But--it's up to the tournament to decide what makes sense for its goals and available time and judges.

I think I would add one other condition to a scheduled-elimination tournament, for a total of six. Condition #6 is: "Once the tournament begins eliminating teams, never increase the number of teams cut after a round above how many were cut after the previous round." In other words, the curve of cuts should flatten out. In the example graph immediately above, the cuts go: -6, -4, -4, -2, and -2. This seems reasonable and straightforward. It seems beyond silly to have the cuts go: -6, -8, -4, -2, -4, -2. Put them in a more sensible order.
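Condition #6 is easy to check mechanically. A minimal sketch, with hypothetical function names; a schedule is the list of teams remaining after each round, starting with the number that break.

```python
def cuts(schedule):
    """Teams eliminated after each round."""
    return [before - after for before, after in zip(schedule, schedule[1:])]

def satisfies_condition_6(schedule):
    """True if, once cutting starts, no cut is larger than the previous one."""
    made = cuts(schedule)
    while made and made[0] == 0:
        made.pop(0)                   # ignore rounds before eliminations begin
    return all(later <= earlier for earlier, later in zip(made, made[1:]))

print(cuts([20, 20, 14, 10, 6, 4, 2]))                   # [0, 6, 4, 4, 2, 2]
print(satisfies_condition_6([20, 20, 14, 10, 6, 4, 2]))  # True
# Hypothetical 28-team schedule whose cuts are 6, 8, 4, 2, 4, 2:
print(satisfies_condition_6([28, 22, 14, 10, 8, 4, 2]))  # False
```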

I've made the applet available for you to use here. You can change the number of teams, and move the number of teams remaining after each round up or down. The line segments between each round will only show if you meet my sixth condition: eliminate no more teams after each round than after the previous round.

* Fun addendum: This condition can be swapped out for a different one. The teams do not need to debate within brackets--i.e., several undefeated teams could debate once-defeated teams--so long as no cut is ever more than one-half of the teams remaining (which just seems reasonable and fair). The "no-more-than-half" rule can be substituted for the bracket condition without any risk of violating the first condition to not eliminate undefeated teams. The proof of this is fairly elementary, but let's think through an example first. Say 100% of the teams still in the tournament are undefeated (because you eliminated all the once-defeated teams). After one additional round, 50% will be once-defeated. That turns out to be the worst possible case, so the "no-more-than-half" rule keeps us on the happy side of condition #1.

Let's do this more conclusively with a bit of algebra. Say x% of the teams were undefeated and y% were once-defeated (and obviously x+y=100). If x>y, then y undefeated teams might debate y once-defeated teams, leading to y% of teams remaining undefeated. The remainder of undefeated teams, x-y, will have to debate themselves. So (1/2) (x-y) will also be undefeated through that pathway. That means we have y + (1/2) (x-y) undefeated teams, which simplifies to (1/2) x + (1/2) y, or (1/2) (x+y). Since x+y=100, that means 50% will be undefeated. The worst case scenario is that as many as 50% of the teams are undefeated. Never eliminate more than half of the teams remaining after a round, and the bracket condition can be dropped. It's a huge benefit to be able to drop it! This makes many more rounds possible--so you can avoid schools debating themselves, or opponents debating each other multiple times, until the very end of the tournament. Yay!
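If you would rather check the claim by brute force than by algebra, here is a small sketch. It tries every possible number of mixed debates (undefeated vs. once-defeated), assumes the undefeated team wins each one, and confirms that the undefeated count never exceeds half the field. The function name and the 40-team ceiling are arbitrary choices of mine.

```python
def max_undefeated_after_round(undefeated, once_defeated):
    """Largest possible undefeated count after one round of free pairing."""
    best = 0
    for mixed in range(min(undefeated, once_defeated) + 1):
        if (undefeated - mixed) % 2:       # leftover undefeated teams must pair off
            continue
        best = max(best, mixed + (undefeated - mixed) // 2)
    return best

# Check every split of an even number of remaining teams, up to 40.
for total in range(2, 41, 2):
    for undefeated in range(total + 1):
        worst = max_undefeated_after_round(undefeated, total - undefeated)
        assert worst <= total // 2

print(max_undefeated_after_round(20, 0))   # 10
print(max_undefeated_after_round(12, 8))   # 10
```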

Thursday, March 21, 2019

Daylight Savings Time

Why do we bother with daylight savings time? And... why do we call it "daylight savings" anyway? There's a simple answer that becomes apparent when one looks at a graph of available daylight throughout the year.

The data below is pulled from wunderground for Portland, Oregon, but adapted by me. I looked up the actual day length for each month (on the 21st day), which roughly tells us how many hours of daylight there are (it's sunrise to sunset, so it ignores twilight time). I adapted it by imagining that solar noon--when the sun is at its highest point of the day--is clock noon--when the clocks say 12:00. Solar noon is NOT usually clock noon, but here's what it would look like if it were:

The solid line in the center is both solar and clock noon.

As you can see, this is going to work out fairly well in the late fall and winter months. Sunrise in October is just a bit before 7 am and sunset is after 5 pm. Dark December has a sunrise just before 8 am and a sunset just after 4 pm--at least young children can go to and return from school before it is dark. Given that there's so little daylight, it's hard to imagine how we'd want to split things up differently in winter. What should we do with December? Have sunrise at 9 am so we can have sunset after 5 pm? This seems absurd. Have sunrise at 7 am and then have sunset at 3 pm? Even more ridiculous. Solar noon = clock noon seems like the best solution.

The solar noon = clock noon is called STANDARD time, and it is what we do in the winter in the U.S. On December 21st, the sunrise was at 7:49 am, the sunset was at 4:30 pm, which means solar noon was at 12:10 pm. Easy peasy, so why not keep STANDARD time all year?
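The solar noon figure is just the midpoint between sunrise and sunset. A quick sketch using the December 21st times quoted above; the function name is my own.

```python
from datetime import datetime, timedelta

def solar_noon(sunrise, sunset, fmt="%H:%M"):
    """Midpoint between sunrise and sunset, plus the day length."""
    rise = datetime.strptime(sunrise, fmt)
    set_ = datetime.strptime(sunset, fmt)
    day_length = set_ - rise
    return (rise + day_length / 2).time(), day_length

noon, length = solar_noon("07:49", "16:30")
print(noon, length)  # 12:09:30 8:41:00
```

That is about ten minutes after clock noon, which is the 12:10 pm figure above.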

Look again at the chart above, specifically at June. Does it make sense to have the sunrise at nearly 4 am? Are many people going to enjoy the extra hours of sunlight then? Probably not many. It makes more sense to move the clocks forward one hour so the 4 am sunrise on June 21st becomes instead a 5 am sunrise--but that also shifts sunset from 8 pm to 9 pm, an extra hour of sunlight in the evening when people are awake and can enjoy it. This is called daylight SAVINGS time because we're "saving" the time from the morning, when few are awake to enjoy it, and then "spending" the time in the evening, when nearly everyone can benefit from it. In truth, Portland, Oregon is far enough north that we could benefit from two hours of a clock shift--4 am sunrise to 6 am and 8 pm sunset to 10 pm--but the one hour of clock shift is a compromise with more southerly states.

Because the day lengths change very little near the equator, it makes no sense for countries located in the tropics to do anything other than standard time. Sunrise will be approximately 6 am, and sunset will be approximately 6 pm, month after month. For lower/mid-latitude countries, a one-hour shift in the summer--so the extra sunlight is "SAVED" for the evening hours--makes some sense. For higher/mid-latitude countries, perhaps two hours or more of a shift makes sense, but beyond perhaps two hours, you've maxed out the benefit. Who cares if daylight goes to 11 pm or midnight? Very few people would want to be awake to enjoy the sunlight.

In fact, beyond even a certain high latitude, even shifting clocks an hour is pointless. If you're inside the Arctic or Antarctic Circles, you're going to get 24 hours of sunlight. Why change the clocks at all from standard time? You don't need to "save" morning sunlight for the evening--you're going to get sunlight all evening anyway.

To sum up, if people were to decide this rationally based on latitude alone:

Equator - standard time all year
Mid-latitude - standard time in winter, one or two hour later shift in summer
Arctic circle - standard time all year

Wednesday, December 12, 2018

Tabulation software

Hi all,

I've been thinking about how to run tournaments for many years and publishing articles on it. My published ideas have ranged from geographic mixing and logit scores to new methods for strength-of-schedule pairing and constrained side equalization assignment.

I've finally gotten around to putting all the ideas into a single, programming-ready document. I'm putting it out there under a Creative Commons Attribution (BY) license, version 4.0. Please feel free to use any ideas contained herein, as long as you attribute me.

Tuesday, August 7, 2018

Approval voting and primaries

California and Washington both use the top-two, open primary method in their elections: voters get to pick any primary contender, regardless of party, to go on to the general election. The top two vote getters in the primary, regardless of party, move on to the general. One consequence of this is that a party could get "locked out" of a particular race if none of its candidates qualify for the top two spots. As a result, the parties have been especially concerned with having too many candidates in a race and splitting their voters into too-small factions, thus keeping all of the party's candidates off the general ballot. See this article for a description of the problem.

There's a very easy, very simple fix for the second part of this problem: approval voting. Here's my two-sentence description of approval voting:

Each voter can put a check next to as many candidates as they approve of, leaving disapproved-of candidates blank. The candidate(s) with the most votes win(s).

That's it. Ballots look the same. It's not complicated to explain. And approval voting lends itself to virtually no strategic voting (i.e., faking your preferences on the ballot to try to induce your desired outcome to happen).
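To show how little machinery is involved, here is a minimal tally sketch. The ballots are made up for illustration, and the function names are my own; each ballot is simply the set of candidates the voter checked.

```python
from collections import Counter

def approval_tally(ballots):
    """Count one vote for every candidate checked on each ballot."""
    tally = Counter()
    for approved in ballots:
        tally.update(set(approved))   # a set ignores duplicate check marks
    return tally

def top_two(tally):
    """The two most-approved candidates (no tie-breaking rule is applied)."""
    return [name for name, _ in tally.most_common(2)]

ballots = [{"A", "B"}, {"A"}, {"B", "X"}, {"A", "X"}, {"A", "B", "C"}]
tally = approval_tally(ballots)
print(tally["A"], tally["B"], tally["X"], tally["C"])  # 4 3 2 1
print(top_two(tally))                                  # ['A', 'B']
```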

In the top-two primary, everything would work the same, except that voters wouldn't get one choice; they could vote for as many candidates as they like. ABBAs could vote for every ABBA candidate, and BeeGees could vote for every BeeGee candidate. Or ABBAs could vote for most ABBA candidates and some centrist BeeGees. Or a centrist could vote for some centrist ABBAs and some centrist BeeGees. Let's imagine a scenario in which a district is 51% ABBA voter and 49% BeeGees. Let's say each side nominates three candidates: A, B, and C for the ABBAs, and X, Y, and Z for the BeeGees.

In a hyper-partisan environment, 100% of ABBA voters approve of A, B, and C, and 0% approve of X, Y, and Z; vice versa for all the BeeGee voters. Because there are slightly more ABBA voters (51-to-49), the top two candidates will always be some combination of A, B, and C (more on this in a second). The BeeGees would be locked out. However, this lock-out has nothing to do with how many candidates the BeeGees nominated. It would have happened whether they nominated two, three, four, or a hundred candidates. The lock-out is the result of the hyper-partisan environment, not of the number of candidates splitting the vote. No matter how many candidates the BeeGees nominate, they all get 49% of the vote and fall short of the general ballot.

Let's go back to that issue of which two of A, B, and C make the general ballot in the hyper-partisan environment. If it is truly a tie--all three got exactly the same number of votes--then some tie-breaking mechanism would have to be employed. They could draw straws, or the ABBA party chairperson could decide because all of the candidates are its own. But this three-way tie seems fairly unlikely. Would primary ABBA voters be so united in support of all three candidates that they give 100% approval to each? I guess this is an argument that such hyper-partisanship seems unlikely; it's more likely A gets 95% approval from ABBA voters, B gets 90%, and C gets 70% or some such split. If there aren't at least two of ABBA's candidates that get 100% approval from ABBA voters, it opens up the possibility that a BeeGees candidate can make it to the general election.

Furthermore, it seems unlikely that no unaffiliated voters exist and that no partisans ever cross over. Even in today's highly partisan environment, people can and do split tickets, switch parties, and cross over. (I reserve hyper-partisanship to mean that zero such behavior exists.) Some ABBA voters might approve of A, B, and Z. Some centrist voters might approve of B and X. Having unaffiliated voters and cross-over votes doesn't guarantee ABBA candidates or BeeGee candidates won't be locked out--but it does make it less likely. Even in a highly partisan environment, candidates with cross-over appeal might be at somewhat of a practical advantage. Winning 99% of 51% of the total votes (almost all ABBA voters) is 50.49% of the total; winning 90% of 51% of the total votes (most ABBA voters) and 10% of 49% of the total votes (a smattering of BeeGee voters) is 50.8% of the total votes. As a real-world matter, I think it's harder to get complete party unity behind a candidate (that is, 99%) than it is to attract a couple percentage points from the other party. Maybe I'm wrong, but look at this graph of presidential approval ratings. Of the twelve presidents of the modern era, nine were able to pick up more support from the opposition party than they lost from their own. Only two were better at holding their own party together than at attracting opponents (Barack and the Donald). The twelfth case, Jimmy, did dismally with both parties. The average trend is that pulling in opposition is easier than preventing any defections.
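The two totals in that comparison are quick to verify. A tiny sketch, with a hypothetical helper name; bloc sizes are in percent of all voters and support levels are fractions.

```python
def total_share(own_bloc, own_support, other_bloc, crossover_support):
    """Approval as a percent of all voters, given each bloc's size (in %)."""
    return own_bloc * own_support + other_bloc * crossover_support

unity = total_share(51, 0.99, 49, 0.00)      # near-total party unity, no cross-over
crossover = total_share(51, 0.90, 49, 0.10)  # weaker unity, some cross-over
print(round(unity, 2), round(crossover, 2))  # 50.49 50.8
```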

In an approval voting scheme for a top-two primary, it's possible that a party gets locked out, but the cause would not be how few or many candidates they nominate. A party would get locked out if (1) the other party had more voters and had at least two candidates they completely unified behind or if (2) the other party had at least two candidates with cross-over appeal. Scenario 1 seems unlikely as an empirical matter; scenario 2 seems like it fulfills the exact purpose of top-two primaries of selecting the two best candidates overall--who just happen to be from the same party, but expanded their support beyond it.

By the way, the approval voting scheme makes sense for regular primaries too, or any time voters have more than two choices they need to whittle down. I use it in meetings whenever we have more than two options to consider to find out where the general consensus lies.

There's not much an individual voter can do to vote strategically. I might consider giving an approval vote to the candidate I find least objectionable from the party I disagree with, if I think it's inevitable that the other party will get one candidate in. (In other words, it's inevitable, so choose the weakest opposition.) This seems an unlikely scenario, however, and a risky strategy. When do I know the other party is nearly guaranteed a spot in the general election? Only when my party has nominated only one candidate or only one strong candidate (so, unlikely). And it's a risky strategy: my approval vote for the weakest opposition might be enough to push TWO of the opposition party's candidates into the general election, excluding my candidate entirely. Let's say the standings look like this, including my vote for my candidate but not yet voting for the opposition candidate:

My candidate - 51%
Opposition candidate I hate - 49%
Opposition candidate I would prefer - 49%

In this case, I do get to decide which opposition candidate we face in the general election. But the scenario could just as easily be this:

My candidate - 49%
Opposition candidate I hate - 51%
Opposition candidate I would prefer - 49%

In this case, the opposition candidate I hate is inevitable. My vote for the opposition candidate I prefer knocks out my candidate. (At least before, my candidate might have won the tie-breaker for second place.)

Who can say which scenario is likely to happen in a close race before the voting is done? This strategy is incredibly risky when everyone votes before votes are tallied.
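The two scenarios can be played out in a few lines. This is only a sketch: I treat the percentages as vote counts out of 100 voters, so that my single extra approval moves a candidate from 49 to 50, and the names are placeholders.

```python
def top_two_after_my_vote(approvals, extra_vote_for):
    """Top two candidates once my extra approval vote is added."""
    counts = dict(approvals)
    counts[extra_vote_for] += 1
    ranked = sorted(counts, key=counts.get, reverse=True)
    return ranked[:2]

scenario_1 = {"mine": 51, "hated": 49, "preferred": 49}
scenario_2 = {"mine": 49, "hated": 51, "preferred": 49}
print(top_two_after_my_vote(scenario_1, "preferred"))  # ['mine', 'preferred']
print(top_two_after_my_vote(scenario_2, "preferred"))  # ['hated', 'preferred']
```

In the second scenario my own candidate drops out of the top two, which is exactly the risk described above.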

Friday, July 27, 2018

Random matching in debate tournaments

Every debater knows the predicted number of teams with each record when power-matching is used:

and so on. But how would it work without power-matching? What if teams were paired at random? The easy part is using the laws of probability to figure out which matches happen by chance. That's listed in column F.

The hard part is figuring out which team wins. If both teams have the same record, then whichever team wins, the outcome is the same. For example, in round two, the 25 teams in 1-0 vs. 1-0 rounds (ignore the fact that this is odd--it makes no difference in the end) and the 25 teams in 0-1 vs. 0-1 rounds guarantees that 12.5 teams will have a 2-0 record; 25 will be 1-1; and 12.5 will be 0-2. These guaranteed outcomes are listed in column I.

But what happens if the two teams have different records? One possibility is that there are no upsets at all. For example, in round two, of the 50 teams in 1-0 vs. 0-1 rounds, exactly half are 1-0s. These 25 teams might all win--no upsets--and become 2-0s. The 25 teams that are 0-1s all become 0-2s. These no-upset results are listed in column J.

The other possibility is that all rounds with mixed records have upsets. In round two, of the 50 teams with 1-0 vs. 0-1 rounds, the 25 teams that are 0-1s could all win, becoming 1-1s, while the 25 teams with 1-0s all lose, also becoming 1-1s. Thus all 50 teams end up 1-1. These all-upset results are listed in column K.

Of course, neither no-upsets nor all-upsets is realistic. From other research I've done, it turns out the upset rate is more like 20%, so I blended the two results 80:20 no-upsets:all-upsets in column L. As you can see, the ultimate outcome is that each record is nearly balanced with the others, though slightly more in the mediocre results. For example, after four rounds, a 20% upset rate results in about 17 teams that are 4-0s; 22 teams that are 3-1s; 23 teams that are 2-2s; etc.

Yet a flat 20% upset rate is probably too high for the later rounds. It is unlikely that an 0-3 team has a 20% chance against a 3-0 team. As the teams are farther apart in record in later rounds, the overall upset rate must drop. If this is so, the final outcomes flatten further. It turns out that if the upset rate is 1/6 for round two, drops to 1/8 for round three, and further drops to 1/10 for round four, then the final outcome is that exactly 20 teams are 4-0s; 20 are 3-1s; etc.
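For readers who prefer code to spreadsheets, here is a rough sketch of the same calculation. It is my own reconstruction, not the sheet itself: it works with expected shares of the field (so fractional teams are allowed, and the odd-number issue is ignored as above), splits same-record debates 50:50, and lets the team with the better record lose with probability equal to the upset rate.

```python
from fractions import Fraction

def next_round(dist, upset_rate):
    """Expected share of teams at each win count after one random-paired round.

    dist[w] is the share of teams with w wins so far.
    """
    new = [Fraction(0)] * (len(dist) + 1)
    for wins, share in enumerate(dist):
        p_win = Fraction(0)
        for opp_wins, opp_share in enumerate(dist):
            if wins == opp_wins:
                p_win += opp_share * Fraction(1, 2)
            elif wins > opp_wins:
                p_win += opp_share * (1 - upset_rate)
            else:
                p_win += opp_share * upset_rate
        new[wins + 1] += share * p_win
        new[wins] += share * (1 - p_win)
    return new

def simulate(upset_rates, teams=100):
    """Expected number of teams at each win count; one upset rate per round."""
    dist = [Fraction(1)]
    for rate in upset_rates:
        dist = next_round(dist, rate)
    return [teams * share for share in dist]   # index = number of wins

# The round-one rate is irrelevant because every team starts with the same record.
declining = simulate([0, Fraction(1, 6), Fraction(1, 8), Fraction(1, 10)])
coin_flip = simulate([Fraction(1, 2)] * 4)
print([float(x) for x in declining])  # [20.0, 20.0, 20.0, 20.0, 20.0]
print([float(x) for x in coin_flip])  # [6.25, 25.0, 37.5, 25.0, 6.25]
```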

What happens if teams are paired at random? It depends on the upset rate. If it's exactly 50% (which is far too high), then the final outcomes look exactly like it would with power-matching:

If the upset rate is a more realistic, empirically justified 20%, then the outcomes are much flattened and nearly equally distributed:

Here's the sheet for anyone who'd like to play around with it.

Wednesday, May 16, 2018

Why debate tournaments have been doing side assignment wrong

Side assignment is easy, right? In odd rounds, assign teams to sides at random. In even rounds, assign each team to the opposite side as the previous round. What could be easier?

The problem is that this makes even rounds harder to pair. Any tournament director can tell you that even rounds often "lock up" and that one has to break brackets to make matches. I know I've sat at a screen, wishing the two 5-0s that are both due Aff could hit, instead of each getting a pull-up.

I stumbled on an alternative, what I call the constrained side equalization (C.S.E.) method. Instead of balancing Aff-Neg rounds at the end of even rounds, this method works its magic at the end of odd rounds. Here's the C.S.E. in action:

Rd 1 - paired at random
Rd 2 - paired at random, ignoring sides. If both teams were Aff in round 1, or both Neg in round 1, it's a computer flip-for-sides. If one team was Aff and the other was Neg, then the sides are equalized.

At the end of round 2, about 25% of teams will have two Affs, 25% two Negs, and 50% will be balanced. (It depends on the random pairings.)

Rd 3 - Teams with two Affs must go Neg; teams with two Negs must go Aff. The balanced teams are not assigned to either side. If a balanced team is matched against a two-Aff team, then the two-Aff team goes Neg. Likewise, if a balanced team is matched against a two-Neg team, then the two-Neg team goes Aff. If a two-Aff team is matched against a two-Neg team, then the sides are equalized. And if a balanced team is matched against a balanced team, then it's a computer flip-for-sides.

At the end of round 3, every team will either have had two Affs and one Neg, or two Negs and one Aff. In other words, at the end of an odd round, the sides are "equalized."

The cycle repeats. Round 4 is paired at random, ignoring sides. Round 5 has the constraint that teams with three Affs must go Neg and teams with three Negs must go Aff; otherwise, any team can be paired against any other. If the tournament ends on an odd round, there's no special other consideration. If the tournament ends on an even round, you'd want to pair teams in the typical way for the final prelim.

Mathematically, it is as simple as this rule:
If the Aff rounds - Neg rounds is 2 or -2, then the team is assigned a side first, then paired with an opponent; otherwise, a team is assigned an opponent first, then assigned a side (to equalize if necessary).
This works in odd or even rounds.
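Here is one way the side-assignment half of that rule might look in code. It is a sketch under my own naming, and it assumes the pairing step has already avoided matching two teams that are both constrained to the same side.

```python
import random

def assign_sides(team_a, team_b, aff_minus_neg, rng=random):
    """Return (aff_team, neg_team) for one debate under the C.S.E. rule.

    aff_minus_neg maps team -> (Aff rounds so far) - (Neg rounds so far).
    """
    a, b = aff_minus_neg[team_a], aff_minus_neg[team_b]
    if a == b:
        if abs(a) >= 2:
            raise ValueError("both teams are constrained to the same side")
        # Same side history: computer flip-for-sides.
        return (team_a, team_b) if rng.random() < 0.5 else (team_b, team_a)
    # Otherwise equalize: the team with fewer Affs so far goes Aff.
    return (team_a, team_b) if a < b else (team_b, team_a)

history = {"two_aff": 2, "balanced": 0, "two_neg": -2}
print(assign_sides("two_aff", "balanced", history))  # ('balanced', 'two_aff')
print(assign_sides("two_neg", "balanced", history))  # ('two_neg', 'balanced')
print(assign_sides("two_aff", "two_neg", history))   # ('two_neg', 'two_aff')
```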

But why go to all this bother? The reason is simple: constraints.

Method   Odd     Even    Avg.
Trad.    100%    50%     75%
Alt.     87.5%   100%    94%

In a traditional method, in odd rounds, 100% of possible matches-- 0.5 * (n (n - 1)) --could be considered. There are no side constraints in odd rounds, so anyone could be matched against anyone. But in an even round, a tournament is limited to a fourth of (n (n - 1)). A due-Aff team can only be matched against a due-Neg team. This is a huge constraint.

Using the C.S.E. method, in odd rounds, teams with more Affs must go Neg and vice versa. Aside from this small constraint (only about one-eighth of possible matches ruled out), nearly anyone can debate anyone. And in even rounds, it's 100% of possible matches that can be considered. The C.S.E. method has much lower overall constraints than the traditional method.
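The 50% and 87.5% figures can be checked by counting allowable pairs. A sketch, assuming a field that splits evenly: half due Aff and half due Neg in a traditional even round, and a quarter two-Aff, a quarter two-Neg, and half balanced in a C.S.E. odd round. The function names and the 400-team field are hypothetical.

```python
from math import comb

def traditional_even_share(n):
    """Even round, traditional method: due-Aff teams may only meet due-Neg teams."""
    return (n // 2) ** 2 / comb(n, 2)

def cse_odd_share(n):
    """Odd C.S.E. round: only two teams constrained to the same side cannot meet."""
    constrained = n // 4          # size of each constrained group
    return 1 - 2 * comb(constrained, 2) / comb(n, 2)

print(round(traditional_even_share(400), 3))  # about 0.501
print(round(cse_odd_share(400), 3))           # about 0.876
```

With a finite field the shares sit slightly above 50% and 87.5%; they approach those values as the field grows.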

In other words, the odd C.S.E. round is considerably easier to pair than the even traditional round (21 times better odds of finding a good pairing, in fact). If a side assignment for C.S.E. happens to not turn up a suitable pairing, why, you can reshuffle the teams--switching the sides of some randomly selected teams, except for the few side-constrained teams--and try again. This works whether it's an odd or an even round. In the traditional method, you can only reshuffle in an odd round. You're stuck with the even round side assignments you get with the traditional method. This inability to reshuffle the teams means the tournament can lock up. In the C.S.E. method, because any round can be reshuffled, there's always another chance to find a good pairing.

I worked out an example here. At the end of five rounds of C.S.E., every team had either two or three Affs. The method yielded side "equivalence."

But, intriguingly, the teams took different paths to get there. Some went Aff two times in a row. Some alternated. Although all the paths end with one of two correct results--two or three Affs--there were more path types to get there and thus more options to pair the teams. More paths = more flexibility. We've been doing side assignment the hard way!

Saturday, April 7, 2018

Experimental verification of the logit score

One method for ranking teams that I introduced to the debate community is the logit score. The logit score is derived from a logistic regression. The logit score combines a team's record, speaker points, and its opponents' strength into a single number. Because the logit score factors in record and points, it is performance-based, but that record is adjusted by opponent strength, making the logit score more fair than record alone. A win against a good team is "worth" more than a win against a weak team. If you take the worst opponent a team beat and the best opponent it lost to, and average those together along with the team's average speaker points, then you're approximating the team's logit score. Due to how the logit score is calculated, it is the likeliest team strength that explains its results: its record and its points.

I had previously looked for empirical support for the logit score in a college debate season. I took the real results for the entire season and used them to calculate each team's logit score. I then used those to retrodict the winner in every single match-up that had actually happened, with the higher ranked by logit score team retrodicted to win the round. The logit score did this better than every other ranking method I also tested, slightly edging out median speaker points, and doing better by a goodly margin than the win-loss record. Despite this success, there was the nagging concern that the logit score was being derived from an entire season's worth of information. This empirical support could not show if the logit score would work for a single tournament.
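The retrodiction test itself is simple to express. A minimal sketch with made-up teams and scores; it works for any ranking score, not only the logit score, and skipping debates between equally scored teams is my own assumption.

```python
def retrodiction_accuracy(scores, results):
    """Share of past debates won by the team with the higher score.

    scores: team -> ranking score (higher is better).
    results: list of (winner, loser) pairs.
    """
    correct = total = 0
    for winner, loser in results:
        if scores[winner] == scores[loser]:
            continue                      # no prediction possible for a tie
        total += 1
        correct += scores[winner] > scores[loser]
    return correct / total if total else float("nan")

# Hypothetical scores and results, for illustration only.
scores = {"P": 1.4, "Q": 0.3, "R": -0.9}
results = [("P", "Q"), ("P", "R"), ("R", "Q"), ("Q", "R")]
print(retrodiction_accuracy(scores, results))  # 0.75
```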

Therefore I set out to do an experiment. I created a simulation tournament in a program, and ran and re-ran it hundreds of times. I tested various tournament conditions, from random prelims to a typical method of power-matching to pre-matching (like a round robin). I looked to see whether in these kinds of conditions--using only the information available in a tournament--the logit score fared as well in comparison to record-based rankings and to speaker point-based rankings.

The results are that, in any condition, the logit score is a vast improvement on the win-loss record, but not quite as good as speaker points. It may surprise people to realize that speaker points, even though they vary considerably from judge to judge, are the best information to rank teams. A team's median speaker points isn't affected too much by one judge. Speaker points are rich data when you only have six or eight rounds to rank a team.

However, I believe many in the community would not prefer to use speaker points alone. If nothing else, ignoring wins and losses gives a perverse incentive to teams to speak pretty and ignore winning key arguments. The logit score is a solid, thoughtful compromise. The logit score is based on both wins and points, so there's no perverse incentive to ignore key arguments--nor is there an incentive to ignore effective, mellifluous communication. Although the logit score is slightly less accurate for a single tournament than speaker points alone, the logit score is far more accurate than win-loss record is. The logit score is, in other words, a vast improvement on the status quo method--a compromise in name only.