Sunday, March 22, 2009

Strength of schedule pairings work!

I created a new power matching method, a "strength-of-schedule" (S-o-S) power matching method. This method was used to run a hypothetical tournament, to compare the results to an actual tournament. (Thanks to Orion and Joe for the idea of running a hypothetical tournament.)

A major (180+ teams) 2008 national debate tournament was the point of comparison. The actual results of the seven preliminary rounds were available online. For the hypothetical tournament, round 1 used the same pre-set matches and results as the actual tournament. Round 2 and every subsequent even round used the strength-of-schedule power matching method: teams who had had good opponents so far faced the weaker ones in their bracket; teams who had had weak opponents so far faced the stronger ones in their bracket. Round 3 and every subsequent odd round used the traditional, high/low power matching method in TRPC. Once the pairing for a round was set, a "ballot" was entered for every debate. If the teams had met at the actual tournament, the real results were entered. If the teams had not met, each side received the speaker points they had earned in their respective debates for that round (against different opponents) at the actual tournament, and the winner was determined using the final rankings from the actual tournament (the higher ranking team won). Then, I power-matched then next round. Thus, the hypothetical tournament results were meant to replicate the actual tournament results as closely as possible, without bringing in any info that wouldn't have been available to a tab director during the tournament.

And the results are...

First, the strength-of-schedule power matching method generated only the correct, minimum number of pull-ups. Teams pulled up were always those with the weakest opposition record in their bracket. Otherwise, all teams were correctly paired within their brackets.

Second, the results from the hypothetical tournament closely matched those of the actual tournament. Of the 32 teams that made it into elimination rounds at the actual tournament, 30 of them would have made it into elimination rounds at the hypothetical tournament. (At the actual tournament, there was a four-way speaker point tie for 31st place, broken on opponent wins. The 30th and 32nd place teams at the actual tournament had different opponents — and lower opponent wins — at the hypothetical tournament and dropped below the threshold.)

Third, and most important, in every bracket except the 7-0s, the hypothetical tournament using the strength-of-schedule power matching method had narrower ranges for opponent wins and smaller standard deviations for opponent wins:


(With so few 7-0s, the addition of one outlier made a large impact. This is not the most revealing statistic.) The range for opponent wins for 6-1s at the actual tournament was nine, but four for the hypothetical tournament. That standard deviation was nearly halved. The range for 5-2s decreased from 15 to seven. The range for 4-3s decreased from 15 to ten.

Another telling statistic is the comparison for the top 32 teams at the actual and the top 32 teams at the hypothetical tournament:


The ranges were the same, but the standard deviation was far lower at the hypothetical tournament, meaning that more of the data points were closer to the mean opponent wins. The middle 94% of the top 32 teams, eliminating the highest and lowest outlier, is even more telling. The range for the actual tournament, even eliminating the outliers, was still 12; the range for the hypothetical tournament dropped to eight. You can see that the standard deviation is nearly one opponent win lower.

The bottom line is, it worked. There was nothing in the process that required human judgment or that would be difficult to do in a computer program:
  1. Retrieve the relevant statistics from the tabulation program.
  2. Assign byes.
  3. Calculate the z-scores of a team's strength and its opponents' strength.
  4. Populate an optimization matrix.
  5. Solve the optimization matrix using the Hungarian algorithm.
  6. Feed the solution back into the tabulation program as the next round’s pairings.
Alternatively, the new power-matching algorithm could be written into an existing tabulation program, eliminating steps 1 and 6.

Read the full paper for a description of the method I used:


And here's the Java code for the Hungarian algorithm I used: http://www.mediafire.com/?y3nqjtmnim2

Sunday, March 15, 2009

Strength of schedule pairings

This is a comparison of running a debate tournament in two different ways. In the left column are the results are from a tournament I helped to run, using the traditional high-low power-matching methods. In the right column are the results from re-running the exact same tournament using my strength of schedule power-matching method. (Specifically, round 1 is the same random pre-set as the original tournament; round 3 uses a traditional high-low power-match; but for rounds 2 and 4 I used my strength of schedule pairing. For each hypothetical round, i.e., every round after round 1, I referred to the actual rankings to determine the winner). Here's the comparison:


Download the Excel workbook: http://www.mediafire.com/?lanmmnnmlmn

As you can see, the groupings are tighter for 3-1s (now a range from 9 to 6) and 2-2s (now from 9 to 6). What is tougher to see is that for the 3-1s, the standard deviation of opponent wins was cut by 33%, from 1.55 to 1.00, and for the 2-2s, the standard deviation of opponent wins was cut by nearly 50%, from 1.86 to 1.05. In other words, for teams in the middle of the tournament, the schedule strengths were much more evened out.

Now, it's true that the range increased slightly for 4-0s and 1-3s. But for the 4-0s, it didn't increase terribly, and for 1-3s, it would have decreased but for one outlier (the team who had only 6 opponent wins). Furthermore, this was a small tournament with team constraints that created a lot of pull-ups; that the results show this level of improvement given those constraints is impressive.

I'm going to test this out on a much larger scale so that the constraints don't impede the algorithm nearly as much.

Thursday, March 12, 2009

Circles, ellipses, and Cartesian and parametric equations

Here is a great problem to introduce students to the utility of different forms of representation. Imagine two miniature train tracks that overlap: one, a circle of radius 4 inches centered at (1, 3) and the other, an ellipse with a long horizontal axis of 8 inches and a vertical axis of 4 inches centered at (0, 4).

1) At what (x, y) coordinates do the tracks cross?
2) If trains are placed at an arbitrary point on each track but travel at the same speed, will they ever collide?

To answer question 1, the Cartesian equations turn out to be the easiest way to go. The circle can be written as:

The ellipse as:


Or, alternatively, the ellipse could be:

At this point, most students recognize that the circle and ellipse equations can be set equal to each other, and they can solve for x or y (whichever is easier) in terms of the other variable, and substitute back in. And that strategy works remarkably well for these equations because the x squared terms cancel out.

To answer question 2, parametric equations are the only way to go. Indeed, a good representation can make the answer to the question obvious and lets one make an even bolder statement. The circle is easy enough to write:



The ellipse requires a little more effort. The problem says the trains are going at the same speed, but they're running on different length tracks. Specifically, the circle track length is 8 * pi, and the ellipse is approximately (square root 40) * pi in length. That means that the period of the elliptical-track train must be adjusted; that train will complete nearly 1.3 revolutions for every 1 the circular-track train completes.

Now, you have something to graph and see what the paths of the trains look like. Comparison of the x-coordinates --


and the y-coordinates --

shows that the trains will indeed cross paths (where the equations cross at the same time t on each graph; equations crossing on the x graph only or y graph only would show where they were on roughly opposite sides of their respective tracks). It becomes obvious with this representation that of course they cross paths because they have different periods (travel time for one circuit around the track). In fact, this representation should make it obvious that "ever" is too weak a question -- they must cross paths, no matter where their starting places. It is especially neat for students to see that there's no least common multiple of the periods possible; that is, the combination of the two functions is non-repeating. Take t to infinity and you'll never see the two trains at the same time in the same places they were at t = 0. Neat.

Monday, March 9, 2009

Half-round robin

This idea grew out of a conversation with a Michigan coach. What if you had only 12 teams and 6 prelims? Obviously, it's going to be hard to power-match the tournament; the brackets are going to be too small too quickly (basically, nothing but pull-ups after round 3). So why not do a half-round robin? The Excel workbook shows one way to make this happen, and, as it shows, if you can seed the teams before the tournament starts, then you can make a pretty fair pairing: http://www.mediafire.com/?qzediimilke

Friday, March 6, 2009

What are normal opponents wins in a given bracket?

I decided to look at several big, well-run tournaments from this year and last year. I chose big tournaments because there's less of a chance that odd results were created by restrictions given small brackets with too many teams from the same schools. I'm not going to include the tournaments' names -- this is just data. Here are the results for teams that broke:
That means at one tournament, the 6-0 with the hardest schedule faced opponents accumulating 25 wins, while at the same tournament, the 6-0 with the easiest schedule faced opponents accumulating only 22 wins. For the most part, the 6-0s at all these tournaments had reasonably narrow ranges (meaning that all the 6-0s had roughly similar strengths of schedule) except at two tournaments: the 16-25 and 19-24 are unusually broad ranges. A result of 25 opponent wins averages out to a little better than a 4-2 record/opponent; 16 OW averages out to worse than a 3-3 record! The spread for 5-1s, however, looks consistently larger, from better than a 4-2 record/opponent average down to a worse than a 3-3 record/opponent average.

The results are even more surprising when looking at seven round divisions:
The 7-0s' ranges look reasonably small, but the 6-1s and 5-2s faced very different schedules of opponents! At one tournament, the luckiest 5-2 faced opponents racking up only 21 wins (an average of a 3-4 record), while another 5-2 faced opponents racking up a whopping 36 wins (an average of better than 5-2) -- a harder schedule than the best 7-0 faced!

Wednesday, March 4, 2009

A hypothetical worst case debate math scenario...

One other little bit of tournament math... Imagine this hypothetical team, call it team uno, the best team at a tournament, in a tournament using the traditional high-low power-matching system. First and second round are random, so it's possible team uno might hit the worst and second worst teams at the tournament in rounds 1 and 2. These two awful teams go on to a 0-6 win-loss record. Then, team uno, being 2-0 with the best speaker points, will hit the worst 2-0 with the lowest speaker points, which could conceivably lose all its remaining rounds and finish 2-4. The same for the next round, hitting the worst 3-0, which could finish 3-3, and all the way through. It is possible in this way for the top team to face six opponents with a combined record of only 14 wins in six rounds. This averages to just slightly better than a 2-4 record/opponent. (Of course, it will likely be better, but it could even be worse if there are an uneven number of teams in each bracket and the top team receives a "pull-up" and debates an opponent with a one-loss worse record in one or more "power-matched" round.) Now, I think the ideal is for the best team at the tournament to face excellent opponents (this is what power-matching systems, like the Swiss system, are supposed to do!) perhaps averaging 4-2 or 5-1 records, for combined opponent wins in the range of 24-30. A far cry from 14 -- which could easily results from pairings that the current algorithm does create and is powerless to flag or rectify.

Tuesday, March 3, 2009

Rat Attack

On Feb. 24, 2009, NOVA ran a special on the mautaam event in India: http://www.pbs.org/wgbh/nova/rats

Apparently, once every 48 years, the bamboo flowers and produces an enormous quantity of fruit, which in turn causes a population explosion among the black rats in the forest. Once the fruit is gone, this plague of rats devours the rice fields, causing a lot of human suffering.

The math is simple: exponential population growth. What is staggering is the sheer rapidity of rat reproduction when there's no limit to the environmental carrying capacity. This is a great, real example to use with your students of exponential growth.

Here are the details I looked up: black rats have a gestation period of 21 days and can have a litter of about 10 pups at a time; female rats can nurse a litter while also pregnant; and rats reach sexual maturity in 6 weeks and therefore can have their first litter at 9 weeks of age. These facts produce explosive -- yes, explosive -- population growth. You can model the growth by tracking the size of each age cohort (newborns, adolescents, mature adults) in a spreadsheet, like I did in this one:



Or, you can model the growth with a recursive formula:where a is the time in weeks. You can also come close with an explicit formula,where t is the time in weeks. I graphed this explicit formula into the spreadsheet as well. The upshot is that the rat population comes close to doubling every three weeks; in the four month season of mautaam, the population grows nearly 120-fold. That's a real plague of rats.

Monday, March 2, 2009

Debate tournament math

Here's how a high school or college debate tournament works: for the first two rounds of debating, each team is randomly assigned an opponent; for the third round, the winners of the first two rounds are assigned other winners as opponents, while losers debate losers. This system continues for several preliminary rounds, "power matching" teams against opponents with the same record of wins and losses, until the top "brackets" with winning records (7-0s and 6-1s, for example) move on to elimination rounds. Thus, the preliminary rounds are a type of Swiss system tournament, a format that is used in chess competition, too. The number of teams in each final bracket follows a perfect binomial distribution (plus or minus one team or two for odd numbers that require a team to be "pulled up" from a lower bracket).

This approach is generally felt to be fair, although it is a recognized problem that a good team could lose the first or second round and would have easier opponents all the way through. How often does this happen? A visualization helps:


Click on image for more detail.

These are the preliminary varsity policy debate results at the 2009 Harvard invitational high school tournament. (I took out names because I don't want to seem like I'm ragging on any school; I'm really just interested in the math.) Each row represents a different bracket -- the 7-0 at the top, 6-1s one row down, etc., and the 0-7 at the bottom -- and each row is sorted best speaker points (left) to worst speaker points (right) in that bracket. Each arrow represents one actual debate between two teams, pointing to the winner but in the loser's row color. Every single round is there, but I bolded the rounds that the top nine teams won. You can see how differently the top teams (the 7-0 and 6-1s) got that record. Some 6-1s, circled, defeated at least three 5-2 or better teams. Other 6-1s, in squares, defeated only one or no 5-2s. The 6-1 on the far left defeated not one team in the top 20%. Perhaps they could have, but they never even faced off against one. They made it into elimination rounds on the basis of an easier schedule than any other 6-1.

Let me make it absolutely clear, I'm not criticizing the folks who run the Harvard tournament. They do a fine job. The problem is not with their execution. I'm sure that at every point, the 6-1s were given proper opponents for their records; the problem is that some of those opponents went on to lose many of their remaining rounds and revealed their weakness. The problem is the method, which is only as good as the current record of each team accurately reflects its true strength. Since this information can't be known in advance, the only solution so far has been to repeat the process many, many times to thoroughly test and properly rank each team in preliminary rounds. Potentially, what you're looking at above is a raw sort that still contains some errors, like ABCEDLFGJIMNP... it's getting better, but there's still a need for further sorting. Consider it this way: the first round is supposed to determine whether a letter is in the first half of the alphabet or not, by picking up two letters at the same and determining which comes first. Generally speaking, this works, and A, B, C, etc., are likely to end up in the first-half pile. But what happens if the letters you pick up to compare are T and W? T will be misleadingly placed in the first-half pile, and you hope that this doesn't happen two, or three, or seven times in a row, but clearly, it can and did happen, and a team made into the top 6% without ever facing an opponent in the top 20%.

Randomness isn't enough. There needs to be an element added to power-matching that controls for strength of schedule. If you need further convincing, here are the 5-2s highlighted:

Click on image for more detail.

The circled 5-2s defeated at least one other 5-2. (It's hard to see those blue arrows, so click on the image for expansion first.) The 5-2s in squares defeated only one or two 4-3s or better -- that is, they made it into the top 20% and elimination rounds on the basis of defeating only one or two teams in the top 40%. That's quite a disparate schedule: debating other 5-2s and several 4-3s, or debating a few 4-3s and then several teams that are weaker.