This is a comparison of running a debate tournament in two different ways. In the left column are the results from a tournament I helped to run, using the traditional high-low power-matching method. In the right column are the results from re-running the exact same tournament using my strength of schedule power-matching method. (Specifically, round 1 is the same random pre-set as in the original tournament, and round 3 uses a traditional high-low power-match, but for rounds 2 and 4 I used my strength of schedule pairing. For each hypothetical round, i.e., every round after round 1, I referred to the actual rankings to determine the winner.) Here's the comparison:
Download the Excel workbook: http://www.mediafire.com/?lanmmnnmlmn
As you can see, the groupings are tighter for 3-1s (now a range from 9 to 6) and 2-2s (now from 9 to 6). What is tougher to see is that for the 3-1s, the standard deviation of opponent wins was cut by about 35%, from 1.55 to 1.00, and for the 2-2s, the standard deviation of opponent wins was cut by about 44%, from 1.86 to 1.05. In other words, for teams in the middle of the tournament, the schedule strengths were much more evened out.
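For anyone who wants to run the same kind of check on their own tab results, here is a minimal Python sketch of the calculation. The opponent-win lists in it are made up for illustration and are not the figures from this tournament, and the helper names `spread` and `percent_reduction` are my own. It also assumes the sample standard deviation, which may not be the formula the workbook uses.

```python
from statistics import stdev


def spread(opponent_wins):
    """Return (low, high, sample standard deviation) for one win-loss bracket."""
    return min(opponent_wins), max(opponent_wins), stdev(opponent_wins)


def percent_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100


# Made-up opponent-win totals for the teams in one bracket.
# These are NOT the tournament's data; they only show the calculation.
high_low = [4, 6, 7, 8, 10]
strength_of_schedule = [6, 7, 7, 8, 9]

low_before, high_before, sd_before = spread(high_low)
low_after, high_after, sd_after = spread(strength_of_schedule)

print(f"high-low: range {low_before}-{high_before}, sd {sd_before:.2f}")
print(f"strength of schedule: range {low_after}-{high_after}, sd {sd_after:.2f}")
print(f"sd reduced by {percent_reduction(sd_before, sd_after):.1f}%")

# The same helper applied to the standard deviations reported above.
print(f"3-1s: {percent_reduction(1.55, 1.00):.1f}%")
print(f"2-2s: {percent_reduction(1.86, 1.05):.1f}%")
```

A tighter range and a smaller standard deviation both mean that teams with the same record faced more similar schedules.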
Now, it's true that the range increased slightly for 4-0s and 1-3s. But for the 4-0s, it didn't increase by much, and for 1-3s, it would have decreased but for one outlier (the team that had only 6 opponent wins). Furthermore, this was a small tournament with team constraints that created a lot of pull-ups; that the results show this level of improvement given those constraints is impressive.
I'm going to test this out on a much larger scale so that the constraints don't impede the algorithm nearly as much.