I used a matched pairs method: each team contributes a matched pair of results, its win percentage on the affirmative and its win percentage on the negative. If the two results show no difference, then the team did equally well (or equally poorly) on both sides of the topic. If the two results do show a difference, there are three possible explanations: (1) the team isn't equally strong on both sides of the topic, e.g., the 2A isn't as good as the 2N; (2) the team hit an unequal set of opponents on the two sides; or (3) there is side bias on the topic. When one looks at all the teams on a topic, (1) is unlikely to matter, because the whole point is to control for team strength by assuming that the average team is equally strong on either side; (2) cancels out over the entire pool; and (3) is left as the most plausible explanation. This method still relies on the assumption of invariant strength (that a team has a fixed strength, the same on both sides of the topic and unchanging throughout the year), but so does any other method that attempts to control for team strength. With those disclaimers, here are the results:

The third column shows the mean of the matched pairs computation: for each team, I subtracted its negative win percentage from its affirmative win percentage, then averaged that difference over all the teams that year. The fourth column shows a calculated (not the actual) affirmative win rate. The calculated rates compare closely to the actual rates A Numbers Game already found. The two highlighted rows differ slightly. For the China topic, my analysis suggests an affirmative win rate slightly lower than the actual rate. For the courts topic, my analysis suggests that the negative had an advantage, while the actual rate showed an affirmative advantage.
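As a rough illustration of the matched pairs computation, here is a minimal Python sketch. The team records are made up, and the function names are hypothetical. The conversion from mean spread to a calculated affirmative win rate is also an assumption, since the method of calculation is not spelled out above: it treats the negative rate as one minus the affirmative rate, which gives rate = 0.5 + spread / 2.

```python
from statistics import mean

def matched_pair_spreads(records):
    """Return aff win % minus neg win % for each team.

    records: list of (aff_wins, aff_rounds, neg_wins, neg_rounds) tuples.
    Teams that never debated one of the two sides have no matched pair
    and are skipped.
    """
    spreads = []
    for aff_wins, aff_rounds, neg_wins, neg_rounds in records:
        if aff_rounds == 0 or neg_rounds == 0:
            continue
        spreads.append(aff_wins / aff_rounds - neg_wins / neg_rounds)
    return spreads

def implied_aff_win_rate(mean_spread):
    # ASSUMPTION: neg rate = 1 - aff rate, so spread = 2 * aff rate - 1.
    return 0.5 + mean_spread / 2

# Made-up records for four teams, for illustration only.
teams = [(3, 4, 2, 4), (1, 3, 2, 3), (5, 6, 3, 6), (0, 2, 1, 2)]
spreads = matched_pair_spreads(teams)
mean_spread = mean(spreads)
calculated_rate = implied_aff_win_rate(mean_spread)
print(f"mean spread: {mean_spread:+.4f}, calculated aff rate: {calculated_rate:.4f}")
```

Note that the mean here gives every team equal weight, no matter how many rounds it debated.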

Category 1 is roughly the 0-50th percentile (in terms of rounds of competition); category 2 is 50-75th; category 3 is 75-87th; category 4 is 87-94th; and category 5 is 94-100th. You can see the results clearly in both tables: the less experienced teams had greater success on the negative; the more experienced teams did better (relatively or absolutely) on the affirmative. The originally calculated affirmative win rate was too low because the less experienced teams are so much more numerous that they bring down the average matched comparison, yet they debate few rounds, so they have little effect on the total ballot count. (I looked at the same tables for other years, but there were no patterns as clear as '05-06 and '06-07.)
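The gap between the per-team average and the ballot count can be illustrated with a small Python sketch. The numbers are invented to mimic the pattern described above (many low-round teams with negative spreads, a few high-round teams with positive spreads); they are not the actual data.

```python
# Each entry is (aff-minus-neg spread, rounds debated). Invented numbers.
teams = [
    (-0.50, 4), (-0.25, 4), (-0.50, 2), (0.00, 6),  # less experienced teams
    (0.10, 40), (0.20, 60),                          # more experienced teams
]

# Per-team average: every team counts once, as in the matched comparison.
unweighted = sum(spread for spread, _ in teams) / len(teams)

# Round-weighted average: teams count in proportion to rounds debated,
# which is closer to what a raw ballot count reflects.
total_rounds = sum(rounds for _, rounds in teams)
weighted = sum(spread * rounds for spread, rounds in teams) / total_rounds

print(f"unweighted mean spread: {unweighted:+.3f}")
print(f"round-weighted mean spread: {weighted:+.3f}")
```

With these made-up numbers the per-team average favors the negative while the round-weighted average favors the affirmative, which is the same kind of sign flip seen on the courts topic.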
One final note: all of the whole-year analyses are significant at the 95% level or better except for '06-07, which is only significant at about the 85% level. I used a 1-sample t-test, since the distributions are more or less normal:

(This is '08-09, but all the distributions look like this.) In fact, the distribution looks a little tighter than the normal curve (given the population's mean and standard deviation), since so many teams have 0% aff-neg win spread. The cumulative frequency graph is even more persuasive:

The blue line represents a normal CDF, given the population's ('08-09) mean and standard deviation. Anything below the line on the left or above the line on the right is tighter than normal. The reason is that the standard deviation is pulled way out by the teams that competed for few rounds (who had very high variability in the spread, from -1 to 1). For '08-09, for example, the standard deviation for the whole population is 0.31; excluding teams with fewer than 9 rounds of experience, it is 0.22.
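For readers who want to reproduce the significance test, here is a minimal sketch of a 1-sample t-test on the per-team spreads against a null mean of zero (no side bias). The spreads are invented and the function names are hypothetical. The p-value uses a normal approximation to the t distribution, which is an assumption that is only reasonable for large samples (such as a pool of hundreds of teams); for a small sample like the toy list below, a t table should be used instead.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def one_sample_t(spreads, null_mean=0.0):
    """Return (t statistic, degrees of freedom) for H0: mean == null_mean."""
    n = len(spreads)
    standard_error = stdev(spreads) / sqrt(n)  # stdev is the sample std dev
    return (mean(spreads) - null_mean) / standard_error, n - 1

def approx_two_sided_p(t):
    # ASSUMPTION: large-sample normal approximation to the t distribution.
    return 2 * (1 - NormalDist().cdf(abs(t)))

# Invented per-team aff-minus-neg spreads, for illustration only.
spreads = [0.2, -0.1, 0.0, 0.3, 0.1, 0.0, -0.2, 0.1]
t_stat, dof = one_sample_t(spreads)
p_value = approx_two_sided_p(t_stat)
print(f"t = {t_stat:.3f} on {dof} degrees of freedom, approx p = {p_value:.3f}")
```

Being significant at the 95% level corresponds to a two-sided p-value below 0.05.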
Based on A Numbers Game's question, I created a Lorenz curve for '08-09:

A few data points in words:
The top 5% of teams debated 19% of all rounds.
The top 10% of teams debated 34% of all rounds.
The top 20% of teams debated 54% of all rounds.
The top half of teams debated 83% of all rounds.
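Data points like these can be read off a sorted list of rounds per team. The sketch below uses invented round counts and hypothetical function names; it computes both the share of rounds debated by the top fraction of teams and the conventional Lorenz curve points (teams sorted from fewest to most rounds).

```python
def top_share(rounds_per_team, top_fraction):
    """Share of all rounds debated by the most active top_fraction of teams."""
    ordered = sorted(rounds_per_team, reverse=True)
    k = max(1, round(len(ordered) * top_fraction))
    return sum(ordered[:k]) / sum(ordered)

def lorenz_points(rounds_per_team):
    """(cumulative share of teams, cumulative share of rounds), ascending."""
    ordered = sorted(rounds_per_team)
    total = sum(ordered)
    points = [(0.0, 0.0)]
    running = 0
    for i, rounds in enumerate(ordered, start=1):
        running += rounds
        points.append((i / len(ordered), running / total))
    return points

# Invented round counts for ten teams, for illustration only.
rounds = [40, 30, 20, 10, 8, 6, 4, 2, 2, 2]
share_top_10 = top_share(rounds, 0.10)
share_top_50 = top_share(rounds, 0.50)
curve = lorenz_points(rounds)
print(f"top 10%: {share_top_10:.0%} of rounds; top half: {share_top_50:.0%}")
```

The two views are complementary: the share debated by the top p of teams equals one minus the Lorenz curve's value at 1 - p.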