Tuesday, August 25, 2009

Judge side bias -- the whole field

I've been intrigued by A Numbers Game's information about judge side bias. Rather than looking at particular judges, I wanted to look at the distribution across all judges. I plotted the prelim judging records from all the college tournaments in '04-05 (data from Bruschke's Debateresults.com).

The vertical axis shows how many prelim rounds a judge judged; the horizontal axis shows the percentage of aff. wins that judge gave. As you can see, the dot plot shows a clear pattern: as judges see more rounds, they cluster more tightly around the mean [.4840, according to A Numbers Game], and the plot narrows at the top. This is as it should be: the more rounds you judge, the less likely you are to get lots and lots of strong aff. teams (or weak aff. teams) by chance. But how can we quantify this pattern and ask exactly how it compares to what we'd expect to see by chance?
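The narrowing follows from the simple coin-flip model behind any "by chance" comparison: if each round is an independent trial that the aff wins with probability p, a judge's aff-win percentage over n rounds has standard deviation √(p(1−p)/n), so quadrupling the rounds halves the spread. A minimal Python sketch, assuming independent rounds and the .484 field mean (the function name `proportion_sd` is hypothetical):

```python
import math

# Field-wide aff win rate (.4840, as reported by A Numbers Game).
P_AFF = 0.484


def proportion_sd(n_rounds: int, p: float = P_AFF) -> float:
    """Standard deviation of a judge's aff-win fraction over n_rounds,
    assuming each round is an independent trial the aff wins with probability p."""
    return math.sqrt(p * (1 - p) / n_rounds)


if __name__ == "__main__":
    # Spread shrinks like 1/sqrt(n): more rounds, tighter clustering around the mean.
    for n in range(6, 85, 6):
        print(f"{n:3d} rounds: sd = {proportion_sd(n):.3f}")
```

Under these assumptions the spread is about .204 at 6 rounds and about .083 at 36 rounds.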

What makes this challenging is that the sample sizes differ for each judge. Some judges judged 80 rounds, some only two. Normally, you'd work with a single, standard sample size. So I calculated the probabilities at a range of sample sizes, in six-round increments, and connected them to make the lines. The green lines mark the 68% confidence interval (roughly 1 standard deviation above and below the mean); the yellow lines the 95% confidence interval (roughly 2 standard deviations); and the red lines the 99.8% confidence interval (roughly 3 s.d.s).

This is an unusual method, so let me explain what I think it shows. If every judge saw 36 rounds, then we would expect 68% of those judges to have voted aff. between 40 and 60% of the time (you can trace the plot to see that the green lines do indeed pass through these points). So far, so good. My somewhat original assumption is that you can add these slices up: since, for x rounds judged, 68% of the judges in that horizontal slice should fall between the green lines, then for all the judges in the whole vertical column, 68% should fall between the green lines.
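The band-and-slices procedure can be sketched as follows. This is an illustration, not the code behind the plot: the post does not say exactly how the probabilities were computed, so this version uses the normal approximation (mean ± k standard deviations, with k = 1, 2, 3 for the green, yellow, and red lines). The judge records in the example are made up, and `band` and `pooled_coverage` are hypothetical names.

```python
import math
from typing import Iterable, Tuple

P_AFF = 0.484  # field-wide aff win rate from A Numbers Game


def band(n_rounds: int, k: float, p: float = P_AFF) -> Tuple[float, float]:
    """Lower and upper edge of the mean +/- k sd band for a judge with n_rounds
    (normal approximation to the binomial)."""
    sd = math.sqrt(p * (1 - p) / n_rounds)
    return p - k * sd, p + k * sd


def pooled_coverage(judges: Iterable[Tuple[int, int]], k: float, p: float = P_AFF) -> float:
    """Fraction of all judges, pooled across every sample size, whose aff-win
    fraction lies inside the band for their own round count.
    Each judge is a (rounds, aff_wins) pair."""
    inside = total = 0
    for n_rounds, aff_wins in judges:
        lo, hi = band(n_rounds, k, p)
        if lo <= aff_wins / n_rounds <= hi:
            inside += 1
        total += 1
    return inside / total


if __name__ == "__main__":
    # Hypothetical records: (prelim rounds judged, aff wins given).
    judges = [(36, 18), (36, 10), (6, 3), (80, 39)]
    for k in (1, 2, 3):
        print(k, pooled_coverage(judges, k))
```

With these four made-up judges it prints 0.75, 0.75 and 1.0 for k = 1, 2, 3: the 10-of-36 judge is outside the green and yellow bands but inside the red one. Each judge is checked against the band for their own round count, which is what lets the horizontal slices be pooled into one column.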

The share of judges who actually fall between the green lines turns out a little worse than expected, at 33.5%, but not by much. And, as A Numbers Game has shown, this is about the same for the other years, too.
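Whether an observed share is really "worse than expected" depends on what chance alone would produce for these particular judges. One caveat with adding up slices: records come in whole rounds, so a slice's coverage is not exactly 68%. A judge with only two rounds, for instance, lands inside the 1-sd band (about .131 to .837) only with a 1-1 record, which happens about 50% of the time. The sketch below is one possible check, not the method used in the post. It assumes independent rounds with p = .484, computes each judge's exact binomial chance of landing inside the band while keeping the judges' actual round counts, and simulates whole fields of chance-only judges to show how much the pooled share varies. The function names and round counts are hypothetical.

```python
import math
import random
from typing import List, Sequence

P_AFF = 0.484  # field-wide aff win rate from A Numbers Game


def in_band(n_rounds: int, aff_wins: int, k: float, p: float = P_AFF) -> bool:
    """True if aff_wins / n_rounds is within k standard deviations of p."""
    sd = math.sqrt(p * (1 - p) / n_rounds)
    return abs(aff_wins / n_rounds - p) <= k * sd


def exact_coverage(n_rounds: int, k: float, p: float = P_AFF) -> float:
    """Exact binomial probability that a chance-only judge lands inside the band."""
    return sum(
        math.comb(n_rounds, w) * p**w * (1 - p) ** (n_rounds - w)
        for w in range(n_rounds + 1)
        if in_band(n_rounds, w, k, p)
    )


def expected_pooled_coverage(round_counts: Sequence[int], k: float, p: float = P_AFF) -> float:
    """Expected share of all judges inside their bands, given each judge's round count."""
    return sum(exact_coverage(n, k, p) for n in round_counts) / len(round_counts)


def simulate_coverage(round_counts: Sequence[int], k: float, trials: int = 1000,
                      p: float = P_AFF, seed: int = 0) -> List[float]:
    """Pooled share inside the band for `trials` simulated fields of chance-only judges."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        inside = 0
        for n in round_counts:
            wins = sum(rng.random() < p for _ in range(n))  # one simulated record
            if in_band(n, wins, k, p):
                inside += 1
        results.append(inside / len(round_counts))
    return results


if __name__ == "__main__":
    # Hypothetical round counts; the real ones would come from Debateresults.com.
    counts = [2, 6, 12, 24, 36, 48, 80]
    print(f"expected share inside green lines: {expected_pooled_coverage(counts, 1):.3f}")
    sims = sorted(simulate_coverage(counts, 1))
    print(f"middle 95% of simulated shares: {sims[25]:.3f} to {sims[974]:.3f}")
```

If the real share between the green lines sits well below the simulated range, judges are more spread out than chance would explain. If it sits well above, they are more tightly clustered.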

1 comment:

  1. I enjoyed reading this post. I have posted a response on the original wiki page, rather than here, so that I can maintain the comment under version control.

    The short version is: I agree that the judges in the charts are just the tails of a normal distribution, rather than outliers. On the other hand, I can't reproduce your results. My tests show that the distribution isn't any tighter than we would expect by chance.