Regular correspondent Mr. Pokery wrote the following after last Sunday's NCAA wrap.
Your data tends to support the strength of the SEC, with eight teams currently headed for, at worst, half-decent bowls. (If anything, you tend to do so less strongly than other pundits, with the SEC returning seven teams in some top 25s on more than one occasion this season, which is apparently remarkably high.)
Readers will know our position on human polls: they are a valid measure of which sides are popular and fashionable, but are of little value in determining which sides are actually best. We would also point out that a lot of the big sides have yet to play each other, and their 6-2 records mostly come from beating up the small sides. This factor will disappear as the season unwinds.
I enjoy comparing your standings with the different polls used in the BCS standings. My gut feeling, without having formally performed comparisons, is that your poll has most in sympathy with the RB computer rankings, whatever they are.
One of the things we did for our own interest late last year was to work out the average difference between Glickoblog rankings and each of the computer rankings used for the BCS. And, indeed, the average difference between each of those rankings. RB is Billingsley, by the way...
(goes off and bumps into Spearman's Rank Correlation).
Two figures for each of the BCS components, both Spearman's ρ: how closely we agree with their complete list, and how closely we agree with their top 25.
Ranking      Complete list  Top 25
Anderson     0.9374         0.4582
Billingsley  0.9640         0.8246
Colley       0.9284         0.5693
Massey       0.9236         0.5040
Sagarin      0.9046         0.4873
Wolfe        0.9229         0.4152
For the latter comparison, we consider only the teams that each ranking places in its own top 25, and every side that Glickoblog ranks outside the top 25 is deemed to be of rank 26. Original data from Massey.
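For readers who want to repeat the comparison, here is a minimal sketch of the calculation, assuming each ranking is held as a dictionary from team name to rank. The function names are our own, chosen for illustration, and the tie handling (average ranks, then an ordinary correlation on those ranks) is the textbook treatment, not necessarily the exact arithmetic behind the figures above.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 upwards; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the two lists of ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    # Undefined (division by zero) if either list is entirely tied.
    return cov / (var_x * var_y) ** 0.5


def top_n_rho(ours, theirs, n=25):
    """Compare only the other ranking's top n; our ranks below n all become n + 1."""
    teams = [team for team, rank in theirs.items() if rank <= n]
    our_ranks = [min(ours[team], n + 1) for team in teams]
    their_ranks = [theirs[team] for team in teams]
    return spearman_rho(our_ranks, their_ranks)
```

Calling spearman_rho on the two complete lists gives the first figure; top_n_rho gives the second. Because every side we rank below 25th is tied at 26, the second figure can drop sharply whenever the other ranking's top 25 contains several sides that we place well outside ours.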
Yes, it appears that the Glickoblog rankings do correspond much more closely with Billingsley's than with any other, including the Sagarin Elo method, which one might naively expect to match ours closely. Why is this happening? Billingsley no longer publishes any detail of his algorithm, and the previously-posted description was long on waffle and short on detail. From the description, and from observing the progress of Notre Dame (bottoming out in the 50s), we suspect that his formula is approximately replicating a Glicko system, albeit through a different method.
We'll return to this matter at the end of the regular season.
I'd love to see more analysis of just how crazy seasons' results are in comparison to each other, for reasonable-looking craziness metrics other than "number of #2 seeds upset".
At this point we tickle the soft underbelly of the Upset-o-meter, and ask it to fetch some wrapped sweets. Choose your definition of an upset - we propose 30 points as a minor upset, 35 as a moderate, 40 as a major - and simply count.
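As a rough sketch of that counting, assume each upset has already been boiled down to a single points figure, however the Upset-o-meter arrives at it. The thresholds are the ones proposed above; the function names and the sample figures are invented for illustration.

```python
from collections import Counter

# Thresholds proposed above, checked from the largest down.
THRESHOLDS = [(40, "major"), (35, "moderate"), (30, "minor")]


def classify_upset(points):
    """Return the upset category for a points figure, or None if below 30."""
    for floor, label in THRESHOLDS:
        if points >= floor:
            return label
    return None


def count_upsets(points_list):
    """Tally upsets by category, ignoring results that do not qualify."""
    labels = (classify_upset(p) for p in points_list)
    return Counter(label for label in labels if label is not None)


# Hypothetical season: six results with made-up points figures.
season = [10, 30, 34, 35, 41, 40]
tally = count_upsets(season)
```

Running the same tally over each season's results gives three numbers per year that can be set side by side.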
Another gut feeling of mine suggests that you may need to increase the number of idle weeks used in the re-rankings between the 2007 and 2008 seasons substantially even over the number used between 2006 and 2007.
We will see when the season ends: maybe reducing the spread to ±1.5σ will also help to produce more credible results. We should remember this: the Glicko system derives most of its data from the last dozen or so matches. At present, the pundits are basing their information on about seven data points per side, as if the previous years counted for nothing. We disagree here: many of the players and coaches remain in place, and successful sides in previous years are able to attract weaker opposition, and schedule more home matches than away ones. This disparity needs to be included, and we reckon the best way is to carry over some hard facts from last year. How much? Maybe less than we did this year, in turn less than we did last year.
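For reference, the standard Glicko treatment of idle time widens a side's rating deviation (RD) for every period it sits out, so that older results count for less. A minimal sketch follows, in which the constant c and the sample numbers are illustrative assumptions rather than the values Glickoblog actually uses; 350 is the conventional ceiling for an unrated side.

```python
import math

MAX_RD = 350.0  # conventional Glicko ceiling: the RD of a completely unrated side


def inflate_rd(rd, idle_periods, c):
    """Widen a rating deviation after a number of idle rating periods.

    Standard Glicko step: RD' = min(sqrt(RD^2 + c^2 * t), 350).
    """
    return min(math.sqrt(rd * rd + c * c * idle_periods), MAX_RD)


# Illustrative only: a side ending the season at RD 50, with c = 30 assumed.
few_idle_weeks = inflate_rd(50.0, 4, 30.0)
many_idle_weeks = inflate_rd(50.0, 16, 30.0)
```

More idle weeks between seasons means a wider deviation at kick-off, and so less of last year's evidence carried over. Fewer idle weeks carries over more.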
A link I've been saving for you for some time, comparing the perceived advantages and disadvantages of computer and human rankings. Another gut feeling (for gut feelings are easy in a way that reasoned arguments are not) is that margins of victory *are* important, somehow, though I'm prepared to concede that this might just be a prejudice against throwing away even a single scrap of data... :-)
The trouble with imposing a margin-of-victory criterion is that it creates an artificial boundary condition, one that simply isn't present in the actual game as it's played on the field. Suppose that a side is leading 31:7 half-way through the third period. Which is the most valuable win: their opponent recovering to win 34:31, the side surviving a recovery to win 31:28, the sides continuing to score as they did, finishing 41:10, or the side extending their lead to win 63:10? Do we rule that games decided by more than two clear tries are worthy of bonus points, as the IRB does? Do we award partial credit for games decided by six or fewer points? Ultimately, margin of victory requires a value judgement: it insists that some wins are more valuable than others, yet there is no consensus on the value matrix to be used. Indeed, there cannot be such a consensus opinion.
We have considered splitting the points for games won in extra time (say 3/4 to the winner, 1/4 to the loser), for this does actually reflect the game as it's played on the field. Discarding margin of victory does reduce the amount of data, but we cannot actually see any value in using that particular datum, and suspect that it might actually be harmful to a sound ranking.
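The extra-time split would be simple to express. A sketch, with a hypothetical function name and the 3/4 share taken from the figure floated above:

```python
def game_scores(extra_time, winner_share=0.75):
    """Return (winner_score, loser_score) for one game; the two always sum to 1."""
    if not extra_time:
        return 1.0, 0.0
    return winner_share, 1.0 - winner_share
```

These scores would stand in for the usual 1 and 0 fed to the rating update, so a win in extra time moves both sides' ratings by less than a win in regulation.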
