Rod's Sports Economics: Ruminations on RSD

Monday, April 25, 2016

Ruminations on RSD

This is spurred by a couple of blog posts at Phil Birnbaum's site ("Noll-Scully doesn't measure anything real") and TangoTiger's site ("Trap of Noll-Scully"). And I also hope it helps with some Twitter discussion that didn't work out there (@BMMillsy, @guymolyneux, @dataandme).

Here's what I understand about the Noll-Scully RSD measure plus the best that I can make out about the issues raised above.

Thought process. The standard deviation of final winning percent is useful in relation to competitive balance (which also follows from the underlying distribution of talent in the league). Consider one version of a perfectly balanced league, given by Pr(win)=0.5 for all teams and all games. Start this version of a completely competitively balanced league at G₁ games. Change it to G₂ games. What happens?

Let ISD be the standard deviation of this version of a completely balanced league. Fort and Quirk (Journal of Economic Literature, 1992) show that ISD=0.5/sqrt(G) for the binomial without ties. Thus, moving from G₁ to G₂, ISD(G₁) will be different than ISD(G₂) because ISD depends on G. This helps to make clear that in general, the standard deviation of winning percent depends on G as well. Let ASD be any standard deviation of winning percent from the league.
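To make the dependence on G concrete, here is a minimal sketch of the ISD formula above. The function name `ideal_sd` and the three season lengths are illustrative assumptions, not anything taken from Fort and Quirk.

```python
import math

def ideal_sd(games: int) -> float:
    """SD of winning percent when every game is a fair coin flip (no ties)."""
    return 0.5 / math.sqrt(games)

# Illustrative season lengths only; any positive G works.
for g in (16, 82, 162):
    print(g, round(ideal_sd(g), 4))
```

The longer the season, the smaller the ISD, even though nothing about the (perfectly balanced) talent distribution has changed.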

Now let's think about what to do with this knowledge of the distribution of winning percent.

Here is what I get from the discussion/Twitter noted above. Suppose we have a statistic Z that measures the outcome of applying the talent distribution in league play. In a league with season length G₁, we get Z(G₁). If the league changes to season length G₂, we get Z(G₂). Define a successful Z to have the following characteristic: Z(G₁) = Z(G₂) because the underlying talent distribution is the same in either case, just applied in leagues of different season length.

So, how to reconcile Z and ASD? Even in a league of equal playing strength, so that Pr(win)=0.5, ISD changes with G and so will ASD generally.

Let’s consider three alternatives (there may be more).

Alternative 1: Z^* = ASD ± dASD/dG. Now it will be the case that Z^*(G₁) = Z^*(G₂) because the impact of just changing season length will be netted out by calculation of the impact of G on ASD and addition or subtraction. This is sort of like an “inflation adjustment”. The distribution of talent didn’t change and our Z^* provides the comforting result that it is the same for either G₁ or G₂. Of course, this requires knowing dASD/dG.

Alternative 2: RSD = ASD/ISD. It is immediately clear that there is no way that RSD can be a successful Z. It will always change with G and it contains no adjustment that would make it stay the same regardless of G.
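A rough simulation can illustrate this. The sketch below holds a hypothetical talent distribution fixed and changes only G. It assumes, purely for simplicity, that each team's games are independent draws at its own win probability (teams do not actually play each other here), and the names `simulate_rsd` and `talents` are made up for the example.

```python
import math
import random
import statistics

def simulate_rsd(talents, games, seasons=200, seed=0):
    """Average ASD/ISD over simulated seasons for fixed team win probabilities."""
    rng = random.Random(seed)
    isd = 0.5 / math.sqrt(games)
    ratios = []
    for _ in range(seasons):
        # Each team plays `games` independent games at its own win probability.
        pcts = [sum(rng.random() < p for _ in range(games)) / games
                for p in talents]
        ratios.append(statistics.pstdev(pcts) / isd)
    return statistics.fmean(ratios)

# Hypothetical talent spread: 30 teams evenly spaced from .400 to .600.
talents = [0.4 + 0.2 * i / 29 for i in range(30)]
rsd_short = simulate_rsd(talents, games=16)
rsd_long = simulate_rsd(talents, games=162)
print(round(rsd_short, 2), round(rsd_long, 2))
```

Under these assumptions the same talent distribution yields a noticeably larger RSD in the longer season, which is the sense in which RSD cannot be a successful Z.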

Alternative 3: Just dump the standard deviation as useful because you don’t like either of the above. There are other measures of final season competitive balance as a reflection of the distribution of talent. But don’t propose a game-level or playoff access or dynasty alternative since we’re talking about final season competitive balance. There are other measures of those other aspects of balance as well.

So the point is well made that RSD cannot be a candidate for Z. But it was never intended to be such (I know Noll quite well and knew Scully well prior to his death). It really is meant to be a distance comparison measure: 5 steps from my door are farther than 2 steps from your door, so I am farther from my door than you are from yours.

Perhaps there is just a semantics misunderstanding when the literature using RSD states that it "controls" for G? Surely the Z^* measure does this forcefully. But RSD does it relatively, so maybe a better way of saying it is that RSD "recognizes" G in its relative comparison.

Some concluding comments...

So far, I haven’t seen anybody take a crack at calculating dASD/dG. I wonder if the related critics Owen & King (Economic Inquiry, 2015) are actually just simulating dASD/dG, in which case they are an ally to those seeking Z^*. One could just use their simulation results rather than trying to determine the derivative, dASD/dG. Or perhaps the Pythagorean discussion in the references to blogs at the top of this post handles this problem already? If not, then there is a ways to go still with Z^* development.

But I’m still not so sure that taking the "inflation adjustment approach" is any more informative than what is done with RSD. Z^* distills the dASD/dG problem to an absolute level. RSD just puts the comparison at a relative level.

It does seem to me that the Z^* devotees are not really critiquing RSD as a normalization. They would just prefer to take the direct Z^* approach rather than taking the relativist approach.

And I chose the word “prefer” carefully. While it is easy to see that Z^* is different than RSD, I still don’t see how Z^* is superior to RSD. And it is not enough to just say so. In any event, if Z^* is shown to be better at a later date, future work will be the better for it.

In the meantime, I have competitive balance to compare, within a league where season length changes and across leagues with different season lengths. And I haven't yet been dissuaded on RSD as one useful measure.

16 comments:

Phil Birnbaum, April 25, 2016 at 2:47 PM
Hi, Dr. Fort,

My argument is that there *is* a "successful" Z statistic:

Z = square root of (ASD squared - ISD squared).

Phil Birnbaum, April 25, 2016 at 3:25 PM
Let me try to make the previous "proof" clearer.

ISD is defined as the expected SD of an idealized league where every team is .500 talent.

I say: *Redefine* ISD as the expected SD of all the teams' deviations from their expected record based on their talent.

For all teams being .500 talent, the definition is identical, since SD(team record) = SD(team record - constant .500).

For teams being different from .500, you can still use the same formula you're using (sqrt of .25/G). It will be close enough, because even for a .600 team, the actual value is (sqrt of .24/G), and for a .700 team, the actual value is (sqrt of .21/G), which is still close.
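A quick check of how close "close enough" is, using the binomial variance p(1-p)/G quoted above. The helper name `luck_sd` and the season length of 162 are assumptions for illustration only.

```python
import math

def luck_sd(p: float, games: int) -> float:
    """Binomial SD of winning percent for a team with true win probability p."""
    return math.sqrt(p * (1 - p) / games)

games = 162  # illustrative season length; the ratio below does not depend on it
for p in (0.5, 0.6, 0.7):
    # Ratio of this team's luck SD to the .500 team's luck SD.
    ratio = luck_sd(p, games) / luck_sd(0.5, games)
    print(p, round(p * (1 - p), 2), round(ratio, 3))
```

The ratio is sqrt(p(1-p)/0.25), so G cancels out and the size of the approximation error depends only on how far teams sit from .500.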

If you accept that approximation, and the redefinition of ISD to make it work for non-.500 leagues, then

ASD squared = ISD squared + TSD squared

Where TSD is the (estimated) SD of team talent, and its expected value does not depend on G.
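The decomposition above can be tried out in a small simulation. This is only a sketch under simplifying assumptions: a hypothetical fixed set of team talents, each team's games treated as independent draws at its own win probability, and ASD squared averaged over many simulated seasons before subtracting ISD squared. The names `estimate_tsd`, `talents`, and `estimates` are invented for the example.

```python
import math
import random
import statistics

def estimate_tsd(talents, games, seasons=1000, seed=1):
    """Estimate talent SD as sqrt(mean ASD^2 - ISD^2) over simulated seasons."""
    rng = random.Random(seed)
    isd_sq = 0.25 / games  # ISD^2, using the .500 approximation for every team
    asd_sq = []
    for _ in range(seasons):
        pcts = [sum(rng.random() < p for _ in range(games)) / games
                for p in talents]
        asd_sq.append(statistics.pvariance(pcts))
    # Guard against a negative difference caused by sampling noise.
    return math.sqrt(max(statistics.fmean(asd_sq) - isd_sq, 0.0))

# Hypothetical talent spread: 30 teams evenly spaced from .400 to .600.
talents = [0.4 + 0.2 * i / 29 for i in range(30)]
true_tsd = statistics.pstdev(talents)
estimates = {g: estimate_tsd(talents, g) for g in (16, 162)}
for g, est in estimates.items():
    print(g, round(est, 3), round(true_tsd, 3))
```

If the decomposition holds, both estimates should land near the true talent SD, up to simulation noise and the .500 approximation, with the shorter season expected to be the noisier of the two.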
Guy, April 25, 2016 at 4:13 PM
Rod, I’m afraid you’ve started your rumination at what should be the conclusion: the “idealized” SD. Let’s instead start at the beginning: We want to compare competitiveness in two leagues of different season lengths G (or the same league at different times, with intervening change in G). We could simply compare ASDs and say one league is more competitive. But instead we sometimes use RSD. The only conceivable reason to do this is because G influences the ASD -- if it didn’t, we’d just use ASD and be done! So, the purpose of RSD must be to allow a fair comparison of leagues correcting for the influence of different season lengths.

To control for season length, we should adjust ASD in a way consistent with the actual impact of G. If a given increase in G reduced both ASD and ISD proportionately, then your RSD metric would be an elegant solution. Unfortunately, increasing G does *not* reduce ASD and ISD proportionately. And thus, dividing ASD by ISD cannot provide an estimate of competitive balance independent of G.

Nor does RSD tell us the “distance” from ideal balance, because the size of the “footsteps” in each league’s RSD are different, depending on G. And since ISD does not change proportionally to the change in ASD as G varies, the differing size of the footsteps makes RSD an apples-to-oranges comparison.

The RSD solution (“hey, let’s just divide by the ISD”) never had any statistically valid justification, which is why none has ever been offered. It was an intuition, but one that turns out on further inspection to take us nowhere. It would never be created today. The only reason to use it is because many have used it in the past. But that is really no reason at all.
Guy, April 27, 2016 at 7:30 AM
Rod: Yes, I am the same "Guy."

As a non-UMich person, I cannot access your link. But in any case, the reason that RSD does not provide a common "footstep" size for comparison is that the "units" are each league's respective ISD, which of course varies by G. And if your answer to that is "that's OK, we want to account for the difference in G," then my reply is your metric should reflect the actual impact of G on the ASD. But RSD does not, as you acknowledge. And if ISD has no relation to the actual effect of G on ASD, then RSD is just ASD divided by an arbitrary and shifting denominator.

Here is an analogy: We know that the dimensions of a particular ballpark add 5 HR for a player on that team. A 10 HR hitter will hit 15, and a 40 HR hitter will hit 45. To adjust for that in comparing these players to the rest of the league, we should subtract 5 HR from their total. But instead, we say the average player here hits 20 HR (but only 15 HR elsewhere), so we will subtract 25% from each player's HR total when comparing to the league. That is what RSD does: it takes an additive relationship (fewer games adds variance) and pretends it is multiplicative. And that gives you the wrong answer.
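The arithmetic of the ballpark analogy can be spelled out in a few lines. The constant and function names below are made up for illustration; the numbers are the ones in the analogy.

```python
PARK_BONUS = 5    # HR the park adds to every hitter, per the analogy
PCT_CUT = 0.25    # 5 / 20, the average-player ratio used by the flawed adjustment

def additive_adjust(observed_hr: float) -> float:
    """Correct adjustment: remove the flat park bonus."""
    return observed_hr - PARK_BONUS

def multiplicative_adjust(observed_hr: float) -> float:
    """Flawed adjustment: scale every total down by the same percentage."""
    return observed_hr * (1 - PCT_CUT)

# Observed totals for the true 10 HR and 40 HR hitters in this park.
for observed in (15, 45):
    print(observed, additive_adjust(observed), multiplicative_adjust(observed))
```

The percentage cut overcorrects neither hitter the same way: it leaves the weaker hitter above his true total and pushes the stronger hitter well below his, so the two players are no longer compared on equal terms.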

And note that RSD is not just "less good" than other competitive balance metrics. It is actively misleading. Berri, for example, uses it to show that the NFL is much more competitive than the NBA, despite similar ASD, once you adjust for schedule length. That is simply a false claim. So the use of RSD is literally reducing sports economists' understanding of these issues.
Rodney Fort, April 27, 2016 at 9:22 AM
This comment has been removed by the author.
Guy, April 27, 2016 at 10:06 AM
I'm not sure I follow your exercise. In any case, we may be at an impasse. Our objection is that RSD does not effectively control for length of schedule. I believe that you agree, but say that was not its true purpose. So that is really our disagreement.

Obviously, I can't speak to *your* purpose in using the metric. But I would make two points: 1) it is in fact frequently used by sports economists, explicitly, for exactly that purpose (Vrooman 1995, Berri et al.). You obviously know that body of work very well. If this is all a huge misunderstanding about the true purpose of RSD, why have you never pointed out this error to your colleagues? And 2) using RSD leads to false conclusions about competitive balance (e.g. that the NFL is highly competitive), undermining the value of work that relies on the metric. I'm not a sports economist, but if I were I think that would trouble me.

But at least we can agree that RSD does not control for schedule length, which is something. And perhaps you are (or will be) persuaded that Phil has correctly calculated "Z" (which he has). So this has been a productive discussion, even if we cannot reach consensus.