Regression Toward the Mean
Regression toward the mean (R2M) is a fascinating concept that is both useful
and easy to apply.
You can get a good working understanding of R2M by thinking about the most
common examples of it:
- exceptionally tall parents tend to have children who are taller than average, but not as tall
as their parents. (Galton)
- a professional athlete who has played exceptionally well in one season (or game)
will probably play less well in the next. ("sophomore slump")
- a student who has done very poorly on one test will probably finish closer to average
on the next test.
- a rock group with a top-selling debut CD will probably not do as well on their next recording.
Unfortunately, giving a precise definition of what is common to these phenomena is surprisingly
awkward; and one precise enough to allow us to measure R2M requires mathematics
which is beyond the scope of this course.
Instead, here's a rough-and-ready definition:
extreme results tend to be followed by less extreme results.
More precise, but also more awkward, is the definition from the reading:
"Whenever two variables are imperfectly related, extreme values -- high or low --
on one of the variables tend to be matched by less extreme values on the other."
"Superstition" (p.25)
When should we expect R2M?
First, there must be some variation in the population you are observing: if there is no
variation, then all of your results will be the same (they will all be average), so there
will be no extreme result.
Second, the variation has to be one that produces a bell-shaped
curve. This second requirement rules out cases where the variation "clumps" into
discrete groups (e.g. graphing the sex of the members of a group will produce variation between
two groups: M and F); it also rules out cases where there seems to be no systematic underlying
order to the variation (e.g. the occurrence of particular numbers in Lotto 6/49 seems to be random).
On a deep level, these two conditions may be ways of saying the same thing: if there is no "typical"
result, then there will be no extremes and no mean to regress toward.
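The rough definition and the two conditions above can be tried out in a small simulation. What follows is only a sketch under an assumed model that is not taken from the reading: each student's score is a stable "ability" plus some fresh "luck" on each test, both drawn from bell-shaped curves. The numbers (a mean of 70, spreads of 10, 10,000 students) and all of the names are made up for illustration.

```python
import random
import statistics


def simulate_two_tests(n_students=10_000, seed=0):
    """Return (test1, test2) score lists for the same students.

    Assumed model (hypothetical, not from the reading):
    score = stable ability + luck that is redrawn for every test.
    """
    rng = random.Random(seed)
    abilities = [rng.gauss(70, 10) for _ in range(n_students)]
    test1 = [a + rng.gauss(0, 10) for a in abilities]
    test2 = [a + rng.gauss(0, 10) for a in abilities]
    return test1, test2


test1, test2 = simulate_two_tests()
overall_mean = statistics.mean(test1)

# Pick out the students with extreme results: the top 10% on test 1.
cutoff = sorted(test1)[int(0.9 * len(test1))]
top = [i for i, score in enumerate(test1) if score >= cutoff]

# Compare the same students' averages on the two tests.
top_mean_test1 = statistics.mean([test1[i] for i in top])
top_mean_test2 = statistics.mean([test2[i] for i in top])

print(f"class average on test 1:     {overall_mean:.1f}")
print(f"top group average on test 1: {top_mean_test1:.1f}")
print(f"top group average on test 2: {top_mean_test2:.1f}")
```

If the assumed model is a fair one, the top group's average on the second test should land between the class average and the group's own average on the first test: less extreme than before, but still above the mean.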
"Toward" the Mean
The reason this phenomenon is called "regression toward the mean"
(and not regression-all-the-way-back-to-the-mean) is the assumption that the exceptional result is a
partially accurate measure of the underlying quality (or qualities) that lead to that result.
The idea here is that the exceptional result did not perfectly reflect the constant underlying
qualities it was supposed to (e.g. athlete's ability and hard work, student's knowledge and
preparedness, pop group's talent/good looks/promotional ability), since (among other reasons)
the result depended on more than just the constant (how constant?) underlying qualities.
However, even if the result wasn't perfectly representative, it did
give a rough indication of what those qualities are. So, for instance, the fact that a player wins
the rookie-of-the-year award indicates that he's a better-than-average first-year player; but it doesn't show
that over the next few years he will continue to be the best of that crop of players. Similarly, since a
test is not a perfect measure of a student's ability or preparedness (etc.), a student who does very
poorly on one test would be expected to score closer to the mean on the next test, but still below the
mean. [If we didn't observe R2M, would that be a reason to think the test was unfair; that it really
bore very little relation to the students' knowledge and effort?]
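How far back toward the mean a group moves depends on how much of a result reflects the underlying qualities and how much reflects everything else. The sketch below is one way to explore that, under an assumed model that is not from the reading: a score is a stable "ability" plus "luck" that is redrawn on each test. It follows the bottom 10% on the first test for three different amounts of luck. The function name, the parameter luck_sd, and all of the numbers are hypothetical.

```python
import random
import statistics


def bottom_group_means(luck_sd, n_students=10_000, seed=1):
    """Return (class mean on test 2, bottom group's mean on test 1,
    bottom group's mean on test 2).

    Assumed model (hypothetical): score = stable ability + fresh luck,
    where luck_sd sets how much luck matters on each test.
    """
    rng = random.Random(seed)
    abilities = [rng.gauss(70, 10) for _ in range(n_students)]
    test1 = [a + rng.gauss(0, luck_sd) for a in abilities]
    test2 = [a + rng.gauss(0, luck_sd) for a in abilities]

    # The students who did very poorly: the bottom 10% on test 1.
    cutoff = sorted(test1)[int(0.1 * n_students)]
    bottom = [i for i in range(n_students) if test1[i] <= cutoff]

    return (
        statistics.mean(test2),
        statistics.mean([test1[i] for i in bottom]),
        statistics.mean([test2[i] for i in bottom]),
    )


# No luck, moderate luck, heavy luck.
results = {luck_sd: bottom_group_means(luck_sd) for luck_sd in (0, 10, 30)}

for luck_sd, (overall, first, second) in results.items():
    # Fraction of the distance to the class mean that the group made up.
    closed = (second - first) / (overall - first)
    print(f"luck_sd={luck_sd:>2}: test 1 {first:.1f}, test 2 {second:.1f}, "
          f"class mean {overall:.1f}, gap closed {closed:.0%}")
```

In this model, with no luck at all the bottom group simply repeats its first result; with moderate luck it moves part of the way back and stays below the mean; with heavy luck it moves most of the way back. That is one way to think about the bracketed question above, though the model is far too simple to settle it.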
Regression in PHIL158d - 1998-99
Here are some close-to-home examples of R2M. Each table compares the results of students who had
exceptional results on the in-class tests in 1998-99.