You’ve probably wondered what — if anything — is scientific about a “scientific” poll, which purports to show that if the election were held today, 50 percent would vote for TweedleDee and 50 percent would vote for TweedleDum — with a margin of “error” of plus or minus 5 percent.
The answer is there is nothing scientific about it. In fact, political pollsters turn statistical sampling/analysis — which has no predictive power in the ordinary sense of that term — on its head.
How does a real scientist go about getting a good estimate of the whole by statistically analyzing samples of it?
Well, for example, suppose she wishes to know the temperature of a distant object. She knows all objects radiate heat and light over a relatively large range — spectrum — of energies and that the single-humped shape of that energy spectrum is uniquely and directly related to the absolute temperature of the radiating body.
(It’s called, believe it or not, the Black-Body Spectrum and it was first derived by Max Planck, one of the founding fathers of quantum mechanics.)
So all she has to do is measure the amount of energy emitted per unit time at a few hundred randomly selected places on the spectrum, over and over again, until she has satisfied herself that she effectively knows the shape of its Black-Body Spectrum. Once she knows the shape of the spectrum, she can work backwards to "unfold" the temperature of the distant object that produced it.
The more accurately she wishes to know the temperature, the more sets of samples she will have to take.
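For the curious, here is a rough sketch of that "unfolding" in Python. It is only an illustration under stated assumptions, not how any particular observatory does it: the 300-degree object, the frequency range, the 2 percent measurement noise and the function names are all made up for the example. The only physics borrowed is Planck's formula for the black-body spectrum.

```python
import math
import random

H = 6.62607015e-34   # Planck constant, J*s
C = 2.99792458e8     # speed of light, m/s
K_B = 1.380649e-23   # Boltzmann constant, J/K


def planck(freq_hz, temp_k):
    """Black-body spectral radiance at one frequency (Planck's law)."""
    x = H * freq_hz / (K_B * temp_k)
    return (2.0 * H * freq_hz**3 / C**2) / math.expm1(x)


def misfit(freqs, measured, temp_k):
    """Squared error between the data and the best-scaled Planck curve.

    The free scale factor means only the SHAPE of the spectrum is compared.
    """
    model = [planck(f, temp_k) for f in freqs]
    scale = sum(m * b for m, b in zip(measured, model)) / sum(b * b for b in model)
    return sum((m - scale * b) ** 2 for m, b in zip(measured, model))


def estimate_temperature(freqs, measured, t_min=100, t_max=1000):
    """Try every whole-kelvin temperature and keep the best-fitting one."""
    return min(range(t_min, t_max + 1), key=lambda t: misfit(freqs, measured, t))


rng = random.Random(0)
TRUE_TEMP = 300.0  # hypothetical distant object (assumption for the demo)

# A few hundred randomly selected places on the spectrum, each measured
# with an assumed 2 percent random error.
freqs = [rng.uniform(1e12, 6e13) for _ in range(300)]
measured = [planck(f, TRUE_TEMP) * rng.gauss(1.0, 0.02) for f in freqs]

estimate = estimate_temperature(freqs, measured)
print(f"estimated temperature: {estimate} K")
```

The brute-force search over whole-degree temperatures is chosen for readability, not speed. The point is simply that the spectrum exists whether or not anyone measures it, and the fit recovers a temperature close to the one that produced it.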
The “margin of error” quoted by a scientist for the black-body temperature has the following meaning: if she took many, many samples of the whole spectrum at a thousand randomly selected points, and from that determined that the radiating body had an absolute temperature of 300 degrees Kelvin, with a margin of error of 5 percent, that means — statistically — that if she were to take three more independent sets of samples of the same radiated spectrum, then the temperature determined from two of those samplings would be somewhere between 285 degrees and 315 degrees Kelvin.
However, the temperature determined in one of those statistical samplings would be either greater than 315 or less than 285 degrees Kelvin.
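That "two out of three" reading can be checked with a small simulation. The sketch below assumes, purely for illustration, that repeated temperature determinations scatter in a bell curve around 300 degrees and that the quoted 5 percent margin corresponds to one standard deviation (15 degrees). Neither assumption comes from a real measurement.

```python
import random

rng = random.Random(1)
TRUE_TEMP = 300.0
MARGIN = 0.05 * TRUE_TEMP  # 15 K, treated as one standard deviation (assumption)

# Pretend she repeats the whole determination many times over.
repeats = [rng.gauss(TRUE_TEMP, MARGIN) for _ in range(100_000)]

inside = sum(1 for t in repeats if 285.0 <= t <= 315.0)
fraction_inside = inside / len(repeats)
print(f"fraction within 285..315 K: {fraction_inside:.3f}")
```

Under those assumptions the fraction comes out a little over two-thirds, which is where the "two of three" rule of thumb comes from.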
The key point in this scientific sampling/analysis is that if there had not existed — completely independent of what the scientist did or didn’t do — a characteristic black-body spectrum being radiated by the hot object, then there would have been nothing to sample.
Restated, you can’t sample anything unless there is something definitive for you to sample that is not affected by your sampling.
The only way to sample the opinion of 10 million voters on any subject is to first obtain the opinions of those 10 million voters on a specific subject, and, having gotten all those opinions (call them "ballots") piled on your desk, then 'sample' the ballots, randomly selecting a thousand or so of those 10 million ballots and tabulating the results.
Suppose that sample of a thousand ballots, when analyzed, results in 500 anticipated votes for TweedleDee and 500 for TweedleDum.
What does a truly scientific pollster now know?
Basically, all she knows is that in that sample, it's 50/50 for Dee/Dum.
What does that one sample tell her about the other 9,999,000 ballots piled on her desk?
But suppose she takes a hundred such samples, and, unbelievably, in 50 of those samples it comes out 52/48 Dee/Dum and in the other 50 samples it comes out 48/52 Dee/Dum.
Now we're cookin'. Now she can apply statistical analysis to the results of her samples and make predictions with some degree of confidence about what she would expect to find if she took additional samples of the 10 million ballots piled on her desk.
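Here is a sketch of what taking a hundred such samples might look like. The pile is simulated, and its 50/50 makeup is an assumption of the demo (the pollster, of course, does not know it). The names and numbers are made up for illustration.

```python
import random
import statistics

rng = random.Random(2)
PILE_SIZE = 10_000_000
DEE_BALLOTS = 5_000_000  # hypothetical: ballot numbers below this are votes for Dee
SAMPLE_SIZE = 1_000
NUM_SAMPLES = 100


def dee_share_of_one_sample():
    """Draw 1,000 distinct ballots from the pile and return Dee's share."""
    drawn = rng.sample(range(PILE_SIZE), SAMPLE_SIZE)
    return sum(1 for ballot in drawn if ballot < DEE_BALLOTS) / SAMPLE_SIZE


# Each sample goes back on the pile before the next one is drawn.
shares = [dee_share_of_one_sample() for _ in range(NUM_SAMPLES)]

mean_share = statistics.mean(shares)
spread = statistics.stdev(shares)
print(f"mean Dee share: {mean_share:.3f}, spread between samples: {spread:.3f}")
```

The spread between samples is the number she would use to say how far the next sample of a thousand is likely to land from the ones she has already taken.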
Note that she is still not in a position to make any prediction — with any "margin of error" — of what she would find if she tabulated all the ballots on her desk.
But, you say, what about an exit poll? Surely that is semi-scientific — sort of? Suppose the pollster does wait till election day and asks — as they leave the voting booths — 1,000 randomly selected voters who they just voted for.
Isn’t that sampling, sort of? Aren’t there effectively 10 million opinions now stacked on the pollster’s desk and isn’t she essentially selecting, randomly, a thousand of those?
In terms of popular vote, most recent presidential elections have been fairly close. The odds are that the votes will be about evenly split, nationwide, between the candidate riding the elephant and the candidate riding the donkey.
Of course, not in the District of Columbia. But the elephant/donkey split even there in 2008 is likely to be about the same as it was in 2000, the last time an incumbent President was not seeking re-election.
The problem is that, even in DC, it is statistically possible that the first 1,000 voters exit-polled will tell the exit pollster that they just voted for Ron Paul. Or, since it is not yet a federal crime to lie to a pollster, every one of them may say they voted for Donald Duck.
Well, you can bet that, if that is what the exit pollster gets told, the mainstream media is not going to predict the election of Ron Paul (or Donald Duck) by a landslide, with a margin of error of plus/minus 5 percent.
Now, how scientific is that, throwing out the results you don’t like?
December 10, 2007