Sunday, January 13, 2008

Controlling for employment, age, housing or income data doesn't remove the Diebold effect

While I was spending a quiet Sunday I didn't notice the large amount of activity here. Thanks for all your comments and your data! I thought this analysis was lost in the blogosphere.

Meanwhile, others have run their own analyses, and the effect does not seem to disappear. They have also published their data in computer-readable format.

The mainstream media has also picked up the buzz.

I quote:

Lenski said it's all of a piece: Education, income and age -- factors that influence voters' candidate choices, also play into where they choose to live.


So I loaded one of these data sets, which includes the following:

  • Primary results

  • Vote method

  • Sociodemographic data: age distribution, housing units, unemployment rate, median household income, number of single family homes.



I have then attempted to explain Clinton's result by various combinations of these variables, using multivariate linear regression in GNU R.

In all these attempts, voting method remains the most important variable explaining Clinton's score (apart, of course, from the results of other candidates), with an F-value of 15 and p < 0.0002. Next comes the interaction between total employed and total unemployed (with an F-value of 4 and p < 0.04).

Here is the R command:

# nh is the data frame loaded from the published data set;
# D1H0 is the voting-method column.
model <- lm(
  nh$Clinton ~
    nh$Obama +
    nh$Biden +
    nh$Dodd +
    nh$Edwards +
    nh$Gravel +
    nh$Kucinich +
    nh$D1H0 +
    nh$Votes +
    nh$Totalpopulation +
    nh$Percapitaincome * nh$Totalemployed * nh$Totalunemployed +
    nh$Singlefamilyhomes +
    nh$Multifamilyunits +
    nh$Medianage +
    nh$Percenthighschoolgraduates +
    nh$Age5andunder * nh$Age5to19 * nh$Age35to54 * nh$Age55to64 + nh$Age65andup +
    nh$Employeesinlargestbusiness +
    nh$Municipalwater * nh$Municipalsewer * nh$Totalhousingunits
)
# Assumption: the per-term F-values and p-values were printed with anova().
anova(model)


And its results:

Term F value Pr(>F)
nh$Obama 205.6910 < 2.2e-16 ***
nh$Biden 0.3304 0.5663645
nh$Dodd 11.2074 0.0010484 **
nh$Edwards 0.8830 0.3490086
nh$Gravel 3.9644 0.0484338 *
nh$Kucinich 21.2929 8.867e-06 ***
nh$D1H0 15.5007 0.0001299 ***
nh$Votes 3.0066 0.0851407 .
nh$Totalpopulation 0.2982 0.5859160
nh$Percapitaincome 0.1636 0.6865105
nh$Totalemployed 0.0337 0.8546765
nh$Totalunemployed 0.8149 0.3682376
nh$Singlefamilyhomes 0.1271 0.7219829
nh$Multifamilyunits 0.7231 0.3965950
nh$Medianage 1.1115 0.2935781
nh$Percenthighschoolgraduates 7.536e-07 0.9993086
nh$Age5andunder 3.3983 0.0673950 .
nh$Age5to19 1.3064 0.2550066
nh$Age35to54 0.1011 0.7509720
nh$Age55to64 0.1961 0.6585892
nh$Age65andup 0.6695 0.4146369
nh$Employeesinlargestbusiness 0.5505 0.4593790
nh$Municipalwater 0.1546 0.6947648
nh$Municipalsewer 2.6180 0.1079231
nh$Totalhousingunits 3.4622 0.0648983 .
nh$Percapitaincome:nh$Totalemployed 0.1181 0.7316529
nh$Percapitaincome:nh$Totalunemployed 0.0032 0.9548340
nh$Totalemployed:nh$Totalunemployed 4.3509 0.0388181 *
nh$Age5andunder:nh$Age5to19 0.0871 0.7682941
nh$Age5andunder:nh$Age35to54 0.3209 0.5719690
nh$Age5to19:nh$Age35to54 0.1458 0.7031766
nh$Age5andunder:nh$Age55to64 1.1076 0.2944332
nh$Age5to19:nh$Age55to64 0.0133 0.9082441
nh$Age35to54:nh$Age55to64 0.1195 0.7300958
nh$Municipalwater:nh$Municipalsewer 0.9774 0.3245594
nh$Municipalwater:nh$Totalhousingunits 0.5824 0.4466709
nh$Municipalsewer:nh$Totalhousingunits 0.0487 0.8256904
nh$Percapitaincome:nh$Totalemployed:nh$Totalunemployed 2.7098 0.1019924
nh$Age5andunder:nh$Age5to19:nh$Age35to54 0.1625 0.6874457
nh$Age5andunder:nh$Age5to19:nh$Age55to64 0.7882 0.3761842
nh$Age5andunder:nh$Age35to54:nh$Age55to64 1.6764 0.1975536
nh$Age5to19:nh$Age35to54:nh$Age55to64 0.0148 0.9033289
nh$Age5andunder:nh$Age5to19:nh$Age35to54:nh$Age55to64 0.4226 0.5167068
Residuals
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
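
One thing worth checking with a table like this: if it comes from R's anova() applied to a single lm fit, the tests are sequential, so each term is only adjusted for the terms listed before it, and the F-value for the voting method could change if it were entered after the demographic variables. The sketch below uses made-up data (the variables income, machine and clinton are invented for illustration and have nothing to do with the New Hampshire file) to show how the order of terms can matter, and how drop1() gives a test that does not depend on the order.

```r
# Synthetic data only: none of these numbers come from the New Hampshire data set.
set.seed(1)
n <- 200
income  <- rnorm(n)                                   # hypothetical demographic covariate
machine <- rbinom(n, 1, plogis(1.5 * income))         # machine use depends on income
clinton <- 0.4 - 0.05 * income + rnorm(n, sd = 0.05)  # no true machine effect
d <- data.frame(clinton, machine, income)

# Sequential table: each term is tested after the terms written before it.
seq_first <- anova(lm(clinton ~ machine + income, data = d))  # machine NOT adjusted for income
seq_last  <- anova(lm(clinton ~ income + machine, data = d))  # machine adjusted for income

# Order-independent alternative: drop each term from the full model in turn.
marginal <- drop1(lm(clinton ~ machine + income, data = d), test = "F")

f_first    <- seq_first["machine", "F value"]
f_last     <- seq_last["machine", "F value"]
f_marginal <- marginal["machine", "F value"]

cat("machine entered first: F =", f_first, "\n")
cat("machine entered last:  F =", f_last, "\n")
cat("machine via drop1():   F =", f_marginal, "\n")
```

On the real data, the analogous check would be to look at the nh$D1H0 row of drop1(model, test = "F") and see whether it stays close to the value reported above.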


Now let me repeat that I am neither a statistician nor a sociologist. I have a Ph.D. in theoretical computer science and a slight interest in statistics.

The mantra is "correlation is not causation" and, as claimed in the Associated Press article, there could very well be an unaccounted sociological factor that is correlated with the presence of Diebold machines.
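
To make that caveat concrete, here is a minimal simulation of a hidden factor at work. Everything in it is invented: "rural" stands for some unobserved sociological variable, and the numbers are not taken from any real data. The vote share depends only on the hidden factor, yet a regression that leaves it out reports a clear machine effect.

```r
# Synthetic illustration only; "rural" is a hypothetical hidden factor.
set.seed(3)
n <- 300
rural   <- rnorm(n)                                   # factor missing from the data set
machine <- rbinom(n, 1, plogis(-2 * rural))           # machines more common where rural is low
clinton <- 0.40 - 0.04 * rural + rnorm(n, sd = 0.04)  # vote share ignores the machine entirely

# Coefficient row for machine: estimate, standard error, t value, p-value.
naive    <- coef(summary(lm(clinton ~ machine)))["machine", ]
adjusted <- coef(summary(lm(clinton ~ machine + rural)))["machine", ]

print(round(rbind(naive, adjusted), 4))
```

The catch, of course, is that the adjusted fit is only possible when the hidden factor has been measured, which is exactly what cannot be assumed here.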

However, explanations such as "there is no case for concern since precincts with Diebold machines have, strangely enough, always favored such and such a class of candidates" are not very satisfying, since it is the very reliability of these Diebold machines that is in question.

All of this comes against the backdrop of the press's general surprise at Clinton's victory in New Hampshire, the large discrepancies between some polls and the results, and Diebold's history.
