Sunday, January 13, 2008

Archive of R files and data

You can download a .tar.gz file containing the R scripts and the CSV data here.

2 comments:

Unknown said...

Just a brief follow-up on heteroskedasticity: I ran these quick commands:

> var(Clinton,na.rm=T)
[1] 0.006180386
> var(Clinton[hand==1],na.rm=T)
[1] 0.006538795
> var(Clinton[hand==0],na.rm=T)
[1] 0.003943916
>

This shows that Clinton's vote share varies more in hand-counted districts than in Diebold districts (variance 0.00654 vs. 0.00394). The same is true of Obama:

> var(Obama,na.rm=T)
[1] 0.006988419
> var(Obama[hand==1],na.rm=T)
[1] 0.008372794
> var(Obama[hand==0],na.rm=T)
[1] 0.004406466
>
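Comparing the two variances by eye is a start. A more formal check is the F-test in base R's `var.test()`, which tests the null hypothesis that two groups have equal variance. Here is a minimal sketch, assuming the same `Clinton` and `hand` vectors as in the sessions above. The CSV isn't reproduced in this post, so the sketch generates hypothetical placeholder data; swap in the real vectors from the archive. The F-test assumes the data in each group are roughly normal.

```r
# Placeholder data ONLY: simulated stand-ins for the real Clinton/hand vectors
# loaded from the archived CSV. These numbers are not election results.
set.seed(1)
hand    <- rep(c(1, 0), times = c(120, 80))         # 1 = hand count, 0 = Diebold
Clinton <- c(rnorm(120, mean = 0.39, sd = 0.08),    # noisier hand-count towns
             rnorm(80,  mean = 0.39, sd = 0.06))
Clinton[c(5, 150)] <- NA                            # mimic a few missing towns

# Group variances, as in the console sessions above
v_hand    <- var(Clinton[hand == 1], na.rm = TRUE)
v_diebold <- var(Clinton[hand == 0], na.rm = TRUE)

# F-test of H0: var(hand) / var(Diebold) = 1.
# var.test() drops non-finite values (including NA) before computing.
ft <- var.test(Clinton[hand == 1], Clinton[hand == 0])

print(ft$estimate)   # ratio of sample variances, i.e. v_hand / v_diebold
print(ft$p.value)    # a small value suggests the variances really differ
```

For the Obama comparison, replace `Clinton` with `Obama`. If normality is doubtful, base R's `fligner.test()` is a rank-based alternative that is less sensitive to that assumption.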

I won't run it now, but I imagine this holds for the other candidates as well. It likely reflects the fact that hand-count districts are smaller, so a few voters swinging one way or the other cause large swings in the shares the candidates receive. In any case, this is heteroskedasticity (error variance that differs across groups), and this is what the robust linear model controls for.
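The town-size explanation follows from the variance of a proportion. If each of n voters in a town independently picks a candidate with probability p, that candidate's share has variance p(1 - p)/n, so a town one tenth the size has ten times the variance. The minimal simulation sketch below illustrates this using hypothetical town sizes and a fixed p, neither taken from the data. It then fits a weighted least-squares regression with weights proportional to n, one standard way to model variance that depends on town size. This is offered as an alternative to the robust linear model (`MASS::rlm`), which instead down-weights observations with large residuals. The `town_size` variable is an assumption, since the sessions above do not show a turnout column.

```r
set.seed(42)
p <- 0.39                                    # hypothetical share, same in every town

# Hypothetical town sizes: small hand-count towns vs. large machine towns
n_small <- 200
n_large <- 2000
share_small <- rbinom(5000, n_small, p) / n_small
share_large <- rbinom(5000, n_large, p) / n_large

# Empirical variance of the share vs. the binomial formula p(1 - p) / n
cat("small towns:", var(share_small), "theory:", p * (1 - p) / n_small, "\n")
cat("large towns:", var(share_large), "theory:", p * (1 - p) / n_large, "\n")

# Weighted least squares: weight each town by its size, so that the
# noisier (smaller) towns count less in the fit.
town_size <- sample(c(n_small, n_large), 300, replace = TRUE)
hand      <- as.integer(town_size == n_small)   # small towns hand-count here
share     <- rbinom(300, town_size, p) / town_size
fit_wls   <- lm(share ~ hand, weights = town_size)
print(summary(fit_wls)$coefficients)
```

Because p is the same everywhere in this simulation, the `hand` coefficient should be near zero. The point is only that the weights account for the unequal noise across town sizes.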

I'll leave it at that for now and let someone more enlightened fix my errors and complete my work, as I need to sleep and get my own work done...

semmelweis said...

Unlike me, you are obviously well-versed in statistics. However, before reaching for advanced statistical explanations, we should check whether an obvious hidden variable could explain the discrepancy. Diebold machines seem to be geographically clustered, but I couldn't find a list of NH towns with geographic data (say, latitude/longitude coordinates for each town) that matches the election data.
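If such a list turns up, the check could look like the minimal sketch below. It merges the coordinates onto the election data by town name, then compares the hand-count coefficient with and without latitude and longitude in the regression. Everything here is hypothetical: the `votes` and `coords` data frames, their column names, and all values are simulated stand-ins. They are built so that geography drives both the counting method and the vote share, while the counting method itself has no effect.

```r
set.seed(7)
n_towns <- 150
towns   <- paste0("Town", seq_len(n_towns))      # hypothetical town names
lat     <- runif(n_towns, 42.7, 45.3)             # roughly NH's latitude span
lon     <- runif(n_towns, -72.6, -70.6)           # roughly NH's longitude span

# Counting method clustered by region: northern towns more likely to hand-count
hand    <- rbinom(n_towns, 1, plogis(3 * (lat - 44)))
# Vote share depends on geography only; 'hand' has no true effect here
clinton <- 0.39 - 0.04 * (lat - 44) + rnorm(n_towns, 0, 0.05)

votes  <- data.frame(town = towns, hand = hand, Clinton = clinton)
# Coordinates stored in a different row order, so the merge must align by name
coords <- data.frame(town = rev(towns), lat = rev(lat), lon = rev(lon))

# Town names that would be silently dropped by the merge
unmatched <- setdiff(votes$town, coords$town)
merged    <- merge(votes, coords, by = "town")    # inner join on town name

fit_naive <- lm(Clinton ~ hand, data = merged)
fit_geo   <- lm(Clinton ~ hand + lat + lon, data = merged)

# Hand-count coefficient with and without geographic controls
print(rbind(naive = coef(fit_naive)["hand"], with_geo = coef(fit_geo)["hand"]))
```

With real data, a large shift in the `hand` coefficient after adding coordinates would point to geography as a confounder. Linear terms in latitude and longitude are only a crude control, and real town names often need cleaning before they match, which is why the sketch checks `unmatched` before merging.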