This data is from: http://www.countyhealthrankings.org/ranking-methods/exploring-data http://opengovdata.pbworks.com/w/page/27141180/County%20Health%20Rankings To get the data into the useful csv files (additional_measures_cleaned.csv and ypll.csv), I: - save each of the sheets of 2011 County Health Rankings.xls to a .csv file - delete the top row in each, since it's a metaheader - reference http://www.countyhealthrankings.org/sites/default/files/CHR%202011%20Data%20Comparability%20Across%20States.pdf to determine the measures that apply across states - realize that z-scores and rankings are within a state, so I can't use any of those metrics to compare counties in different states - look for ratings (per 100k people) or comparable metrics (population, percent children eligible for free lunch) that apply across states - identify YPLL (years of preventable life loss) as a good candidate for the dependent variable - delete all columns that don't apply to the regression from additional_measures_cleaned.csv / ypll.csv - only consider rows where none of the values are 0, and Unreliable is not checked. - use ols.py from https://www.scipy.org/Cookbook/OLS