This section contains the datasets and code needed to complete the lab exercises in the Lectures and Labs section. The files can also be found in this GitHub repository.
Datasets
Original location (as described in the lab handouts): dataiap/datasets
DATASETS | COMMENTS |
---|---|
2008 Presidential Campaign Contributions (ZIP) (This ZIP file contains: 1 .xls file.) | In the public domain, from the Federal Election Commission. |
2011 County Health Rankings (ZIP) (This ZIP file contains: 3 .xls files, 1 .txt file, and 3 .py files.) | 2011 County Health Ranking National Data.xls © County Health Rankings & Roadmaps, ols.py © Vincent Nijs. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/fairuse. |
Enron email dataset (TGZ - 432 MB). Subsets of the Enron dataset: kenneth.zip (ZIP) (This ZIP file contains: 4166 .txt files.) and kenneth_json.zip (ZIP) (This ZIP file contains: 1 .json file.) | In the public domain, from the Federal Energy Regulatory Commission. The history of the Enron dataset is described here. |
Code
Original locations (as described in the lab handouts):
- dataiap/dayX
- dataiap/resources
The ZIP files for Days 3 and 5 include hypothesis_testing.py, regression.py, and mapreduce.py. These are the source files for the lab handouts and are included here for convenience; these .py files add no content beyond the handouts.
CODE FILES | COMMENTS |
---|---|
Day 3 (ZIP) (This ZIP file contains: 4 .py files.) | ols.py © Vincent Nijs, welchttest.py © Angus McMorland. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/fairuse. |
Day 4 (ZIP) (This ZIP file contains: 1 .py file.) | |
Day 5 (ZIP) (This ZIP file contains: 7 .py files and 1 .gz file.) | |
Resources (ZIP) (This ZIP file contains: 6 .py files and 2 .json files.) | |