Session Overview
This lecture covers hierarchical clustering and introduces k-means clustering.
Session Activities
Lecture Videos
- Lecture 20: More Clustering (00:49:09)
About this Video
Topics covered: Feature vectors, scaling, k-means clustering.
Resources
- Lecture code handout (PDF)
- Lecture code (PY)
- Lecture slides (PDF)
- Lecture data files (ZIP) (This ZIP file contains: 3 .txt files.)
Recitation Videos
- Recitation 8: Hierarchical and k-means Clustering (00:50:49)
> Download from iTunes U (MP4 - 113MB)
> Download from Internet Archive (MP4 - 113MB)
About this Video
Topics covered: Unsupervised learning, k-means clustering, distance metric, cluster merging, centroid, k-means error, holdout set, significance of the value of k, features of k-means clustering, merits and disadvantages of different types of clustering.
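The cluster-merging idea from the recitation can be sketched as agglomerative (bottom-up) hierarchical clustering: start with every point in its own cluster and repeatedly merge the two closest clusters. This is a minimal sketch, assuming points are tuples of numbers, Euclidean distance as the distance metric, and single linkage (distance between the closest members of two clusters) as the merging criterion. The recitation may use a different linkage, and the function names and data are hypothetical.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def single_linkage(c1, c2):
    """Cluster distance = distance between the two closest members."""
    return min(euclidean(p, q) for p in c1 for q in c2)

def hierarchical_cluster(points, num_clusters, linkage=single_linkage):
    """Merge the closest pair of clusters until num_clusters remain."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    while len(clusters) > num_clusters:
        best = None
        # Find the pair of clusters with the smallest linkage distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge cluster j into cluster i; j > i, so index i stays valid after deletion.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (8.5, 9.0), (0.5, 1.5)]
for cluster in hierarchical_cluster(points, 2):
    print(sorted(cluster))
```

Swapping `single_linkage` for a function that uses the farthest members (complete linkage) or the average pairwise distance changes which clusters merge first without changing the merge loop.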
Check Yourself
How do we use nominal (non-numeric or noncontinuous) categories as features?
Convert each possible value to a real number.
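A minimal sketch of that conversion, assuming each distinct nominal value is assigned a number in order of first appearance; the function name `encode_nominal` and the color data are hypothetical.

```python
def encode_nominal(values):
    """Map each distinct nominal value to a real number (0.0, 1.0, 2.0, ...)
    in order of first appearance; return the codes and the mapping used."""
    codes = {}
    for v in values:
        if v not in codes:
            codes[v] = float(len(codes))
    return [codes[v] for v in values], codes

colors = ["red", "green", "red", "blue"]
encoded, mapping = encode_nominal(colors)
print(encoded)  # [0.0, 1.0, 0.0, 2.0]
print(mapping)  # {'red': 0.0, 'green': 1.0, 'blue': 2.0}
```

Arbitrary codes imply distances that may not be meaningful (here "blue" ends up twice as far from "red" as "green" is). One-hot encoding, which gives each category its own 0/1 feature, is a common alternative when no such ordering exists.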
Why do we need to use scaling (normalization)?
To control the relative importance of each feature: without scaling, features measured on larger numeric ranges dominate the distance computation.
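One common way to scale features is z-score standardization, sketched below under that assumption (the lecture may use a different scheme). Each feature is rescaled to mean 0 and standard deviation 1, and an optional per-feature weight then states its relative importance explicitly. The function name `z_scale` and the animal data are hypothetical.

```python
import statistics

def z_scale(vectors, weights=None):
    """Standardize each feature (column) to mean 0 and standard deviation 1,
    then multiply it by an optional per-feature weight."""
    columns = list(zip(*vectors))
    if weights is None:
        weights = [1.0] * len(columns)
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [
        # A constant feature (stdev 0) carries no information, so map it to 0.
        [w * (x - m) / s if s > 0 else 0.0
         for x, m, s, w in zip(vec, means, stdevs, weights)]
        for vec in vectors
    ]

# Hypothetical data: (height in cm, number of legs). Unscaled, height's
# larger range would dominate any Euclidean distance between animals.
animals = [[150.0, 2.0], [30.0, 4.0], [90.0, 4.0], [20.0, 2.0]]
scaled = z_scale(animals)
weighted = z_scale(animals, weights=[1.0, 2.0])  # count legs twice as heavily
```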
How does k-means clustering work?
k points are chosen, randomly or otherwise, as the initial centroids, and every other point is assigned to the cluster of its nearest centroid. A new centroid is then computed for each cluster (the mean of its members), points are reassigned to their nearest new centroid, and the process repeats until the difference between the current set of clusters and the previous set is insignificant.
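The loop described above can be sketched as follows. This is a minimal sketch, assuming points are tuples of numbers, Euclidean distance, initial centroids drawn at random from the data, and "insignificant" taken to mean that no centroid moves more than a small `tolerance`; the function names, parameters, and data are hypothetical.

```python
import math
import random

def euclidean(p, q):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def centroid(cluster):
    """Feature-by-feature mean of the points in a cluster."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def k_means(points, k, tolerance=1e-6, max_iters=100, seed=0):
    rng = random.Random(seed)
    # Choose k of the points at random as the initial centroids.
    centroids = rng.sample(points, k)
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [centroid(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        # Stop once no centroid has moved more than `tolerance`.
        shift = max(euclidean(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tolerance:
            break
    return clusters, centroids

group_a = [(1.0, 1.0), (1.5, 2.0), (0.5, 1.5)]
group_b = [(8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
clusters, centroids = k_means(group_a + group_b, k=2)
```

Because the result depends on the initial centroids, a common practice is to run the function with several seeds and keep the clustering with the smallest total distance from points to their centroids.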
Problem Sets
Problem Set 9: Schedule Optimization (Due)
At an institute of higher education that shall remain nameless, it used to be the case that a human adviser would help each student formulate a list of subjects that would meet the student's objectives. However, because of financial troubles, the Institute has decided to replace human advisers with software. Given the amount of work a student wants to do, the program returns a list of subjects that maximizes the total value without exceeding that amount of work. The goal of this problem set is to implement optimization algorithms.
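The underlying optimization can be illustrated with a brute-force search over every subset of subjects. This is a hedged sketch, not the problem set's required interface: it assumes each subject has an integer (value, work) pair and that the total work must not exceed the student's limit. The function name `best_schedule` and the catalog entries are hypothetical.

```python
from itertools import combinations

def best_schedule(subjects, max_work):
    """Try every subset of subjects and return (names, value) for the
    highest-value subset whose total work does not exceed max_work.

    subjects: dict mapping a subject name to a (value, work) pair.
    """
    names = list(subjects)
    best_names, best_value = (), 0
    for r in range(len(names) + 1):
        for combo in combinations(names, r):
            work = sum(subjects[n][1] for n in combo)
            value = sum(subjects[n][0] for n in combo)
            if work <= max_work and value > best_value:
                best_names, best_value = combo, value
    return list(best_names), best_value

# Hypothetical catalog: subject name -> (value, work)
catalog = {"6.00": (16, 8), "1.00": (7, 7), "6.01": (5, 3), "15.01": (9, 6)}
print(best_schedule(catalog, 15))  # (['6.00', '15.01'], 25)
```

Exhaustive search examines 2^n subsets, so it is only practical for small catalogs; greedy heuristics run much faster but may miss the optimum.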
- Instructions (PDF)
- Code Files (ZIP) (This ZIP file contains: 2 .py files and 2 .txt files.)
Note: Solutions are not available for this assignment.
Problem Set 10 (Assigned)
Problem set 10 is assigned in this session. The instructions and solutions can be found on the session page where it is due, Lecture 22: Using Graphs to Model Problems, Part 2.