Please visit the UW Canvas page for the full course site.
Calendar
Week | Date | Topic | Projects |
---|---|---|---|
1 | 6/23 | Overview; linear model review | |
2 | 6/30 | Featurization: numeric, categorical, and text data; outputs | Project 1 proposal due |
3 | 7/7 | Time series | Project 1 check-in due |
4 | 7/14 | Processing raw data; map-reduce and distributed computing; experimental discipline | Project 1 due |
5 | 7/21 | Interactions, recommendation systems | Project 2 proposal due |
6 | 7/28 | Metrics; models in production | Project 2 check-in due |
7 | 8/4 | Modeling big data sets | Project 2 due |
8 | 8/11 | Model understanding | Project 3 proposal due |
9 | 8/18 | Understanding data with models; anomaly detection | Project 3 check-in due |
10 | 8/25 | Nearest neighbor search; featurizing graphs | Project 3 due |
Interesting data
- Election. Good working example.
- Climate data. On Kaggle. See also Berkeley Earth for temperature time series at various granularities.
- Health, e.g. chronic disease indicators.
- Social networks and graphs.
- Causes of death from the CDC/Kaggle. Might be useful for model understanding.
- Shelter animals. Ongoing Kaggle competition.
- Facebook location check-ins. Good example for the challenge of targets. Also has a temporality challenge.
- Project Gutenberg. Text of many books (e.g. Ulysses); great for text classification.
- Generic data sites:
- Misc text