Course Description
In this course, we will cover both basic and advanced data mining
techniques in depth (see the tentative list of topics below).
The course will consist of a mixture of lectures by the instructor and
presentations by the students. Each student is also expected to gain
hands-on experience by carrying out a semester-long project on a topic
of their choice.
Course Topics
The following is a tentative list of topics that the instructor will cover:
- Classification (Decision trees, logistic regression, support vector machines)
- Combining Multiple Learners (Bagging, boosting, cascading, stacking)
- Clustering (k-means, EM, hierarchical clustering, topic modeling)
- Dimensionality Reduction (Principal component analysis, linear discriminant analysis, subset selection)
- Graphical Models (Bayesian networks, Markov networks)
The following is a tentative list of topics from which each student is
expected to pick a paper to present:
- Active learning
- Multi-label learning
- Graph mining
- Link prediction
- Data mining in bioinformatics
- Social media analytics
- Privacy-aware data mining
- Viral marketing
- Recommender systems
- Large scale data mining
- Temporal pattern mining
- Stream data mining
- Outlier detection
Course Information
Time and Location: Tue and Thu, 1:50 - 3:05pm, in Stuart 220
Professor: Mustafa Bilgic
Office: Stuart 228C
Email: mbilgic AT iit.edu
Office Hours: Tue and Thu, 11am - 12pm (other times by appointment)
Prerequisites
CS422 - Data Mining is a required prerequisite. However, if you have
not taken CS422 and are still interested in taking the 01 section,
please send me an email. Depending on your background, I might waive
the prerequisite. I will not, however, waive the prerequisite for the
Televised and Internet sections.
Course Format and Grading
I will lecture during the first half of the semester. In the second half,
each student is expected to present one to two papers and lead the
discussion on a topic of their choice. Students are expected to read the
required materials, prepare a short write-up about the readings, and
participate in the discussions. Students are also expected to carry out
a semester-long project on a problem of their choice.
- Assignments: 20%
- Midterm: 20%
- Paper Presentation: 20%
  - Presenting one to two papers and leading the discussion in class.
- Project: 40%
  - A project proposal, a project presentation, and a final report.
- Final: There will not be a final exam in this class.
Course Material
There is a required textbook for this course:
Introduction to Machine Learning, 2nd edition, by Ethem Alpaydin
There will be additional reading materials (mostly available on the web).
Tentative Schedule
Week 1: Jan 10: Introduction
Week 2: Jan 17: Decision Trees
Week 3: Jan 24: Naive Bayes and Logistic Regression
Week 4: Jan 31: Support Vector Machines
Week 5: Feb 7: Bagging and Boosting
Week 6: Feb 14: Clustering
Week 7: Feb 21: Dimensionality Reduction and Feature Selection
Week 8: Feb 28: Graphical Models
Week 9: Mar 7: Review and Midterm
Week 10: Mar 14: SPRING BREAK
Week 11: Mar 21: Advanced Topics
Week 12: Mar 28: Advanced Topics
Week 13: Apr 4: Advanced Topics
Week 14: Apr 11: Advanced Topics
Week 15: Apr 18: Advanced Topics
Week 16: Apr 25: Project Presentations
Week 17: May 2: FINALS WEEK