• Gady Agam
  • Office: SB-237e
  • Contact: x7-5834
  • Office hours: Tues, Thur 6:30pm - 7:30pm

Teaching assistant

  • Lin Gan
  • Office: SB-115
  • Contact: x7-5705
  • Office hours: Mon 4:30pm - 6:00pm, Wed 11:00am - 12:30pm


  • CS-422-001: Main campus (LS-111)
  • CS-422-002: Internet
  • CS-422-003: Internet (IN)


The amount of available data is constantly increasing. The same is true for computational capabilities. This creates the opportunity for automatically analyzing large datasets and discovering new information in them. Applications of data mining are numerous and span a wide range of areas. These include scientific applications where data is either collected or generated from simulations, medical applications where data is collected from patients, biological applications where gene expression and protein interactions are tested and analyzed, financial applications where stock and other financial metrics can assist market analysis, web applications where user information and user interaction can be used, and commerce applications where purchase and bank transactions can be analyzed.

Topics to be covered in CS-422 this semester include: an overview of data mining, data mining tasks, data mining software, processing and visualizing data, handling attributes, decision trees, overfitting, evaluating performance, comparing classifiers, non-parametric classifiers, Bayesian classifiers, support vector machines, neural networks, ensemble methods, association analysis, cluster analysis, anomaly detection, and network data mining. The course will involve several programming assignments drawn from applications such as web data mining, text mining, and protein-protein interactions.

Updated on January 10, 2013