CS-422 - Assignment 6 (5%)

Applications

Due by: May 5, 2013

Assignment Specifications

In this assignment you will apply various data mining algorithms to analyze stock data and text data. The stock data should be obtained from Yahoo Finance. For the text data mining you should use the DDL Reuters-21578 dataset and one additional dataset. The datasets should be actual datasets, and capturing and processing them is part of the assignment. With permission you may substitute one of the datasets with a dataset that is of special interest to you. The grade for this assignment will be based on your application of the algorithms, the thoroughness of your evaluation, the results you obtain, and the clarity of your report. Make sure to explain the results you obtain, and do not unnecessarily repeat similar results. The code you write should be modular and well documented.

  1. Load/capture the datasets: Yahoo Finance stock dataset (NASDAQ 100 since 2000), "DDL Reuters-21578" dataset, and an additional text dataset. Contact us for permission if you would like to use a substitute dataset.

  2. Load the datasets and explore them. Pre-process the data to produce the feature vectors if necessary. The processing of the text files should include whitespace and case handling, stop-word removal, and stemming.

  3. Determine possible data mining tasks for the different datasets and apply data mining algorithms to accomplish them. Implementing algorithms is not required in this assignment; use existing implementations.

  4. Show the results of your analysis and draw conclusions based on the analysis of the data. Your grade will be based in part on the quality and significance of the conclusions you obtain.
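
The text pre-processing described in step 2 (whitespace and case handling, stop-word removal, and stemming) could be sketched roughly as follows. This is only an illustration under stated assumptions: the stop-word list is a small hypothetical sample, and `naive_stem` is a crude suffix stripper standing in for a real stemmer such as Porter's, not an implementation of it. The names `STOP_WORDS`, `naive_stem`, `preprocess`, and `term_counts` are made up for this example and are not part of the assignment.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption); a real run would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are",
              "for", "on", "with", "by", "at", "it"}

# Suffixes tried by the toy stemmer, in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(token):
    """Strip at most one common suffix (a crude stand-in for a real stemmer)."""
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase the text, split on non-letters, drop stop words, and stem."""
    # Lowercasing handles case; the regex discards whitespace, digits, and punctuation.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def term_counts(tokens):
    """Turn a token list into a sparse term-frequency feature vector."""
    return Counter(tokens)


if __name__ == "__main__":
    sample = "The  traders are BUYING shares\tof Oil"
    print(term_counts(preprocess(sample)))
```

Since the assignment asks for existing implementations, a real submission would likely replace `naive_stem` and the stop-word set with those from an established text-processing library; the sketch only shows the order of the steps and the shape of the resulting feature vector.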

General comments



Gady Agam 2013-04-23