CS-422 - Assignment 6 (5%)
Applications
Due by: May 5, 2013
In this assignment you will apply various data mining algorithms for
analyzing stock data and text data. The stock data should be obtained
from Yahoo Finance. For the text data mining you should use the DDL
Reuters-21578 data set and one additional data set. The data sets should
be real data, and capturing and processing them is part of the
assignment. With permission you may substitute one of the datasets
with a dataset that is of special interest to you. The grade for this
assignment will be based on your application of the algorithms, the
thoroughness of your evaluation, the results you obtain, and the clarity
of your report. Make sure to explain the results you obtain and do not
unnecessarily repeat similar results. The code you write should be
modular and well documented.
- Load/capture the datasets: Yahoo Finance stock dataset (NASDAQ
100 since 2000), DDL Reuters-21578 dataset, and an additional text
dataset. Contact us for permission if you would like to use a
substitute data set.
- Load the datasets and explore them. Pre-process the data to
produce the feature vectors if necessary. The processing of the text
files should include white space and case handling, stop words
removal, and stemming.
- Determine possible data mining tasks for the different datasets
and apply data-mining algorithms to accomplish them. Implementing
algorithms in this assignment is not required. Use existing
implementations.
- Show the results of your analysis and draw conclusions based on
the analysis of the data. Your grade will be based in part on the
quality and significance of the conclusions you obtain.
- You are advised (but not required) to use R for implementing the
association rules algorithm.
- Write your code in a modular way using functions and make sure to
document it.
- Do not include in your submission any large datasets that we
provided.
- The assignment involves testing many algorithms. Try to be
concise and thorough in the way you present your results and make sure
not to include repetitive results. Results you present should have a
purpose.
- Follow the electronic submission instructions of assignment 1.
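As an illustration of the text pre-processing steps listed above (white
space and case handling, stop word removal, and stemming), the following
is a minimal Python sketch and not a prescribed solution. The stop list,
the suffix list, and the function names naive_stem, preprocess, and
term_counts are assumptions made for this example only. The toy stemmer
is a crude stand-in; an existing Porter-style stemmer would normally be
used instead.

```python
import re

# Tiny illustrative stop list (an assumption); a real run would use a
# fuller list from an existing library.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "are", "for", "on"}

# Suffixes stripped by the toy stemmer below, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(token):
    """Strip one common suffix; a crude stand-in for a Porter-style stemmer."""
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Return the stemmed, lowercased, non-stop-word tokens of text."""
    # Case handling first, then keep only runs of letters. This also
    # collapses white space and drops punctuation and digits.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def term_counts(tokens):
    """Build a bag-of-words feature vector as a term -> count dictionary."""
    counts = {}
    for t in tokens:
        counts[t] = counts.get(t, 0) + 1
    return counts
```

Calling term_counts(preprocess(document_text)) on each document would
give one simple feature vector per document. Whether raw counts, binary
indicators, or a weighting such as TF-IDF is most suitable depends on
the mining task chosen.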
Gady Agam
2013-04-23