The general theme of the course is algorithmic, graph-theoretical, and application-oriented issues in large-scale complex social networks. The specific focus is on models and algorithms for social network structure; information, influence, and belief propagation; privacy and security; anonymization and de-anonymization; and data mining in social networks.
The course has a seminar format and is based on recently published material. The list of recommended reading will be updated throughout the semester. Students are expected to present selected papers on some of the topics and to implement selected projects related to social networking.
The course will start with a review of necessary background topics, such as applications of social networking in different fields and the emerging enabling technologies for social networking and big data analytics. It will then discuss technical issues in social networking, including large-scale network modeling, information propagation, influence modeling and propagation, spam detection, sentiment analysis, privacy and security, anonymization and de-anonymization, clustering, and the business side of social networking. Theories, algorithms, and protocols will complement the application-oriented material of the class. New and emerging topics in both theoretical research and applications will be presented as well.
The goal of the course is to give students the foundations needed to apply social networking theory and algorithms to social networks and big data. The class focuses on discussing and understanding the challenges in emerging social networking systems and mobile networks.
1) Classroom: Stuart Building 107 (changed from 220 SB)
2) Date/Time: W 6:25pm-9:05pm (see IIT calendars for holidays)
3) Instructor: XiangYang Li; Electronic contact: xli at cs dot iit dot edu; Office: SB 229C; Office hours: M, W: 2-3pm
4) Teaching Assistant: ???, Email:???; Office: SB 019B, Office Hours: Friday 1PM to 3PM.
5) Course Lectures: will be put online (Blackboard) when ready
Working knowledge of data structures and of a programming language such as C++, Java, or Python is required. Familiarity with basic algorithmic concepts, probability theory, statistics, and linear algebra is also preferred (some of it is required). Programming projects will require knowledge of C (or C++), Java, Python.
Courses required: CS 116, CS 330, and CS 331.
Courses recommended: CS 422, CS 430.
There is no mandated textbook. Recommended books:
1. Networks, Crowds, and Markets, by D. Easley and J. Kleinberg (an online version is available).
2. Think Complexity, by Allen Downey (a PDF version is available online).
Students who take the course for credit are required to complete a course project, which constitutes 50% of the overall grade. While preparing the project, each student will give one presentation of several related papers (preferably related to the project); the presentation is worth 20%. A reaction paper summarizing the student's initial thoughts on the chosen topic is worth another 20%. Active class participation makes up the remaining 10%.
1. One class presentation 20%
2. One reaction paper 20%
3. One project 50%
4. Class participation 10%
Incompletes will not be given.
Late Assignment Policy: There will be a penalty of 10% per day, up to three days late. After that no credit will be given.
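As a worked example of how the grading weights and the late penalty combine, here is a minimal Python sketch. It assumes that component scores are on a 0-100 scale and that the 10% penalty is taken off the earned score for each day late; the policy does not spell out either detail, and the function names are illustrative only.

```python
# Weights taken from the grading breakdown above.
WEIGHTS = {
    "presentation": 0.20,
    "reaction_paper": 0.20,
    "project": 0.50,
    "participation": 0.10,
}


def apply_late_penalty(score, days_late):
    """Assumed reading of the policy: lose 10% of the earned score per day
    late, for at most three days; anything later earns no credit."""
    if days_late <= 0:
        return score
    if days_late > 3:
        return 0.0
    return score * (1 - 0.10 * days_late)


def overall_grade(scores):
    """Weighted sum of component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student: the reaction paper was handed in two days late.
scores = {
    "presentation": 90,
    "reaction_paper": apply_late_penalty(80, days_late=2),
    "project": 85,
    "participation": 100,
}
print(round(overall_grade(scores), 2))
```

In this example the reaction paper drops from 80 to 64 before weighting, which shows how quickly a late submission affects the overall grade.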
The course is primarily based on recent material (from the past 5-10 years); this means that most of it exists in the form of papers on the Web, and the existing literature raises a lot of interesting issues that have yet to be explored.
In the course of preparing the project, each student will give one presentation of some papers (preferably related to the project). The presentation is meant as an initial attempt to become familiar with the area the student will work on for the rest of the semester. A student may nevertheless choose to present a topic unrelated to his/her project theme.
As a way to get everyone thinking about the research issues underlying the course, there will be a short reaction paper of at least 5 pages in IEEE Transactions format. The reaction paper should be prepared as follows. First, read at least two (preferably more) closely related papers relevant to a particular section of the course, favoring the most recent or most widely cited ones. Then write the paper, addressing the following points:
1. What is the main technical content of the papers?
2. Why is it interesting in relation to the corresponding section of the course?
3. What are the weaknesses of the papers, and how could they be improved?
4. What are some promising further research questions in the direction of the papers, and how could they be pursued?
Reaction papers should not just be summaries of the papers you read; most of your text should be focused on synthesis of the underlying ideas and on your own perspective on the papers. To make this concrete, you should devote much of the content to the last point above: promising directions for further research. In particular, the reaction paper should contain at least some amount of each of the following types of content:
1. A proposal for a model or algorithm - potentially extending, varying, or improving something in the papers you have read - together with some mathematical analysis of it. You should also show the feasibility of your approach:
   a. What are the hypotheses that you want to show? How will you verify your hypothesis? Where will you get your data to test your method? Why will your method work? How will you evaluate your method? What will you do if your method did not work as you expected?
   b. The time plan for your project.
2. A test of a model or algorithm (either your own or something from one of the papers) on a dataset or on simulated data.
The reaction paper should be considered as a very good way to explore a potential project topic.
The final piece of the work for the course will be a project. You can work on this in groups of up to 3 students, and it is largely up to you to define the topic and scope of the project.
The first step in the project will be a short proposal. This is meant just to be a brief description of what you are intending for the project - about 2 pages in length, with a discussion of relevant background work and tentative plans for how you would proceed. If your project is based on your reaction paper, then you do not need to repeat things you have said in the reaction paper - it is enough to describe how you plan to turn the ideas from the reaction paper into a larger project.
The basic genres of project are the following:
1. An experimental evaluation of an algorithm, model, or measure on an interesting dataset. The datasets on the course home page suggest some possible domains in which to think about such experiments, but you can also assemble your own data. See the SNAP project from Stanford for some interesting datasets.
2. A simple Facebook, Twitter, or Weibo application.
3. A theoretical project that considers an algorithm, model, or measure in social networks (check out the related papers from ACM STOC, IEEE FOCS, ACM SODA, ACM EC, and so on).
4. A theoretical study in the area of one of the course topics that derives rigorous results about it.
5. An extended, critical survey of one of the course topics, going into significant depth and offering a novel perspective on the area.
As with the reaction paper, the project should contain at least some amount of mathematical analysis and some experimentation on real or synthetic data (this is recommended even for a survey paper).
The result of the project will typically be a 10-15 page paper in ACM format (a survey paper will be longer), describing the approach, the results, and the related work. The references on the course home page serve as examples of what such papers tend to look like; of course, the overall form of the paper will depend on the nature of the project.
The final stage will be a presentation of the projects in class by each group. The exact schedule for the project presentations will be worked out later in the semester.
Before the project, every student needs to write code for crawling data from Twitter, Facebook, or Weibo (using our own code and code published online). Alternatively, you can use large-scale social networking data published online at the following sites:
1. http://snap.stanford.edu/data/index.html (SNAP from Stanford)
For network analysis and data visualization, we can use the Stanford Network Analysis Platform (SNAP), a general-purpose, high-performance system for the analysis and manipulation of large networks. See http://snap.stanford.edu/snap/index.html for instructions.
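To show how little is needed to get started with one of the published datasets, here is a minimal sketch in plain Python. It assumes the plain-text edge-list layout used by many SNAP downloads (lines beginning with `#` are comments; every other line holds two whitespace-separated node IDs) and treats the graph as undirected. The function names and the tiny inline sample are illustrative and are not part of SNAP.

```python
from collections import Counter, defaultdict


def read_edge_list(lines):
    """Build an undirected adjacency map from SNAP-style edge-list lines."""
    adjacency = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        u, v = line.split()[:2]
        if u == v:
            continue  # ignore self-loops in this sketch
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def degree_histogram(adjacency):
    """Map each degree value to the number of nodes with that degree."""
    return Counter(len(neighbors) for neighbors in adjacency.values())


# Toy input in the assumed layout; a real run would pass an open file instead.
sample = [
    "# Toy graph in the assumed edge-list layout",
    "# FromNodeId\tToNodeId",
    "0\t1",
    "0\t2",
    "0\t3",
    "1\t2",
]
graph = read_edge_list(sample)
histogram = degree_histogram(graph)
for degree in sorted(histogram):
    print(degree, histogram[degree])
```

For a downloaded dataset, the same function accepts an open file handle, for example `with open(path) as f: graph = read_edge_list(f)`, where `path` is wherever the file was saved. The degree histogram is a natural first check against the power-law models covered in the reading list.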
1. Sentiment analysis: given a tweet, analyze its sentiment; also analyze how sentiment evolves over time (similar to Sentiment140 and the Wisdom project by MicroStrategy)
2. Belief propagation: given a tweet, capture the belief propagation in social networks
3. Personality analysis of users: given the data collected, analyze the personality of a user and how it evolves
4. Spam detection in large scale online social networks
5. Online social network user behavior modeling and capture
6. Online social network user belief modeling and capture
7. Given the data collected, analyze the sentiment and feedback of users towards a certain product (such as a camera, or a Canon camera). We need to separate feedback given after the action (e.g., buying) from feedback given before it.
8. Privacy protection in large scale social networks
9. Non-tracking and advertisement in social networks
10. Integrating social networking with mobile computing: mobile social network
11. Integrating social networking with crowd-computing: crowd-based location based services, crowd-based emergency handling and evacuation.
12. How to find useful data in a social network. Here usefulness depends on the application. For example, different business models call for different keywords. A counter-terrorism application may be interested in certain keywords and patterns. Best Buy cares about selling products to customers; Samsung cares only about its own products.
13. Something similar to the Wisdom project by MicroStrategy?
14. Integrate data from various social network sources (Facebook, Twitter, and phone records) to de-anonymize users.
15. Integrate data from various social network sources to study a user's personality? A user's value for a commercial application?
David Easley and Jon Kleinberg: Networks, Crowds, and Markets: Reasoning About a Highly Connected World.
Generative models for social networks
1. Michael Mitzenmacher, A Brief History of Generative Models for Power Law and Lognormal Distributions,
2. Bela Bollobas and Oliver Riordan, The Diameter of a Scale-Free Random Graph, Combinatorica 24 (1), 2004. Pages 5-34.
3. S. N. Dorogovtsev, A. V. Goltsev, J. F. F. Mendes, Critical phenomena in complex networks
4. Hossam Sharara, Lisa Singh, Lise Getoor, Janet Mann: The Dynamics of Actor Loyalty to Groups in Affiliation Networks. International Conference on Advances in Social Network Analysis (ASONAM) 2009.
5. Elena Zheleva, Hossam Sharara, Lise Getoor: Co-evolution of social and affiliation networks. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) 2009.
6. Silvio Lattanzi, D. Sivakumar: Affiliation networks. ACM Symposium on Theory of Computing (STOC), 2009.
7. Jure Leskovec, Christos Faloutsos: Scalable modeling of real graphs using Kronecker multiplication. International Conference on Machine Learning (ICML), 2007.
8. Jure Leskovec, Jon Kleinberg and Christos Faloutsos: Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) 2005.
9. M. E. J. Newman: Power laws, Pareto distributions and Zipf's law, Contemporary Physics.
10. R. Albert and A.-L. Barabasi, Statistical Mechanics of Complex Networks, Rev. Mod. Phys. 74, 47-97 (2002).
11. B. Bollobas: Mathematical Results in Scale-Free random Graphs.
12. D. S. Callaway, J. E. Hopcroft, J. M. Kleinberg, M. E. J. Newman, and S. H. Strogatz: Are randomly grown graphs really random? Phys. Rev. E 64, 041902 (2001).
13. D.J. Watts: Networks, Dynamics, and the Small-World Phenomenon. American Journal of Sociology, Vol. 105, Number 2, 493-527, 1999
14. Watts, D. J. and S. H. Strogatz: Collective dynamics of small-world networks. Nature 393:440-42, 1998
Information propagation / collaboration
1. Michael Mathioudakis, Nick Koudas: Efficient identification of starters and followers in social media. Extended DataBase Technology Conference (EDBT), 2009.
2. Theodoros Lappas, Kun Liu, Evimaria Terzi: Finding a team of experts in social networks. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) 2009.
3. Heikki Mannila, Evimaria Terzi: Finding links and initiators: a graph-reconstruction problem. SIAM Data Mining Conference (SDM) 2009.
4. Deepayan Chakrabarti, Yang Wang, Chenxi Wang, Jure Leskovec, Christos Faloutsos: Epidemic thresholds in real networks. ACM Transactions on Information and System Security, 2008.
5. Amit Goyal, Francesco Bonchi, Laks V. S. Lakshmanan: Discovering leaders from community actions. ACM Conference on Information and Knowledge Management (CIKM) 2008.
6. Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, Natalie S. Glance: Cost-effective outbreak detection in networks. ACM International Conference on Knowledge Discovery and Data Mining (KDD), 2007.
7. Hanghang Tong, Christos Faloutsos: Center-piece subgraphs: problem definition and fast solutions. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2006.
8. David Kempe, Jon Kleinberg, Eva Tardos: Maximizing the spread of influence through a social network. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2003.
9. Pedro Domingos, Matthew Richardson: Mining the network value of customers. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2001.
Privacy and social networks
1. Barnes, S.B., A privacy paradox: Social networking in the United States, First Monday, volume 11, page 11--15, 2006
2. Rosenblum, D., What anyone can know: The privacy risks of social networking sites, IEEE Security & Privacy, vol 5, number 3, 2007
1. Lei Zou, Lei Chen, M. Tamer Ozsu: K-automorphism: a general framework for privacy preserving network publication. Proceedings of the VLDB Endowment (PVLDB) 2009.
2. Elena Zheleva, Lise Getoor: To join or not to join: the illusion of privacy in social networks with mixed public and private user profiles. International Conference on World Wide Web (WWW), 2009.
3. Arvind Narayanan, Vitaly Shmatikov: De-anonymizing Social Networks. IEEE Symposium on Security and Privacy 2009.
4. Michael Hay, Chao Li, Gerome Miklau, David Jensen: Accurate Estimation of the Degree Distribution of Private Networks. IEEE International Conference on Data Mining (ICDM) 2009.
5. Kun Liu, Evimaria Terzi: A framework for computing the privacy score of users in online social networks. IEEE International Conference on Data Mining (ICDM) 2009.
6. X. Ying and X. Wu: Graph generation with prescribed feature constraints. SIAM Data Mining Conference (SDM), 2009.
7. Kun Liu, Evimaria Terzi: Towards identity anonymization on graphs, ACM International Conference on Management of Data (SIGMOD) 2008.
8. Michael Hay, Gerome Miklau, David Jensen, Don Towsley, Philipp Weis: Resisting structural identification in anonymized social networks. Conference on Very Large Databases (VLDB) 2008.
9. Lars Backstrom, Cynthia Dwork, Jon Kleinberg: Wherefore art thou r3579x?: anonymized social networks, hidden patterns and structural steganography, International Conference on World Wide Web (WWW), 2007.
Influence propagation and Influencer Determination
Kempe, D. and Kleinberg, J. and Tardos, E., Maximizing the spread of influence through a social network, Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 137--146, 2003.
Hartline, J. and Mirrokni, V. and Sundararajan, M., Optimal marketing strategies over social networks, Proceedings of the 17th international conference on World Wide Web, 2008.
Tang, J. and Sun, J. and Wang, C. and Yang, Z., Social influence analysis in large-scale networks, Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, 2009.
Dasgupta, K. and Singh, R. and Viswanathan, B. and Chakraborty, D. and Mukherjea, S. and Nanavati, A.A. and Joshi, A., Social ties and their relevance to churn in mobile telecom networks, Proceedings of the 11th international conference on Extending database technology: Advances in database technology, 2008.
Bakshy, E. and Hofman, J.M. and Mason, W.A. and Watts, D.J., Everyone's an influencer: quantifying influence on twitter, Proceedings of the fourth ACM international conference on Web search and data mining, 2011.
Chen, W. and Wang, C. and Wang, Y., Scalable influence maximization for prevalent viral marketing in large-scale social networks, Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, 2010.
Chen, W. and Wang, Y. and Yang, S., Efficient influence maximization in social networks, Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, 2009.
Girvan, M. and Newman, M.E.J., Community structure in social and biological networks, Proceedings of the National Academy of Sciences, volume 99, number 12, 2002.
Newman, M.E.J. and Girvan, M., Finding and evaluating community structure in networks, Physical review E, volume 69, number 2, 2004
Palla, G. and Derenyi, I. and Farkas, I. and Vicsek, T., Uncovering the overlapping community structure of complex networks in nature and society, Nature, volume 435, number 7043, 2005
Clauset, A. and Newman, M.E.J. and Moore, C., Finding community structure in very large networks, Physical review E, 2004
Newman, M.E.J., Fast algorithm for detecting community structure in networks, Physical Review E, 2004
Leskovec, J. and Lang, K.J. and Dasgupta, A. and Mahoney, M.W., Statistical properties of community structure in large social and information networks, Proceedings of the 17th international conference on World Wide Web, 2008.
Mislove, A. and Marcon, M. and Gummadi, K.P. and Druschel, P. and Bhattacharjee, B., Measurement and analysis of online social networks, Proceedings of the 7th ACM SIGCOMM conference on Internet measurement, 2007.
Jin, E.M. and Girvan, M. and Newman, M.E.J., Structure of growing social networks, Physical review E, Volume 64, 2001
Kumar, R. and Novak, J. and Tomkins, A., Structure and evolution of online social networks, Link Mining: Models, Algorithms, and Applications, pages 337--357, 2010
McPherson, M. and Smith-Lovin, L. and Cook, J.M., Birds of a feather: Homophily in social networks, Annual review of sociology, Pages 415--444, 2001
Newman, M.E.J., Finding community structure in networks using the eigenvectors of matrices, Physical review E, 2006
Danon, L. and Diaz-Guilera, A. and Duch, J. and Arenas, A., Comparing community structure identification, Journal of Statistical Mechanics: Theory and Experiment, 2005
Backstrom, L. and Huttenlocher, D. and Kleinberg, J. and Lan, X., Group formation in large social networks: membership, growth, and evolution, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 44--54, 2006
Markines, B. and Cattuto, C. and Menczer, F., Social spam detection, Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web, 2009.
Wang, A.H., Don't follow me: Spam detection in twitter, Proceedings of the 2010 International Conference on Security and Cryptography (SECRYPT), 2010.
Benevenuto, F. and Rodrigues, T. and Almeida, V. and Almeida, J. and Zhang, C. and Ross, K., Identifying video spammers in online social networks, Proceedings of the 4th international workshop on Adversarial information retrieval on the web, 2008.
Tseng, C.Y. and Chen, M.S., Incremental SVM model for spam detection on dynamic email social networks, International Conference on Computational Science and Engineering, 2009.
Heymann, P. and Koutrika, G. and Garcia-Molina, H., Fighting spam on social web sites: A survey of approaches and future challenges, IEEE Internet Computing, 2007.
Stringhini, G. and Kruegel, C. and Vigna, G., Detecting spammers on social networks, Proceedings of the 26th Annual Computer Security Applications Conference, 1-9, 2010.
Benevenuto, F. and Rodrigues, T. and Almeida, V. and Almeida, J. and Goncalves, M., Detecting spammers and content promoters in online video social networks, Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, 620--627, 2009
Wang, A., Detecting spam bots in online social networking sites: a machine learning approach, Data and Applications Security and Privacy XXIV, pages 335--342, 2010
Lee, K. and Caverlee, J. and Webb, S., Uncovering social spammers: social honeypots + machine learning, Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval, 435--442, 2010.
Leskovec, J. and Huttenlocher, D. and Kleinberg, J., Predicting positive and negative links in online social networks, Proceedings of the 19th international conference on World wide web, 641--650, 2010.
Yang, W.S. and Dia, J.B. and Cheng, H.C. and Lin, H.T., Mining social networks for targeted advertising, Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS'06), 2006.
Trusov, M. and Bucklin, R.E. and Pauwels, K., Effects of word-of-mouth versus traditional marketing: Findings from an internet social networking site, Robert H. Smith School Research Paper No. RHS, 2008.
Provost, F. and Dalessandro, B. and Hook, R. and Zhang, X. and Murray, A., Audience selection for on-line brand advertising: privacy-friendly social network targeting, Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, 2009.
Narayanan, A. and Shmatikov, V., De-anonymizing social networks, 30th IEEE Symposium on Security and Privacy, 2009.
Domingos, P., Mining social networks for viral marketing, IEEE Intelligent Systems, pages 80--82, 2005.
Hartline, J. and Mirrokni, V. and Sundararajan, M., Optimal marketing strategies over social networks, Proceedings of the 17th international conference on World Wide Web, 189--198, 2008.