Detecting Fake News
on Social Media
Synthesis Lectures on Data
Mining and Knowledge
Discovery
Editors
Jiawei Han, University of Illinois at Urbana-Champaign
Johannes Gehrke, Cornell University
Lise Getoor, University of California, Santa Cruz
Robert Grossman, University of Chicago
Wei Wang, University of North Carolina, Chapel Hill
Synthesis Lectures on Data Mining and Knowledge Discovery is edited by Jiawei Han, Lise
Getoor, Wei Wang, Johannes Gehrke, and Robert Grossman. The series publishes 50- to 150-page
publications on topics pertaining to data mining, web mining, text mining, and knowledge
discovery, including tutorials and case studies. Potential topics include: data mining algorithms,
innovative data mining applications, data mining systems, mining text, web and semi-structured
data, high performance and parallel/distributed data mining, data mining standards, data mining
and knowledge discovery framework and process, data mining foundations, mining data streams
and sensor data, mining multi-media data, mining social networks and graph data, mining spatial
and temporal data, pre-processing and post-processing in data mining, robust and scalable
statistical methods, security, privacy, and adversarial data mining, visual data mining, visual
analytics, and data visualization.
Detecting Fake News on Social Media
Kai Shu and Huan Liu
2019
Multidimensional Mining of Massive Text Data
Chao Zhang and Jiawei Han
2019
Exploiting the Power of Group Differences: Using Patterns to Solve Data Analysis
Problems
Guozhu Dong
2019
Mining Structures of Factual Knowledge from Text
Xiang Ren and Jiawei Han
2018
Individual and Collective Graph Mining: Principles, Algorithms, and Applications
Danai Koutra and Christos Faloutsos
2017
Phrase Mining from Massive Text and Its Applications
Jialu Liu, Jingbo Shang, and Jiawei Han
2017
Exploratory Causal Analysis with Time Series Data
James M. McCracken
2016
Mining Human Mobility in Location-Based Social Networks
Huiji Gao and Huan Liu
2015
Mining Latent Entity Structures
Chi Wang and Jiawei Han
2015
Probabilistic Approaches to Recommendations
Nicola Barbieri, Giuseppe Manco, and Ettore Ritacco
2014
Outlier Detection for Temporal Data
Manish Gupta, Jing Gao, Charu Aggarwal, and Jiawei Han
2014
Provenance Data in Social Media
Geoffrey Barbier, Zhuo Feng, Pritam Gundecha, and Huan Liu
2013
Graph Mining: Laws, Tools, and Case Studies
D. Chakrabarti and C. Faloutsos
2012
Mining Heterogeneous Information Networks: Principles and Methodologies
Yizhou Sun and Jiawei Han
2012
Privacy in Social Networks
Elena Zheleva, Evimaria Terzi, and Lise Getoor
2012
Community Detection and Mining in Social Media
Lei Tang and Huan Liu
2010
Ensemble Methods in Data Mining: Improving Accuracy Through Combining
Predictions
Giovanni Seni and John F. Elder
2010
Modeling and Data Mining in Blogosphere
Nitin Agarwal and Huan Liu
2009
Copyright © 2019 by Morgan & Claypool
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means (electronic, mechanical, photocopy, recording, or any other), except for brief quotations
in printed reviews, without the prior permission of the publisher.
Detecting Fake News on Social Media
Kai Shu and Huan Liu
www.morganclaypool.com
ISBN: 9781681735825 paperback
ISBN: 9781681735832 ebook
ISBN: 9781681735849 hardcover
DOI 10.2200/S00926ED1V01Y201906DMK018
A Publication in the Morgan & Claypool Publishers series
SYNTHESIS LECTURES ON DATA MINING AND KNOWLEDGE DISCOVERY
Lecture #18
Series Editors: Jiawei Han, University of Illinois at Urbana-Champaign
Johannes Gehrke, Cornell University
Lise Getoor, University of California, Santa Cruz
Robert Grossman, University of Chicago
Wei Wang, University of North Carolina, Chapel Hill
Series ISSN
Print 2151-0067 Electronic 2151-0075
Detecting Fake News
on Social Media
Kai Shu and Huan Liu
Arizona State University
SYNTHESIS LECTURES ON DATA MINING AND KNOWLEDGE
DISCOVERY #18
Morgan & Claypool Publishers
ABSTRACT
In the past decade, social media has become increasingly popular for news consumption due to
its easy access, fast dissemination, and low cost. However, social media also enables the wide
propagation of “fake news,” i.e., news with intentionally false information. Fake news on social
media can have significant negative societal effects. Therefore, fake news detection on social
media has become an emerging research area that is attracting tremendous attention.
This book, from a data mining perspective, introduces the basic concepts and characteristics of
fake news across disciplines, reviews representative fake news detection methods in a principled
way, and illustrates challenging issues of fake news detection on social media. In particular, we
discuss the value of news content and social context, and important extensions to handle early
detection, weakly supervised detection, and explainable detection. The concepts, algorithms,
and methods described in this lecture can help harness the power of social media to build effective
and intelligent fake news detection systems. This book is an accessible introduction to the study
of detecting fake news on social media. It is essential reading for students, researchers, and
practitioners who want to understand, manage, and excel in this area.
This book is supported by additional materials, including lecture slides, the complete set
of figures, key references, datasets, tools used in this book, and the source code of representative
algorithms. Readers are encouraged to visit the book website for the latest information:
http://dmml.asu.edu/dfn/
KEYWORDS
fake news, misinformation, disinformation, social computing, social media, data
mining, social cyber security, machine learning
To my parents and wife Ling.
KS
To my parents, wife, and sons.
HL
Contents
Acknowledgments
1 Introduction
1.1 Motivation
1.2 An Interdisciplinary View on Fake News
1.3 Fake News in Social Media Age
1.3.1 Characteristics of Social Media
1.3.2 Problem Definition
1.3.3 What News Content Tells
1.3.4 How Social Context Helps
1.3.5 Challenging Problems of Fake News Detection
2 What News Content Tells
2.1 Textual Features
2.1.1 Linguistic Features
2.1.2 Low-Rank Textual Features
2.1.3 Neural Textual Features
2.2 Visual Features
2.2.1 Visual Statistical Features
2.2.2 Visual Content Features
2.2.3 Neural Visual Features
2.3 Style Features
2.3.1 Deception Styles
2.3.2 Clickbaity Styles
2.3.3 News Quality Styles
2.4 Knowledge-Based Methods
2.4.1 Manual Fact-Checking
2.4.2 Automatic Fact-Checking
3 How Social Context Helps
3.1 User-Based Detection
3.1.1 User Feature Modeling
3.1.2 User Behavior Modeling
3.2 Post-Based Detection
3.2.1 Stance-Aggregated Modeling
3.2.2 Emotion-Enhanced Modeling
3.2.3 Credibility-Propagated Modeling
3.3 Network-Based Detection
3.3.1 Representative Network Types
3.3.2 Friendship Networking Modeling
3.3.3 Diffusion Network Temporal Modeling
3.3.4 Interaction Network Modeling
3.3.5 Propagation Network Deep-Geometric Modeling
3.3.6 Hierarchical Propagation Network Modeling
4 Challenging Problems of Fake News Detection
4.1 Fake News Early Detection
4.1.1 A User-Response Generation Approach
4.1.2 An Event-Invariant Adversarial Approach
4.1.3 A Propagation-Path Modeling Approach
4.2 Weakly Supervised Fake News Detection
4.2.1 A Tensor Decomposition Semi-Supervised Approach
4.2.2 A Tensor Decomposition Unsupervised Approach
4.2.3 A Probabilistic Generative Unsupervised Approach
4.3 Explainable Fake News Detection
4.3.1 A Web Evidence-Aware Approach
4.3.2 A Social Context-Aware Approach
A Data Repository
B Tools
C Relevant Activities
Bibliography
Authors’ Biographies