Appendix B
Tools
In this appendix, we introduce some representative online tools for tracking and detecting fake
news on social media.
Hoaxy. Hoaxy¹ aims to build a uniform and extensible platform to collect and track misinformation and fact-checking [125], with visualization techniques to understand misinformation propagation on social media.
Data Scraping. The major components include a tracker for the Twitter API and a set of crawlers for both fake news and fact-checking websites and databases (see Figure B.1). The system collects data from two main sources: news websites and social media. From the first group it obtains data about the origin and evolution of both fake news stories and their fact checking; from the second group it collects instances of these news stories (i.e., URLs) that are being shared online. To collect data from such disparate sources, different technologies are used: web scraping, web syndication, and, where available, the APIs of social networking platforms. To collect data on news stories, the system uses rich site summary (RSS), which provides a unified protocol instead of requiring the scraper to be manually adapted to the multitude of web authoring systems used on the Web. RSS feeds contain information about updates made to news stories. Data is collected from news sites in two steps: when a new website is added to the list of monitored sources, a "deep crawl" of its link structure is performed using a custom Python spider written with the Scrapy framework, and at this stage the URL of the RSS feed is identified if it is available. Once all existing stories have been acquired, a "light crawl" is performed every two hours by checking the site's RSS feed only. The deep crawl uses a depth-first strategy, while the light crawl uses a breadth-first approach.
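The light-crawl step can be illustrated with a minimal, standard-library-only sketch: parse an RSS feed and keep only the story URLs not yet in the database. The feed content and the `light_crawl` helper are illustrative assumptions, not Hoaxy's actual code; a real deployment would fetch the feed over HTTP on a two-hour timer.

```python
# Minimal sketch of the "light crawl" step: parse an RSS feed and
# collect URLs of stories not seen in previous crawls.
import xml.etree.ElementTree as ET

SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example News</title>
  <item><title>Story A</title><link>http://example.com/a</link></item>
  <item><title>Story B</title><link>http://example.com/b</link></item>
</channel></rss>"""

def light_crawl(feed_xml, seen_urls):
    """Return story URLs from the feed that are not already known."""
    root = ET.fromstring(feed_xml)
    new_urls = []
    for item in root.iter("item"):
        url = item.findtext("link")
        if url and url not in seen_urls:
            seen_urls.add(url)
            new_urls.append(url)
    return new_urls

seen = {"http://example.com/a"}        # already in the database
print(light_crawl(SAMPLE_RSS, seen))   # -> ['http://example.com/b']
```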
Analysis Dashboard. Hoaxy provides various visualization interfaces to demonstrate the news spreading process. As shown in Figure B.2, the analysis dashboard offers several major functionalities. At the top, users can search for news articles by providing specific keywords. The left side shows the temporal trend of user engagements with the news articles, and the right side illustrates the propagation network on Twitter, which clearly conveys who spreads the news tweets from whom. In addition, the system evaluates a bot score for all users with Botometer [31].
¹ https://hoaxy.iuni.iu.edu/
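The "who spreads from whom" propagation network shown on the dashboard could be assembled as a directed edge list; the sketch below is a hypothetical simplification (toy records, not Hoaxy's data model) in which each retweet yields an edge from the original poster to the retweeter.

```python
# Hypothetical sketch: build a propagation network from retweet records,
# where each record is (original_user, retweeting_user).
from collections import defaultdict

retweets = [  # toy records
    ("alice", "bob"), ("alice", "carol"), ("bob", "dave"),
]

def build_network(records):
    edges = defaultdict(list)
    for source, spreader in records:
        edges[source].append(spreader)  # source -> users who spread it
    return dict(edges)

print(build_network(retweets))
# -> {'alice': ['bob', 'carol'], 'bob': ['dave']}
```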
Figure B.1: The framework of Hoaxy: monitors (a URL tracker on the Twitter streaming API, plus an RSS parser and Scrapy spider for news sites) and an API crawler store data in a database, from which the analysis dashboard fetches. Based on [125].
FakeNewsTracker. FakeNewsTracker² is a system for fake news data collection, detection, and visualization on social media [133]. It mainly consists of the following components (see Figure B.3): (1) fake news collection; (2) fake news detection; and (3) fake news visualization.
Collecting Fake News Data. Fake news is widely spread across various online platforms. Fact-checking websites such as PolitiFact serve as a source for collecting fake news information. On these fact-checking sites, fake news information is provided by trusted authors, along with claims explaining why the mentioned news is not true. The detailed collection procedure is described in Figure A.1.
Detecting Fake News. A deep learning model is proposed to learn neural textual features from news content and temporal representations from social context simultaneously to predict fake news. An autoencoder [67] is used to learn a feature representation of news articles by reconstructing the news content, and an LSTM is utilized to learn the temporal features of user engagements. Finally, the learned feature representations of news and social engagements are fused to predict fake news.

² http://blogtrackers.fulton.asu.edu:3000/

Figure B.2: The main dashboard of the Hoaxy website.
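The fusion step can be illustrated with a deliberately simplified stand-in. The actual system learns content features with an autoencoder and temporal features with an LSTM; here, hand-crafted word statistics and engagement counts (all invented for illustration) play those roles, just to show how the two representations are concatenated and scored.

```python
# Toy illustration of feature fusion. Word counts stand in for the
# autoencoder's content features, and per-interval engagement counts
# stand in for the LSTM's temporal features.
import math

def content_features(text):
    words = text.lower().split()
    avg_len = sum(len(w) for w in words) / max(len(words), 1)
    return [len(words), avg_len]

def temporal_features(engagement_counts):
    return [sum(engagement_counts), max(engagement_counts)]

def predict(text, engagement_counts, weights, bias):
    # Concatenate the two representations, then apply a linear layer
    # followed by a sigmoid to get a probability of being fake.
    fused = content_features(text) + temporal_features(engagement_counts)
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1 / (1 + math.exp(-z))

score = predict("BREAKING shocking claim", [5, 40, 3],
                weights=[0.1, -0.2, 0.05, 0.02], bias=-1.0)
print(round(score, 3))
```

The weights here are arbitrary; in the real model they are learned jointly with the feature extractors.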
Visualizing Fake News on Twitter. A fake news visualization, shown in Figure B.4, has been developed for gaining insights into the collected data through various interfaces. It demonstrates the temporal trends of the number of tweets spreading fake and real news in a specific time period, as in Figure B.4a.
In addition, users can explore the social network structure among users in the propagation network (see Figure B.4b for an example), and further compare the differences between the users who interact with fake news and those who interact with true news.
To identify differences between the content of true and fake news, a word cloud representation of the textual data is used. Users can search for fake news within a time frame and identify the relevant data. In addition, the dashboard provides a comparison of feature significance and model performance. Moreover, the geo-locations of tweets show how fake news spreads around certain areas.
Figure B.3: The framework of FakeNewsTracker: fact-checking, news content, Twitter advanced search, and tweet engagement crawlers populate a database that supports fake news collection, detection, and visualization. Based on [133].
dEFEND. dEFEND³ is a fake news detection system that is also able to provide explainable user comments on Twitter. dEFEND (see Figure B.5) mainly consists of two major components: a web-based user interface and a backend that integrates the fake news detection model.
The web-based interface provides users with explainable fact-checking of news. A user can input either the tweet URL or the title of the news; a screenshot is shown in Figure B.6. On typical fact-checking websites, a user only sees the check-worthy score of the news (as on Gossip Cop⁴) or of each sentence (as in ClaimBuster⁵). In this approach, the user can not only see the detection result (in the right of Figure B.6a), but can also find all the arguments that support the detection result, including crucial sentences in the article (in the middle of Figure B.6b) and explainable comments from social media platforms (in the right of Figure B.6b). Finally, the user can also review the results and find related news and claims.
The system also provides exploratory search functions, including the news propagation network, trending news, top claims, and related news. The news propagation network (in the left of Figure B.6b) helps readers understand the dynamics of real and fake news sharing: fake news is normally dominated by very active users, while real news and fact checking are more of a grass-roots activity [125]. Trending news, top claims, and related news (in the lower left of Figure B.6a) can give query suggestions to users.

³ http://fooweb-env.qnmbmwmxj3.us-east-2.elasticbeanstalk.com/
⁴ https://www.gossipcop.com/
⁵ https://idir-server2.uta.edu/claimbuster/

Figure B.4: Demonstration of the FakeNewsTracker system: (a) user interface of trends in news spreading; (b) user interface of news propagation networks.

Figure B.5: The framework of dEFEND.
The backend consists of multiple components: (1) a database to store the pre-trained results, together with a crawler to extract unseen news and its comments; (2) the dEFEND algorithm module based on explainable deep learning fake news detection (details in Section 4.3.2), which gives the detection result and explanations simultaneously; and (3) an exploratory component that shows the propagation network of the news, as well as trending and related news.
Exploratory Search. The system also provides users with browsing functions. Consider a user who doesn't know what to check specifically. By browsing the trending news, top claims, and news related to the previous search right below the input box, the user can get some ideas about what to do. News can be the coverage of an event, such as "Seattle Police Begin Gun Confiscations: No Laws Broken, No Warrant, No Charges," and a claim is the coverage around what a celebrity said, such as "Actor Brad Pitt: Trump Is Not My President, We Have No Future With This...." Users can search these titles by clicking on them. News related to the user's previous search is recommended. For example, the news "Obama's Health Care Speech to Congress" is related to the query "It's Better For Our Budget If Cancer Patients Die More Quickly."
Figure B.6: Demonstration of the dEFEND system.
Explainable Fact-Checking. Consider a user who wants to check whether Tom Price said "It's Better For Our Budget If Cancer Patients Die More Quickly." The user first enters the tweet URL or the title of the news in the input box in Figure B.6a. The system returns the check-worthy score, the propagation network, sentences with explainable scores, and comments with explainable scores, as in Figure B.6b. The user can zoom in on the network to check the details of the diffusion path. Each sentence is shown in the table along with its score: the higher the score, the more likely the sentence contains check-worthy factual claims; the lower the score, the more non-factual and subjective the sentence is. The user can sort the sentences either by order of appearance or by score. Comments' explainable scores work similarly, and the top-5 comments are shown in descending order of their explainable scores.
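The ranking logic just described can be sketched in a few lines, assuming (text, score) pairs produced by the detection model (the sample sentences, comment names, and scores below are invented for illustration):

```python
# Sketch of the dashboard's ranking: sentences sorted by explainable
# score, and the top-5 comments shown in descending score order.
sentences = [("Claim one.", 0.91), ("Background detail.", 0.12),
             ("Claim two.", 0.64)]
comments = [("c%d" % i, s) for i, s in
            enumerate([0.3, 0.9, 0.5, 0.7, 0.1, 0.8])]

by_score = sorted(sentences, key=lambda p: p[1], reverse=True)
top5_comments = sorted(comments, key=lambda p: p[1], reverse=True)[:5]

print([t for t, _ in by_score])       # most check-worthy sentence first
print([c for c, _ in top5_comments])  # five most explanatory comments
```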
NewsVerify. NewsVerify⁶ is a real-time news verification system that can assess the credibility of a news event given some keywords about it [184].
NewsVerify mainly contains three stages: (1) crawling data; (2) building an ensemble model; and (3) visualizing the results. Given the keywords and time range of a news event, related microblogs can be collected through the search engine of Sina Weibo. Based on these messages, key users and key microblogs are extracted for further analysis: the key users are used for information source certification, while the key microblogs are used for propagation and content certification. All of the above data are crawled through a distributed data acquisition system, which is illustrated below. After the three individual models have been developed, their scores are combined via weighted combination. Finally, an event-level credibility score is provided, and each single model also produces a credibility score that measures the credibility of the corresponding aspect. To improve the user experience, the results are visualized from various perspectives, providing useful information about events for further investigation.
Data Acquisition. Three kinds of information are collected: microblogs, propagation, and microbloggers. Like most distributed systems, NewsVerify has a master node and child nodes. The master node is responsible for task distribution and result integration, while the child nodes process specific tasks and store the collected data in an appointed temporary storage space. A child node informs the master node after all of its tasks are finished. The master node then merges all slices of data from temporary storage and stores the combined data in permanent storage, after which the temporary storage is deleted. The distributed system is based on ZooKeeper,⁷ a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. Because the acquisition task involves frequent writes and reads, the system adopts Redis, an efficient key-value database, to handle real-time data acquisition; working with an in-memory dataset, Redis achieves outstanding performance.
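The master/child merge step described above can be sketched as follows, with plain dictionaries standing in for Redis keys (the key names and record shapes are illustrative assumptions, not NewsVerify's actual schema):

```python
# Simplified sketch of the master node's merge: child nodes have written
# data slices to temporary storage; the master merges them into
# permanent storage and then discards the temporaries.
temporary = {
    "task:1": [{"mid": "m1"}],   # slice written by child node 1
    "task:2": [{"mid": "m2"}],   # slice written by child node 2
}
permanent = {}

def master_merge(temp_store, perm_store, job_id):
    merged = []
    for slice_data in temp_store.values():
        merged.extend(slice_data)
    perm_store[job_id] = merged
    temp_store.clear()           # temporary storage is deleted after merge
    return len(merged)

print(master_merge(temporary, permanent, "weibo-event"))  # -> 2
```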
Model Ensemble. Different individual models are built to verify the truthfulness of news pieces from the perspectives of news content, news propagation, and information source (see Figure B.7). The content-based model is based on hierarchical propagation networks [58]. The credibility network has three layers: a message layer, a sub-event layer, and an event layer. Semantic and structural features are exploited to adjust the weights of links in the network. Given a news event and its related microblogs, sub-events are generated by a clustering algorithm; the sub-event layer is constructed to capture implicit semantic information within an event. Four types of network links reflect the relations between network nodes: the intra-level links (message to message, sub-event to sub-event) capture relations among entities of the same type, while the inter-level links (message to sub-event, sub-event to event) capture the impact from level to level. After the network is constructed, all entities are initialized with credibility values using classification results. This propagation is formulated as a graph optimization problem with a global optimal solution. The propagation-based model computes a propagation influence score over time to capture temporal trends. The information-source-based model utilizes sentiment and activeness degree as features to help predict fake news. An individual score is obtained from each model, and a weighted logistic regression then ensembles these scores into an overall score for the news piece.

⁶ https://www.newsverify.com/
⁷ http://zookeeper.apache.org/

Figure B.7: The framework of the NewsVerify system: the content, propagation, and user models each produce a score (S_content, S_propa, S_user), which are combined into the UGC credibility score. Based on [184].
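The weighted combination at the end of the pipeline can be sketched as a logistic function over the three model scores. The weights and bias below are illustrative placeholders, not the trained values from [184]:

```python
# Hedged sketch of the final ensemble: content, propagation, and
# information-source scores combined by a weighted logistic function.
import math

def ensemble(s_content, s_propagation, s_source,
             weights=(1.2, 0.8, 0.6), bias=-1.3):
    z = (weights[0] * s_content + weights[1] * s_propagation
         + weights[2] * s_source + bias)
    return 1 / (1 + math.exp(-z))   # overall credibility in (0, 1)

print(round(ensemble(0.9, 0.7, 0.8), 3))
```

A higher content score raises the overall credibility monotonically, which is the property the weighted combination is meant to preserve.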
Interface Visualization. Figure B.8 illustrates the interface of the NewsVerify system. It allows users to report fake news and to verify specific news by providing keywords to the system. It also automatically shows the degree of veracity for Weibo data of different categories. For each Weibo post in the timeline, NewsVerify shows a credibility score indicating how likely the post is to be related to fake news. In addition, interested users can click "View detailed analysis" to learn more about the news. As shown in Figure B.9, the detailed view mainly demonstrates: (1) an introduction to the news event, including the time, source, related news, etc.; (2) the changes in trends and topics over time related to the news event; (3) the profiles and aggregated statistics of users engaged in the news spreading process, such as the key communicators, sex ratio, and certification ratio; and (4) the images or videos related to the news event.
Figure B.8: The interface of the NewsVerify system.
Figure B.9: Demonstration of the detailed news analysis of the NewsVerify system.