20 2. WHAT NEWS CONTENT TELLS
news veracity. Knowledge-based approaches employ external sources to fact-check claims in news content. The goal of fact-checking is to assign a truth value to a claim in a particular context [158]. Fact-checking has attracted increasing attention, and many efforts have been made to develop automated fact-checking systems. The goal is to assess news authenticity by comparing the information extracted from to-be-verified news content with known knowledge. Existing fact-checking approaches can be categorized as manual fact-checking and automatic fact-checking.
2.4.1 MANUAL FACT-CHECKING
Manual fact-checking utilizes human experts or regular users to manually provide annotation signals for fake news. It relies heavily on human investigation of relevant data and documents to construct verdicts on claim veracity. Existing manual fact-checking approaches mainly fall into two categories: expert-based and crowdsourcing-based fact-checking (Table 2.3).
Table 2.3: Comparison of expert-based and crowdsourcing-based fact-checking

                       | Expert-Based   | Crowdsourcing-Based
Fact-checkers          | Domain experts | Regular individuals (i.e., collective intelligence)
Annotation reliability | High           | Comparatively low
Scalability            | Poor           | Comparatively high
Expert-Based Fact-Checking
Expert-based fact-checking relies heavily on human domain experts to investigate relevant data and documents and deliver verdicts on claim veracity. However, it is an intellectually demanding and time-consuming process, which limits its efficiency and scalability. We introduce some representative and popular fact-checking websites as follows.
PolitiFact:² PolitiFact is a U.S. website that rates the accuracy of claims or statements by elected officials, pundits, columnists, bloggers, political analysts, and other members of the media. It is an independent, non-partisan online fact-checking source for political news and information. The editors carefully examine the specific wording and the full context of a claim, and then verify the reliability of the claims and statements. The label types include true, mostly true, half true, mostly false, false, and pants on fire.
Snopes:³ Snopes is widely known as one of the first online fact-checking websites, validating and debunking urban legends. It covers a wide range of disciplines, including
² www.politifact.com/
³ http://www.snopes.com/
automobiles, business, computers, crime, fraud and scams, history, and so on. The label types include true and false.
FactCheck:⁴ FactCheck is a nonprofit "consumer advocate" website for voters that aims to reduce the level of deception and confusion in U.S. politics. The claims and statements it examines originate from various platforms, including TV advertisements, debates, speeches, interviews, news releases, and social media. It mainly focuses on presidential candidates in presidential election years and systematically evaluates the factual accuracy of their statements.
GossipCop:⁵ GossipCop investigates entertainment stories published in magazines and newspapers, as well as on the Web, to ascertain whether they are true or false. It provides a score on a scale from 0 to 10, where 0 means completely fake and 10 means completely real.
TruthOrFiction:⁶ TruthOrFiction is a non-partisan website that provides fact-checking results on warnings, hoaxes, virus alerts, and humorous or inspirational stories that are distributed through email. It mainly focuses on misleading information that spreads via forwarded emails, and rates stories by the following categories: truth, fiction, reported to be truth, unproven, truth and fiction, previously truth, disputed, and pending investigation.
Crowdsourcing-Based Fact-Checking
Crowdsourcing-based fact-checking exploits the "wisdom of the crowd" to enable ordinary people to annotate news content. These annotations are then aggregated to produce an overall assessment of claim veracity. For example, Fiskkit⁷ allows users to discuss and annotate the accuracy of specific parts of a news article. As another example, an anti-fake-news bot named "For real" is a public account in the mobile communication application LINE,⁸ which allows people to report suspicious news content that is then further checked by editors.
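As a minimal sketch of how crowd annotations can be aggregated into an overall veracity assessment, the snippet below takes a majority vote over hypothetical user labels for a single claim; the labels and the simple winning-fraction confidence are illustrative, not a description of any particular platform's algorithm.

```python
from collections import Counter

# Hypothetical crowd annotations for one claim: each user labels it
# "true" or "false". A simple aggregate is the majority vote, with a
# confidence equal to the winning label's share of the votes.
annotations = ["false", "false", "true", "false", "true"]

counts = Counter(annotations)
label, votes = counts.most_common(1)[0]   # most frequent label
confidence = votes / len(annotations)     # fraction of agreeing annotators
# Here the aggregate verdict is "false" with confidence 0.6.
```

In practice, platforms weight annotators by reliability rather than counting votes equally, but the aggregation step has this same shape.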
2.4.2 AUTOMATIC FACT-CHECKING
Manual fact-checking relies on human annotation, which is usually time-consuming and labor-intensive. Instead, automatic fact-checking largely relies on external knowledge to determine the truthfulness of a particular claim. Two typical external sources are the open web and structured knowledge graphs. Open web sources are utilized as references that can be compared with given claims in terms of both consistency and frequency [10, 84]. Knowledge graphs are integrated from linked open data as a structured network topology, such as
⁴ https://www.factcheck.org/
⁵ https://www.gossipcop.com/
⁶ https://www.truthorfiction.com/
⁷ http://fiskkit.com
⁸ https://grants.g0v.tw/projects/588fa7b382223f001e022944
DBpedia and Google Relation Extraction Corpus. Fact-checking using a knowledge graph aims
to check whether the claims in news content can be inferred from existing facts in the knowl-
edge graph [29, 129, 171]. Next, we introduce a standard knowledge graph matching approach
that matches news claims with the facts in knowledge graphs.
Path Finding Fake news spreads false claims in news content, so a natural means of detecting fake news is to check the truthfulness of the major claims in a news article. A claim in news content can be represented by a subject-predicate-object triple $(s, p, o)$, where the subject entity $s$ is related to the object entity $o$ by the predicate relation $p$. We can find all the paths that start with $s$ and end with $o$, and then evaluate these paths to estimate the truth value of the claim. This set of paths, also known as a knowledge stream [130], is denoted $\mathcal{P}(s, o)$. Intuitively, if the paths involve more specific entities, the claim is more likely to be true. Thus, we can define a specificity measure $S(P_{s,o})$ as follows:
$$ S(P_{s,o}) = \frac{1}{1 + \sum_{i=2}^{n-1} \log d(o_i)}, \qquad (2.16) $$
where $d(o_i)$ is the degree of entity $o_i$, i.e., the number of paths in which entity $o_i$ participates. One approach is to optimize a path evaluation function $\tau(c) = \max W(P_{s,o})$, which maps the set of possible paths connecting $s$ and $o$ (i.e., $\mathcal{P}_{s,o}$) to a truth value $\tau$. If an edge between $s$ and $o$ is already present in the knowledge graph, the claim is assigned the maximum truth value 1; otherwise, the objective function is optimized to find the shortest path between $s$ and $o$.
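The path-finding step above can be sketched as follows: enumerate simple paths between the subject and object entities of a claim, score each path with the specificity measure of Eq. (2.16), and take the best score as the claim's truth value. The toy graph, its entities, and the path-length cap are all hypothetical, chosen only to make the computation concrete.

```python
import math
from collections import deque

# Toy knowledge graph as an undirected adjacency list (hypothetical edges).
graph = {
    "Barack Obama": ["United States", "Hawaii"],
    "Hawaii": ["Barack Obama", "United States"],
    "United States": ["Barack Obama", "Hawaii", "Washington"],
    "Washington": ["United States"],
}

def find_paths(graph, s, o, max_len=4):
    """Enumerate simple paths from s to o via breadth-first search."""
    paths, queue = [], deque([[s]])
    while queue:
        path = queue.popleft()
        if path[-1] == o:
            paths.append(path)
            continue
        if len(path) < max_len:
            for nxt in graph[path[-1]]:
                if nxt not in path:          # keep paths simple (no cycles)
                    queue.append(path + [nxt])
    return paths

def specificity(graph, path):
    """S(P_{s,o}) per Eq. (2.16): only the intermediate entities
    o_2, ..., o_{n-1} contribute their log-degree to the penalty."""
    inner = path[1:-1]
    return 1.0 / (1.0 + sum(math.log(len(graph[v])) for v in inner))

paths = find_paths(graph, "Barack Obama", "Washington")
# Estimate the claim's truth value as the best specificity over all paths;
# paths through low-degree (more specific) entities score higher.
tau = max(specificity(graph, p) for p in paths)
```

The direct two-hop path through "United States" dominates here, because longer paths accumulate more log-degree penalty in the denominator of Eq. (2.16).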
Flow Optimization We can assume that each edge of the network is associated with two quantities: a capacity to carry knowledge related to $(s, p, o)$ across its two endpoints, and a cost of usage. The capacity can be computed using $S(P_{s,o})$, and the cost of an edge is defined as $c_e = \log d(o_i)$. The goal is to identify the set of paths responsible for the maximum flow of knowledge between $s$ and $o$ at the minimum cost. The maximum knowledge a path $P_{s,o}$ can carry is the minimum knowledge of its edges, also called its bottleneck $B(P_{s,o})$. Thus, the objective can be defined as a minimum-cost maximum-flow problem:
$$ \tau(e) = \sum_{P_{s,o} \in \mathcal{P}_{s,o}} B(P_{s,o})\, S(P_{s,o}), \qquad (2.17) $$
where $B(P_{s,o})$ takes a minimization form, $B(P_{s,o}) = \min\{x_e \mid e \in P_{s,o}\}$, with $x_e$ indicating the residual capacity of edge $e$ in the residual network [130].
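The aggregation in Eq. (2.17) can be sketched directly: given candidate paths with per-edge residual capacities, each path contributes its bottleneck (minimum capacity) weighted by its specificity. The two paths, their capacities, and the intermediate-entity degrees below are hypothetical placeholders, not values from the cited work.

```python
import math

# Hypothetical intermediate entities on two candidate s-o paths and
# their degrees, used by the specificity measure of Eq. (2.16).
degree = {"m1": 4, "m2": 2}
paths = [
    {"inner": ["m1"], "capacities": [0.9, 0.5]},   # bottleneck 0.5
    {"inner": ["m2"], "capacities": [0.3, 0.8]},   # bottleneck 0.3
]

def specificity(inner):
    # Eq. (2.16): only intermediate entities contribute log-degree terms.
    return 1.0 / (1.0 + sum(math.log(degree[v]) for v in inner))

def bottleneck(capacities):
    # B(P_{s,o}): the minimum residual capacity along the path.
    return min(capacities)

# Eq. (2.17): the truth value sums bottleneck-weighted specificities
# over the set of paths carrying knowledge flow from s to o.
tau = sum(bottleneck(p["capacities"]) * specificity(p["inner"])
          for p in paths)
```

A full implementation would obtain the path set and residual capacities from a min-cost max-flow solver over the knowledge graph rather than from fixed lists.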
The knowledge graph itself can be redundant, invalid, conflicting, unreliable, and incomplete [185]. In these cases, path finding and flow optimization may not be sufficient to assess the truth value well. Therefore, the following additional tasks need to be considered in order to refine the knowledge graph and strengthen its capability.
Entity Resolution: refers to the process of finding related entries in one or more relations in a database and creating links among them [19]. This problem has been extensively studied in the database area and applied to data warehousing and business intelligence. According to the survey in [72], existing methods exploit features in three ways: numerical, rule-based, and workflow-based. Numerical approaches combine the similarity scores of individual features into a weighted sum to decide linkage [39]; rule-based approaches derive a match decision through a logical combination of rules that test each feature against a threshold; workflow-based methods apply a sequence of feature comparisons in an iterative way. Both supervised approaches, such as TAILOR [37] and MARLIN [15], and unsupervised approaches, such as MOMA [151] and SERF [13], have been studied in the literature.
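A minimal sketch of the numerical approach described above: per-feature string similarities are combined into a weighted sum, and two records are linked if the sum passes a threshold. The features, weights, 0.8 threshold, and sample records are all illustrative, and the specific similarity function is just Python's built-in sequence matcher, not the metric used by any cited system.

```python
from difflib import SequenceMatcher

def feature_sim(a, b):
    # Character-level similarity in [0, 1] between two field values.
    return SequenceMatcher(None, a, b).ratio()

def match_score(rec1, rec2, weights):
    # Weighted sum of per-feature similarities (numerical approach).
    return sum(w * feature_sim(rec1[f], rec2[f])
               for f, w in weights.items())

weights = {"name": 0.6, "city": 0.4}      # illustrative feature weights
r1 = {"name": "Jon Smith", "city": "New York"}
r2 = {"name": "John Smith", "city": "New York"}

score = match_score(r1, r2, weights)
linked = score >= 0.8                     # hypothetical linkage threshold
```

Rule-based methods would instead test each feature similarity against its own threshold and combine the boolean outcomes logically.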
Time Recording: aims to remove outdated knowledge. This task is important given that fake news pieces are often related to newly emerging events. Existing work on time recording mainly utilizes the Compound Value Type structure to allow facts to incorporate beginning and ending date annotations [17], or adds extra assertions to current facts [52].
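The idea of time-scoped facts can be sketched as follows: each assertion carries begin and end dates (in the spirit of the Compound Value Type structure mentioned above), and facts that are outdated relative to a query date are filtered out. The placeholder entities and dates are purely illustrative.

```python
from datetime import date

# Each fact is (subject, predicate, object, begin, end); end=None means
# the fact is still valid. All entries here are hypothetical.
facts = [
    ("X", "presidentOf", "Y", date(2009, 1, 20), date(2017, 1, 20)),
    ("Z", "presidentOf", "Y", date(2017, 1, 20), None),
]

def valid_at(fact, when):
    # A fact holds at `when` if it began on or before that date and
    # has not yet ended.
    _, _, _, begin, end = fact
    return begin <= when and (end is None or when < end)

# Keep only the facts valid at the query date; the outdated assertion
# about X is dropped.
current = [f for f in facts if valid_at(f, date(2018, 6, 1))]
```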
Knowledge Fusion: (or truth discovery) aims to identify true subject-predicate-object triples extracted by multiple information extractors from multiple information sources [36, 79]. Truth discovery methods do not explore the claims directly, but rely on a collection of contradicting sources that record the properties of objects to determine the truth value. Truth discovery aims to determine source credibility and object truthfulness at the same time. Fake news detection can benefit from various aspects of truth discovery approaches under different scenarios. For example, the credibility of different news outlets can be modeled to infer the truthfulness of reported news. As another example, relevant social media posts can also be modeled as social response sources to better determine the truthfulness of claims [93, 167]. However, some other issues must be considered when applying truth discovery to fake news detection in social media scenarios. First, most existing truth discovery methods focus on handling structured input in the form of subject-predicate-object (SPO) tuples, while social media data is highly unstructured and noisy. Second, truth discovery methods cannot be applied well when a fake news article has just been published by only a few news outlets, because at that point there are not enough relevant social media posts to serve as additional sources.
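The joint estimation of source credibility and claim truthfulness can be sketched with a simple fixed-point iteration: a claim is more credible if trustworthy sources support it, and a source is more credible if its claims are credible. The outlets, claims, iteration count, and normalization scheme below are illustrative simplifications, not a specific published algorithm.

```python
# Hypothetical mapping from news outlets to the claims they assert.
support = {
    "outlet_a": ["c1", "c2"],
    "outlet_b": ["c1"],
    "outlet_c": ["c3"],
}
claims = {"c1", "c2", "c3"}

cred = {s: 1.0 for s in support}   # start with uniform credibility
for _ in range(10):
    # Claim truthfulness: sum of credibilities of supporting sources,
    # normalized so the strongest claim scores 1.
    truth = {c: sum(cred[s] for s, cs in support.items() if c in cs)
             for c in claims}
    top = max(truth.values())
    truth = {c: v / top for c, v in truth.items()}
    # Source credibility: mean truthfulness of the claims it asserts,
    # normalized the same way.
    cred = {s: sum(truth[c] for c in cs) / len(cs)
            for s, cs in support.items()}
    top = max(cred.values())
    cred = {s: v / top for s, v in cred.items()}
```

The claim backed by multiple (and more credible) outlets ends up with the highest truthfulness score, which is the behavior truth discovery exploits for fake news detection.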
Link Prediction: on knowledge graphs aims to predict new facts from existing facts. This is important since existing knowledge graphs are often missing many facts, and some of the edges they contain are incorrect. Relational machine learning methods are widely used to infer new knowledge representations [97], including latent feature models and graph feature models. Latent feature models exploit the latent features of entities to learn possible SPO triples. For example, RESCAL [98] is a bilinear relational learning model that explains triples through pairwise interactions of latent features. Graph feature models assume that the existence of an edge can be predicted by extracting features from the observed edges in the graph, using, e.g., Markov logic programming or path ranking algorithms. For example, Markov Random Fields (MRFs) [129] encode dependencies of facts
into random variables and infer the missing dependencies through statistical probabilistic
learning.
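The bilinear scoring at the heart of RESCAL-style latent feature models can be sketched as follows: each entity e has a latent vector a_e, each predicate p a matrix R_p, and a triple (s, p, o) is scored as a_s^T R_p a_o, with higher scores suggesting the triple holds. The tiny hand-picked vectors and matrix below are purely illustrative; in RESCAL they would be learned by factorizing the observed adjacency tensor.

```python
# Bilinear triple scoring with plain lists, to stay dependency-free.
def score(a_s, R_p, a_o):
    """Compute a_s^T R_p a_o for latent vectors and a relation matrix."""
    Ra_o = [sum(R_p[i][j] * a_o[j] for j in range(len(a_o)))
            for i in range(len(R_p))]                    # R_p a_o
    return sum(a_s[i] * Ra_o[i] for i in range(len(a_s)))  # a_s . (R_p a_o)

# Hand-picked 2-D latent vectors for a subject and an object, and an
# illustrative relation matrix whose (1,2) entry links dimension 1 of
# subjects to dimension 2 of objects.
a = {"s": [1.0, 0.0], "o": [0.0, 1.0]}
R = [[0.1, 0.9],
     [0.2, 0.1]]

s_val = score(a["s"], R, a["o"])
```

Scoring every unobserved (s, p, o) combination this way and thresholding the result is how such models propose new links for the knowledge graph.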