3.2 POST-BASED DETECTION
Users who are involved in the news dissemination process express their opinions and emotions via posts and comments. These user responses provide helpful signals related to the veracity of news claims. Recent research looks into user stance, user emotion, and post credibility to improve the performance of fake news detection. We begin by introducing stance-aggregated modeling.
3.2.1 STANCE-AGGREGATED MODELING
Stances (or viewpoints) indicate users' opinions toward the news, such as supporting, opposing, etc. Typically, fake news provokes highly controversial views among social media users, in which denying and questioning stances are found to play a crucial role in signaling claims as being fake.
The stance expressed in users' posts can be either explicit or implicit. Explicit stances are direct expressions of emotion or opinion, such as Facebook's "like" actions. Implicit stances can be automatically extracted from social media posts.
Probabilistic Stance Modeling Consider the scenario where stances are explicitly expressed through "like" actions on social media. Let $\mathcal{A} = \{a_1, \ldots, a_j, \ldots, a_N\}$ denote the set of news articles, and let $\mathcal{U} = \{u_1, \ldots, u_i, \ldots, u_m\}$ represent the set of users engaged in like actions. We first construct a bipartite graph $(\mathcal{U} \cup \mathcal{A}, \mathcal{L})$, where $\mathcal{L}$ is the set of like actions. The idea is that users express like actions due to both user reputation and news quality. The users and news items can each be characterized by a Beta distribution, $\mathrm{Beta}(\alpha_i, \beta_i)$ and $\mathrm{Beta}(\alpha_j, \beta_j)$, respectively. $\mathrm{Beta}(\alpha_i, \beta_i)$ represents the reputation or reliability of user $u_i$, and $\mathrm{Beta}(\alpha_j, \beta_j)$ represents the veracity of news $a_j$. Intuitively, for a user $u_i$, $\alpha_i - 1$ represents the number of times $u_i$ likes real news pieces and $\beta_i - 1$ denotes the number of times $u_i$ likes fake news pieces. For a news piece $a_j$, $\alpha_j$ gives the number of likes $a_j$ receives and $\beta_j$ gives the number of non-likes $a_j$ receives. The expectation of the Beta distribution is used to estimate the degree of user reputation ($p_i = \frac{\alpha_i}{\alpha_i + \beta_i}$) or news veracity ($p_j = \frac{\alpha_j}{\alpha_j + \beta_j}$). To predict whether a piece of news is fake or not, a linear transformation of $p_j$ is computed: $y_j = 2p_j - 1 = \frac{\alpha_j - \beta_j}{\alpha_j + \beta_j}$, where a positive value indicates true news; otherwise, it is fake news.
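The mapping from Beta parameters to reputation and veracity scores is straightforward to compute. Below is a minimal sketch of this scoring step; the function names and example counts are illustrative and not taken from [147].

```python
def user_reputation(alpha_i: float, beta_i: float) -> float:
    """Expected reputation p_i of a user from Beta(alpha_i, beta_i)."""
    return alpha_i / (alpha_i + beta_i)

def news_veracity_score(alpha_j: float, beta_j: float) -> float:
    """y_j = 2*p_j - 1: positive -> true news, negative -> fake news."""
    p_j = alpha_j / (alpha_j + beta_j)
    return 2.0 * p_j - 1.0

# Example: a news piece with 30 likes and 5 non-likes
print(news_veracity_score(alpha_j=30, beta_j=5))  # ~0.71, leaning toward "true"
```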
News Veracity Inference Let the training set consist of two subsets $\mathcal{A}_F, \mathcal{A}_T \subset \mathcal{A}$ of labeled fake and true news, and let $\Phi_i = \{a_j \mid (u_i, a_j) \in \mathcal{L}\}$ denote the news pieces liked by user $u_i$ and $\Phi_j = \{u_i \mid (u_i, a_j) \in \mathcal{L}\}$ denote the users who liked news $a_j$. The labels are set as $y_j = -1$ for all $a_j \in \mathcal{A}_F$, $y_j = 1$ for all $a_j \in \mathcal{A}_T$, and $y_j = 0$ for unlabeled news pieces. The parameters of user $u_i$ are optimized iteratively by the following updating functions:
$$\alpha_i = \tilde{\alpha} + \sum_{j \in \Phi_i,\, y_j > 0} y_j, \qquad \beta_i = \tilde{\beta} - \sum_{j \in \Phi_i,\, y_j < 0} y_j, \qquad y_i = \frac{\alpha_i - \beta_i}{\alpha_i + \beta_i}, \qquad (3.13)$$
where $\tilde{\alpha}$ and $\tilde{\beta}$ are prior base constants indicating the degree to which the user believes fake or true news. Similarly, the parameters of news $a_j$ are updated as
$$\alpha_j = \tilde{\alpha}' + \sum_{i \in \Phi_j,\, y_i > 0} y_i, \qquad \beta_j = \tilde{\beta}' - \sum_{i \in \Phi_j,\, y_i < 0} y_i, \qquad y_j = \frac{\alpha_j - \beta_j}{\alpha_j + \beta_j}, \qquad (3.14)$$
where $\tilde{\alpha}'$ and $\tilde{\beta}'$ are prior constants reflecting the prior ratio of fake and true news. In this way, the stance (like) information is aggregated to optimize the parameters, which can then be used to predict news veracity through $y_j$ [147].
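A minimal sketch of this alternating update is shown below. It assumes the like actions are given as (user, news) pairs and that a few news pieces carry ±1 labels; the prior constants, the iteration count, and the choice to keep labeled news scores fixed are illustrative assumptions rather than settings from [147].

```python
from collections import defaultdict

def aggregate_stances(likes, labels, n_iter=20,
                      a_u=2.0, b_u=2.0, a_n=2.0, b_n=2.0):
    """likes: list of (user, news) like actions.
    labels: dict news -> +1 (true) / -1 (fake) for the labeled subset.
    Returns dict news -> veracity score y_j in [-1, 1]."""
    phi_user = defaultdict(list)   # Phi_i: news liked by each user
    phi_news = defaultdict(list)   # Phi_j: users who liked each news piece
    for u, a in likes:
        phi_user[u].append(a)
        phi_news[a].append(u)

    y_news = {a: float(labels.get(a, 0.0)) for a in phi_news}  # y_j, seeded by labels
    y_user = {u: 0.0 for u in phi_user}                        # y_i, user reputation score

    for _ in range(n_iter):
        # Eq. (3.13): update user parameters from the news pieces they liked
        for u, liked in phi_user.items():
            alpha = a_u + sum(y_news[a] for a in liked if y_news[a] > 0)
            beta = b_u - sum(y_news[a] for a in liked if y_news[a] < 0)
            y_user[u] = (alpha - beta) / (alpha + beta)
        # Eq. (3.14): update news parameters from the users who liked them
        for a, likers in phi_news.items():
            if a in labels:          # assumption: ground-truth labels stay fixed
                continue
            alpha = a_n + sum(y_user[u] for u in likers if y_user[u] > 0)
            beta = b_n - sum(y_user[u] for u in likers if y_user[u] < 0)
            y_news[a] = (alpha - beta) / (alpha + beta)
    return y_news
```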
We can also infer implicit stance values from social media posts, which usually requires a labeled stance dataset to train a supervised model. The inferred stance scores then serve as the input for fake news classification.
Figure 3.3: An illustration of the stance aggregation framework: (1) probabilistic stance modeling, in which each user $u_i \sim \mathrm{Beta}(\alpha_i, \beta_i)$ and each news piece $a_j \sim \mathrm{Beta}(\alpha_j, \beta_j)$ are connected through liking actions; and (2) news veracity inference.
3.2.2 EMOTION-ENHANCED MODELING
Fake news publishers often aim to spread information extensively and draw wide public attention. Long-standing social science studies demonstrate that news which evokes high-arousal, or activating, emotions (awe, anger, or anxiety) is more viral on social media [42, 146]. To achieve this goal, fake news publishers commonly adopt two approaches. First, publishers post news with intense emotions, which triggers a high level of physiological arousal in the crowd. For example, in Figure 3.4a, the publisher uses rich emotional expressions (e.g., "Oh my god!") to make the information more impressive and striking. Second, publishers may present the news objectively to make it convincing, while its content is controversial and evokes intense emotions in the public, so that the news ultimately spreads widely. As another example (see Figure 3.4b), the publisher writes the post in an unemotional way, but the statement that China ranks second to last suddenly brings on tension in the crowd, and people express their feelings of anger (e.g., "most ridiculous"), shock, and doubt (e.g., "seriously?") in the comments.
Figure 3.4: Two fake news posts from Sina Weibo: (a) a post that contains emotions of astonishment and sadness in the news content, which easily arouses the audience; and (b) a post that contains no emotion in its content but raises emotions such as doubt and anger in user comments through controversial topics. Based on [47].
The end-to-end emotion-based fake news detection framework (see Figure 3.5) consists of three major components: (i) the content module mines information from the publisher, including semantics and emotions in the news content; (ii) the comment module captures semantic
and emotion information from users; and (iii) the fake news prediction component fuses the features from both news content and user comments and predicts whether the news is fake.

Figure 3.5: The proposed framework consists of three components: (1) the news content module; (2) the user comments module; and (3) the fake news prediction component. The first two modules model semantics and emotions from the publisher and the users, respectively, while the prediction component fuses the information of these two modules and makes the prediction. Three gates (Gate_N, Gate_C, and Gate_M) are used for multi-modal fusion at different layers.
Learning Emotion Embeddings Early studies primarily use hand-crafted features to represent the emotion of text, relying heavily on sentiment dictionaries. There are several widely used emotion dictionaries, such as WordNet [62], SlangSD [170], and MPQA [168] for English and HowNet (http://www.keenage.com/html/e_index.html) for Chinese. However, this approach may encounter problems of emotion migration and low coverage on social media, because of the differences in sentiment word usage between social media and the real world. In addition, some existing tools such as VADER [55] are designed to predict sentiment for general-purpose use on social media, which may not be specific to fake news detection, and their resulting numeric sentiment scores are not easily embedded into deep learning models.
Therefore, we adopt a deep learning emotion prediction model [4] to learn task-specific sentiment embeddings for both news contents and user comments. Inspired by recent advances in deep learning for emotion modeling [4], we train a recurrent neural network
(RNN) to learn the emotion embedding vectors. Following traditional settings [54], we first obtain a large-scale real-world dataset that contains emotions, use these emotions as the emotion labels, and then initialize each word with a one-hot vector. After initialization, all word vectors pass through an embedding layer, which projects each word from the original one-hot space into a low-dimensional space, and are then sequentially fed into a one-layer GRU model. Through back-propagation, the embedding layer is updated during training, producing an emotion embedding $\mathbf{e}_i$ for each word $w_i$.
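A minimal PyTorch sketch of this training setup is given below. The vocabulary size, number of emotion classes, and embedding dimension are placeholder values; only the structure (an embedding layer followed by a one-layer GRU classifier trained on emotion labels, with the learned embedding weights reused afterward) follows the description above.

```python
import torch
import torch.nn as nn

class EmotionEmbeddingModel(nn.Module):
    """Embedding layer + one-layer GRU trained on emotion-labeled text."""
    def __init__(self, vocab_size=50000, emb_dim=100, hidden_dim=128, n_emotions=7):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)     # replaces one-hot inputs
        self.gru = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, n_emotions)

    def forward(self, token_ids):                # token_ids: (batch, seq_len)
        x = self.embedding(token_ids)            # (batch, seq_len, emb_dim)
        _, h_n = self.gru(x)                     # h_n: (1, batch, hidden_dim)
        return self.classifier(h_n.squeeze(0))   # emotion logits

model = EmotionEmbeddingModel()
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())
# After training on (token_ids, emotion_label) pairs, model.embedding.weight
# provides the emotion embedding e_i for each word w_i in the vocabulary.
```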
Incorporating Emotion Representations We now introduce how to incorporate emotion embeddings into news contents and user comments to learn representations for fake news detection. We can learn the basic textual feature representations through a bidirectional GRU word encoder, as in Section 2.1.3. For each word $w_i$, the word embedding vector $\mathbf{w}_i$ is initialized with pre-trained word2vec [90]. The bidirectional GRU contains a forward GRU $\overrightarrow{f}$, which reads each sentence from word $w_0$ to $w_n$, and a backward GRU $\overleftarrow{f}$, which reads the sentence from word $w_n$ to $w_0$:
$$\overrightarrow{h}^{w}_i = \overrightarrow{\mathrm{GRU}}(\mathbf{w}_i), \quad i \in [0, n]; \qquad \overleftarrow{h}^{w}_i = \overleftarrow{\mathrm{GRU}}(\mathbf{w}_i), \quad i \in [0, n]. \qquad (3.15)$$
For a given word $w_i$, we obtain its word encoding vector $\mathbf{h}^{w}_i$ by concatenating the forward hidden state $\overrightarrow{h}^{w}_i$ and the backward hidden state $\overleftarrow{h}^{w}_i$, i.e., $\mathbf{h}^{w}_i = [\overrightarrow{h}^{w}_i; \overleftarrow{h}^{w}_i]$.
Similarly to the word encoder, we adopt a bidirectional GRU to model the emotion feature representations of the words. After obtaining the emotion embedding vectors $\mathbf{e}_i$, we can learn the emotion encoding $\mathbf{h}^{e}_i$ for word $w_i$:
$$\overrightarrow{h}^{e}_i = \overrightarrow{\mathrm{GRU}}(\mathbf{e}_i), \quad i \in [0, n]; \qquad \overleftarrow{h}^{e}_i = \overleftarrow{\mathrm{GRU}}(\mathbf{e}_i), \quad i \in [0, n]. \qquad (3.16)$$
For a given word $w_i$, we obtain its emotion encoding vector $\mathbf{h}^{e}_i$ by concatenating the forward hidden state $\overrightarrow{h}^{e}_i$ and the backward hidden state $\overleftarrow{h}^{e}_i$, i.e., $\mathbf{h}^{e}_i = [\overrightarrow{h}^{e}_i; \overleftarrow{h}^{e}_i]$.
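As a concrete illustration of the two encoders, the following PyTorch sketch runs a bidirectional GRU over word embeddings and over emotion embeddings and returns the concatenated forward and backward hidden states at every position; the dimensions and batch sizes are placeholder values rather than settings from the original framework.

```python
import torch
import torch.nn as nn

class BiGRUEncoder(nn.Module):
    """Bidirectional GRU; output at each step is [forward; backward] hidden state."""
    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.bigru = nn.GRU(in_dim, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, x):            # x: (batch, seq_len, in_dim)
        h, _ = self.bigru(x)         # h: (batch, seq_len, 2 * hidden_dim)
        return h

word_encoder = BiGRUEncoder(in_dim=300, hidden_dim=100)     # word2vec inputs w_i
emotion_encoder = BiGRUEncoder(in_dim=100, hidden_dim=100)  # emotion embeddings e_i

w = torch.randn(8, 40, 300)   # a batch of 8 sentences, 40 words each
e = torch.randn(8, 40, 100)
h_w = word_encoder(w)         # h^w_i for every word position
h_e = emotion_encoder(e)      # h^e_i for every word position
```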
The overall emotion information of the news content is also important when deciding how much information from the emotion embeddings should be absorbed for each word. For a given post $a$, we extract the emotion features included in [22] and also add some additional emotion features. There are 19 features covering emotion aspects of the news, including the numbers of positive/negative words, the sentiment score, etc. The news emotion features of $a$ are denoted as $\mathbf{se}$.
Gate_N is applied to learn information jointly from the word embeddings, emotion embeddings, and sentence emotion features, and yields a new representation for each word (see Figure 3.5). The units in Gate_N are motivated by the forget gate and input gate in an LSTM. In Gate_N, the two emotion inputs jointly decide the values of $\mathbf{r}_t$ and $\mathbf{u}_t$ through two sigmoid layers, which are used to manage how much semantic and emotion information is added into the new representation. Meanwhile, a dense layer transfers the emotion inputs into the same dimensional space as the word embedding. Mathematically, the relationship between the inputs and output of Gate_N is defined by the following formulas:
$$\begin{aligned}
\mathbf{r}_t &= \sigma(\mathbf{W}_r[\mathbf{se}; \mathbf{h}^{e}_t] + \mathbf{b}_r) \\
\mathbf{u}_t &= \sigma(\mathbf{W}_u[\mathbf{se}; \mathbf{h}^{e}_t] + \mathbf{b}_u) \\
\mathbf{c}^{e}_t &= \tanh(\mathbf{W}_c[\mathbf{se}; \mathbf{h}^{e}_t] + \mathbf{b}_c) \\
\mathbf{n}_t &= \mathbf{r}_t \odot \mathbf{h}^{w}_t + \mathbf{u}_t \odot \mathbf{c}^{e}_t.
\end{aligned} \qquad (3.17)$$
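A minimal PyTorch sketch of Gate_N is shown below; it follows Eq. (3.17) directly, with placeholder dimensions, and is not the authors' released implementation.

```python
import torch
import torch.nn as nn

class GateN(nn.Module):
    """Fuses word encoding h^w_t, emotion encoding h^e_t, and sentence
    emotion features se into a new word representation n_t (Eq. 3.17)."""
    def __init__(self, word_dim, emo_dim, se_dim):
        super().__init__()
        self.W_r = nn.Linear(se_dim + emo_dim, word_dim)
        self.W_u = nn.Linear(se_dim + emo_dim, word_dim)
        self.W_c = nn.Linear(se_dim + emo_dim, word_dim)

    def forward(self, h_w, h_e, se):
        # h_w: (batch, seq_len, word_dim), h_e: (batch, seq_len, emo_dim),
        # se: post-level emotion features repeated at every word position,
        #     shape (batch, seq_len, se_dim)
        x = torch.cat([se, h_e], dim=-1)      # [se; h^e_t]
        r = torch.sigmoid(self.W_r(x))        # how much semantic info to keep
        u = torch.sigmoid(self.W_u(x))        # how much emotion info to add
        c = torch.tanh(self.W_c(x))           # emotion candidate in word space
        return r * h_w + u * c                # n_t
```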
The comment module explores the semantic and emotion information from the users discussing the event. The architecture of the comment module is similar to that of the content module, except that: (1) all comments are first concatenated before being fed into the BiGRUs; (2) there are no sentence emotion features; and (3) Gate_C is used for fusion. Gate_C is introduced for fusion in the comment module. Different from Gate_N, there are only two modalities. We adopt the update gate of the GRU to control how information is updated in the fusion process (see Figure 3.5). The two inputs jointly yield an update gate vector $\mathbf{u}_t$ through a sigmoid layer. A dense layer creates a vector of new candidate values, $\mathbf{c}^{e}_t$, which has the same dimension as $\mathbf{h}^{w}_t$. The final output $\mathbf{n}_t$ is a linear interpolation between $\mathbf{h}^{w}_t$ and $\mathbf{c}^{e}_t$. Mathematically, the following formulas represent the process:
$$\begin{aligned}
\mathbf{u}_t &= \sigma(\mathbf{W}_u[\mathbf{h}^{w}_t; \mathbf{h}^{e}_t] + \mathbf{b}_u) \\
\mathbf{c}^{e}_t &= \tanh(\mathbf{W}_c \mathbf{h}^{e}_t + \mathbf{b}_c) \\
\mathbf{n}_t &= \mathbf{u}_t \odot \mathbf{h}^{w}_t + (1 - \mathbf{u}_t) \odot \mathbf{c}^{e}_t.
\end{aligned} \qquad (3.18)$$
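Gate_C can be sketched analogously to Gate_N; the snippet below implements Eq. (3.18) with placeholder dimensions and is likewise only an illustrative reading of the description.

```python
import torch
import torch.nn as nn

class GateC(nn.Module):
    """GRU-style update gate fusing h^w_t and h^e_t in the comment module (Eq. 3.18)."""
    def __init__(self, word_dim, emo_dim):
        super().__init__()
        self.W_u = nn.Linear(word_dim + emo_dim, word_dim)
        self.W_c = nn.Linear(emo_dim, word_dim)

    def forward(self, h_w, h_e):
        u = torch.sigmoid(self.W_u(torch.cat([h_w, h_e], dim=-1)))  # update gate u_t
        c = torch.tanh(self.W_c(h_e))         # candidate values c^e_t
        return u * h_w + (1 - u) * c          # n_t
```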
Emotion-Based Fake News Detection Here, Gate_M fuses the high-level representations of the content module and the comment module, denoted $\mathbf{con}$ and $\mathbf{com}$, and yields a fused representation vector $\mathbf{o}$ (see Figure 3.5). Mathematically, the following equations describe the internal relationship of Gate_M:
$$\begin{aligned}
\mathbf{r} &= \sigma(\mathbf{W}_u[\mathbf{con}; \mathbf{com}] + \mathbf{b}_u) \\
\mathbf{o} &= \mathbf{r} \odot \mathbf{con} + (1 - \mathbf{r}) \odot \mathbf{com}.
\end{aligned} \qquad (3.19)$$
We use a fully connected layer with softmax activation to project the fused vector $\mathbf{o}$ into the target space of two classes, fake news and real news, and obtain the probability distribution:
$$\hat{\mathbf{y}} = \mathrm{softmax}(\mathbf{W}_f \mathbf{o} + \mathbf{b}_f), \qquad (3.20)$$
where $\hat{\mathbf{y}} = [\hat{y}_0, \hat{y}_1]$ is the predicted probability vector, with $\hat{y}_0$ and $\hat{y}_1$ indicating the predicted probability of the label being 0 (real news) and 1 (fake news), respectively; $y \in \{0, 1\}$ denotes the ground-truth label of the news; and $\mathbf{b}_f \in \mathbb{R}^{1 \times 2}$ is the bias term. Thus, for each news piece, the goal is to minimize the cross-entropy loss function:
$$\mathcal{L}(\theta) = -y \log(\hat{y}_1) - (1 - y)\log(1 - \hat{y}_1), \qquad (3.21)$$
where $\theta$ denotes the parameters of the network.
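The final fusion and prediction steps (Eqs. 3.19-3.21) can be sketched as follows; the representation dimension and batch size are placeholders, and the loss is written out exactly as in Eq. (3.21).

```python
import torch
import torch.nn as nn

class GateMPredictor(nn.Module):
    """Fuses content and comment representations (Gate_M) and predicts fake/real."""
    def __init__(self, dim):
        super().__init__()
        self.W_u = nn.Linear(2 * dim, dim)
        self.W_f = nn.Linear(dim, 2)

    def forward(self, con, com):
        r = torch.sigmoid(self.W_u(torch.cat([con, com], dim=-1)))   # Eq. (3.19)
        o = r * con + (1 - r) * com
        return torch.softmax(self.W_f(o), dim=-1)                    # Eq. (3.20)

model = GateMPredictor(dim=256)
con, com = torch.randn(8, 256), torch.randn(8, 256)
y = torch.randint(0, 2, (8,)).float()         # ground-truth labels
y_hat = model(con, com)                       # [y_hat_0, y_hat_1] per news piece
loss = -(y * torch.log(y_hat[:, 1])
         + (1 - y) * torch.log(1 - y_hat[:, 1])).mean()              # Eq. (3.21)
```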
3.2.3 CREDIBILITY-PROPAGATED MODELING
Credibility-propagated models aim to infer the veracity of news pieces from the credibility of the related posts on social media through network propagation. The basic assumption is that the credibility of a given news event is highly related to the credibility of its relevant social media posts [59]. Since posts are correlated in terms of their viewpoints toward the news piece, we first collect all relevant social media posts and build a credibility network among them. Then, we can explore the correlations among posts to optimize their credibility values, which are averaged to obtain the score for predicting fake news.
Representing a Credibility Network
We first build a credibility network over all the posts $\mathcal{C} = \{c_1, \ldots, c_m\}$ about a news piece $a$ (see Figure 3.6). Credibility network initialization consists of two parts: node initialization and link initialization. First, we obtain the initial credibility score vector of the nodes, $\mathbf{T}^0$, from pre-trained classifiers with features extracted from external training data. The links are defined by mining viewpoint relations, i.e., the relations between each pair of viewpoints, such as contradicting or same. The basic idea is that posts with the same viewpoints form supporting relations, which raise their credibilities, while posts with contradicting viewpoints form opposing relations, which weaken their credibilities. Specifically, a social media post $c_i$ is modeled as a multinomial distribution $\theta_i$ over $K$ topics, and each topic $k$ of a post is modeled as a multinomial distribution $\psi_{ik}$ over $L$ viewpoints. The probability of a post $c_i$ over topic $k$ along with the $L$ viewpoints is denoted as $p_{ik} = \theta_i \psi_{ik}$. The distance between two posts $c_i$ and $c_j$ is measured using the Jensen-Shannon distance: $\mathrm{Dis}(c_i, c_j) = D_{JS}(p_{ik} \,\|\, p_{jk})$.
Figure 3.6: An illustration of leveraging post credibility to detect fake news.
The supporting or opposing relation indicator is determined as follows. It is assumed that each post contains one major topic-viewpoint, defined as the largest proportion of $p_{ik}$. If the major topic-viewpoints of two posts $c_i$ and $c_j$ are clustered together (i.e., they take the same viewpoint), then the two posts are mutually supporting; otherwise, they are mutually opposing. The similarity/dissimilarity measure of two posts is defined as:
$$f(c_i, c_j) = \frac{(-1)^{b}}{D_{JS}(p_{ik} \,\|\, p_{jk}) + 1}, \qquad (3.22)$$
where $b$ is the link type indicator: if $b = 0$, then $c_i$ and $c_j$ take the same viewpoint; otherwise, $b = 1$.
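A small sketch of the link-weight computation is shown below; it assumes each post is already represented by a major topic-viewpoint cluster index plus a probability vector over topic-viewpoints, and it uses SciPy's Jensen-Shannon distance for the $D_{JS}$ term in Eq. (3.22).

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def link_weight(p_i, p_j, major_i, major_j):
    """Eq. (3.22): signed viewpoint-correlation weight between two posts.

    p_i, p_j       : probability vectors over topic-viewpoints (each sums to 1)
    major_i/major_j: index of each post's major topic-viewpoint cluster
    """
    d_js = jensenshannon(p_i, p_j)          # D_JS(p_i || p_j)
    b = 0 if major_i == major_j else 1      # same viewpoint -> supporting link
    return ((-1) ** b) / (d_js + 1.0)

p1 = np.array([0.7, 0.2, 0.1])
p2 = np.array([0.6, 0.3, 0.1])
print(link_weight(p1, p2, major_i=0, major_j=0))   # positive (supporting)
```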
Propagating Credibility Values
The goal is to optimize the credibility value of each node (i.e., social media post) and then infer the credibility value of the corresponding news item [59]. Posts with supporting relations should have similar credibility values, while posts with opposing relations should have opposing credibility values. In the credibility network, there are: (i) a post credibility vector $\mathbf{T} = \{o(c_1), o(c_2), \ldots, o(c_n)\}$, with $o(c_i)$ denoting the credibility value of post $c_i$; and (ii) a matrix $\mathbf{W} \in \mathbb{R}^{n \times n}$ with $W_{ij} = f(c_i, c_j)$, which denotes the viewpoint correlation between posts $c_i$ and $c_j$, that is, whether the two posts take supporting or opposing positions. Therefore, the objective of propagating credibility scores can be defined as the following network optimization problem:
can be defined as a network optimization problem as below:
Q.T/ D
n
X
i;j D1
jW
ij
j
0
B
@
o.c
i
/
p
N
D
ii
e
ij
o.c
j
/
q
N
D
jj
1
C
A
2
C .1 /kT T
0
k
2
;
(3.23)
where $\bar{\mathbf{D}}$ is a diagonal matrix with $\bar{D}_{ii} = \sum_{k} |W_{ik}|$, and $e_{ij} = 1$ if $W_{ij} \ge 0$ and $e_{ij} = 0$ otherwise. The first component is the smoothness constraint, which enforces the two assumptions about supporting and opposing relations; the second component is the fitting constraint, which ensures that the variables do not change too much from their initial values; and $\mu$ is the regularization parameter that trades off the two.
Then the credibility propagation on the constructed network $G_C$ is formulated as the minimization of this loss function:
$$\mathbf{T}^{*} = \arg\min_{\mathbf{T}} Q(\mathbf{T}). \qquad (3.24)$$
The optimal solution can be obtained by updating $\mathbf{T}$ iteratively through the transition function $\mathbf{T}(t) = \mu \mathbf{H}\mathbf{T}(t-1) + (1 - \mu)\mathbf{T}^0$, where $\mathbf{H} = \bar{\mathbf{D}}^{-1/2}\mathbf{W}\bar{\mathbf{D}}^{-1/2}$. When the iteration converges, each post receives a final credibility value, and the average of these values serves as the final credibility evaluation result for the news.
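This propagation can be sketched in a few lines of NumPy, assuming the signed weight matrix $\mathbf{W}$ and the initial credibility vector $\mathbf{T}^0$ are given; the value of mu and the iteration count below are placeholder choices.

```python
import numpy as np

def propagate_credibility(W, T0, mu=0.5, n_iter=100):
    """Iterative solution of Eq. (3.24): T(t) = mu * H @ T(t-1) + (1 - mu) * T0,
    with H = D^{-1/2} W D^{-1/2} and D_ii = sum_k |W_ik|."""
    d_inv_sqrt = 1.0 / np.sqrt(np.abs(W).sum(axis=1))
    H = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    T = T0.copy()
    for _ in range(n_iter):
        T = mu * H @ T + (1 - mu) * T0
    return T

# The news credibility score is the average post credibility after convergence:
# news_score = propagate_credibility(W, T0).mean()
```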
3.3 NETWORK-BASED DETECTION
Recent advances in network representation learning, such as network embedding and deep neural networks, allow us to better capture the features of news from auxiliary information such