3.2 POST-BASED DETECTION
Users who are involved in the news dissemination process express their opinions and emotions via posts and comments. These user responses provide helpful signals related to the veracity of news claims. Recent research looks into user stance, user emotion, and post credibility to improve the performance of fake news detection. We begin by introducing stance-aggregated modeling.
3.2.1 STANCE-AGGREGATED MODELING
Stances (or viewpoints) indicate users' opinions toward the news, such as supporting, opposing, etc. Typically, fake news provokes highly controversial views among social media users, in which denying and questioning stances are found to play a crucial role in signaling claims as being fake.
The stance expressed in users' posts can be either explicit or implicit. Explicit stances are direct expressions of emotion or opinion, such as Facebook's "like" actions. Implicit stances can be automatically extracted from social media posts.
Probabilistic Stance Modeling Consider the scenario where stances are explicitly expressed through "like" actions on social media. Let $\mathcal{A} = \{a_1, \ldots, a_j, \ldots, a_N\}$ denote the set of news articles, and let $\mathcal{U} = \{u_1, \ldots, u_i, \ldots, u_m\}$ represent the set of users engaged in like actions. We first construct a bipartite graph $(\mathcal{U} \cup \mathcal{A}, \mathcal{L})$, where $\mathcal{L}$ is the set of like actions. The idea is that users express like actions due to both user reputation and news quality. The users and news items can each be characterized by a Beta distribution, $\mathrm{Beta}(\alpha_i, \beta_i)$ and $\mathrm{Beta}(\alpha_j, \beta_j)$, respectively. $\mathrm{Beta}(\alpha_i, \beta_i)$ represents the reputation or reliability of user $u_i$, and $\mathrm{Beta}(\alpha_j, \beta_j)$ represents the veracity of news $a_j$. Intuitively, for a user $u_i$, $\alpha_i - 1$ represents the number of times $u_i$ likes real news pieces and $\beta_i - 1$ denotes the number of times $u_i$ likes fake news pieces. For a news piece $a_j$, $\alpha_j$ gives the number of likes $a_j$ receives and $\beta_j$ gives the number of non-likes $a_j$ receives. The expectation of the Beta distribution is used to estimate the degree of user reputation ($p_i = \frac{\alpha_i}{\alpha_i + \beta_i}$) or news veracity ($p_j = \frac{\alpha_j}{\alpha_j + \beta_j}$). To predict whether a piece of news is fake or not, a linear transformation of $p_j$ is computed: $y_j = 2p_j - 1 = \frac{\alpha_j - \beta_j}{\alpha_j + \beta_j}$, where a positive value indicates true news; otherwise, it is fake news.
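The mapping from Beta parameters to reputation and veracity scores is straightforward to compute. Below is a minimal sketch of this scoring step; the function names and example counts are illustrative and not taken from [147].

```python
def user_reputation(alpha_i: float, beta_i: float) -> float:
    """Expected reputation p_i of a user from Beta(alpha_i, beta_i)."""
    return alpha_i / (alpha_i + beta_i)

def news_veracity_score(alpha_j: float, beta_j: float) -> float:
    """y_j = 2*p_j - 1: positive -> true news, negative -> fake news."""
    p_j = alpha_j / (alpha_j + beta_j)
    return 2.0 * p_j - 1.0

# Example: a news piece with 30 likes and 5 non-likes
print(news_veracity_score(alpha_j=30, beta_j=5))  # ~0.71, leaning toward "true"
```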
News Veracity Inference Let the training set consist of two subsets $\mathcal{A}_F, \mathcal{A}_T \subset \mathcal{A}$ of labeled fake and true news, and let $\Phi_i = \{a_j \mid (u_i, a_j) \in \mathcal{L}\}$ denote the news pieces liked by user $u_i$ and $\Phi_j = \{u_i \mid (u_i, a_j) \in \mathcal{L}\}$ denote the users who liked news $a_j$. The labels are set as $y_j = -1$ for all $a_j \in \mathcal{A}_F$, $y_j = 1$ for all $a_j \in \mathcal{A}_T$, and $y_j = 0$ for unlabeled news pieces. The parameters of user $u_i$ are optimized iteratively by the following updating functions:
$$\alpha_i = \tilde{\alpha} + \sum_{j \in \Phi_i,\, y_j > 0} y_j, \qquad \beta_i = \tilde{\beta} - \sum_{j \in \Phi_i,\, y_j < 0} y_j, \qquad y_i = \frac{\alpha_i - \beta_i}{\alpha_i + \beta_i}, \qquad (3.13)$$
where $\tilde{\alpha}$ and $\tilde{\beta}$ are prior base constants indicating the degree to which the user believes fake or true news. Similarly, the parameters of news $a_j$ are updated as
$$\alpha_j = \tilde{\alpha}' + \sum_{i \in \Phi_j,\, y_i > 0} y_i, \qquad \beta_j = \tilde{\beta}' - \sum_{i \in \Phi_j,\, y_i < 0} y_i, \qquad y_j = \frac{\alpha_j - \beta_j}{\alpha_j + \beta_j}, \qquad (3.14)$$
where $\tilde{\alpha}'$ and $\tilde{\beta}'$ are prior constants reflecting the prior ratio of fake and true news. In this way, the stance (like) information is aggregated to optimize the parameters, which can then be used to predict news veracity through $y_j$ [147].
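A minimal sketch of this alternating update is shown below. It assumes the like actions are given as (user, news) pairs and that a few news pieces carry ±1 labels; the prior constants, the iteration count, and the choice to keep labeled news scores fixed are illustrative assumptions rather than settings from [147].

```python
from collections import defaultdict

def aggregate_stances(likes, labels, n_iter=20,
                      a_u=2.0, b_u=2.0, a_n=2.0, b_n=2.0):
    """likes: list of (user, news) like actions.
    labels: dict news -> +1 (true) / -1 (fake) for the labeled subset.
    Returns dict news -> veracity score y_j in [-1, 1]."""
    phi_user = defaultdict(list)   # Phi_i: news liked by each user
    phi_news = defaultdict(list)   # Phi_j: users who liked each news piece
    for u, a in likes:
        phi_user[u].append(a)
        phi_news[a].append(u)

    y_news = {a: float(labels.get(a, 0.0)) for a in phi_news}  # y_j, seeded by labels
    y_user = {u: 0.0 for u in phi_user}                        # y_i, user reputation score

    for _ in range(n_iter):
        # Eq. (3.13): update user parameters from the news pieces they liked
        for u, liked in phi_user.items():
            alpha = a_u + sum(y_news[a] for a in liked if y_news[a] > 0)
            beta = b_u - sum(y_news[a] for a in liked if y_news[a] < 0)
            y_user[u] = (alpha - beta) / (alpha + beta)
        # Eq. (3.14): update news parameters from the users who liked them
        for a, likers in phi_news.items():
            if a in labels:          # assumption: ground-truth labels stay fixed
                continue
            alpha = a_n + sum(y_user[u] for u in likers if y_user[u] > 0)
            beta = b_n - sum(y_user[u] for u in likers if y_user[u] < 0)
            y_news[a] = (alpha - beta) / (alpha + beta)
    return y_news
```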
We can also infer implicit stance values from social media posts, which usually requires a labeled stance dataset to train a supervised model. The inferred stance scores then serve as the input for fake news classification.
Figure 3.3: An illustration of the stance aggregation framework: (1) probabilistic stance modeling, in which each user $u_i \sim \mathrm{Beta}(\alpha_i, \beta_i)$ and each news piece $a_j \sim \mathrm{Beta}(\alpha_j, \beta_j)$ are connected through liking actions; and (2) news veracity inference.
3.2.2 EMOTION-ENHANCED MODELING
Fake news publishers often aim to spread information extensively and draw wide public attention. Long-standing social science studies demonstrate that news which evokes high-arousal, or activating, emotions (awe, anger, or anxiety) is more viral on social media [42, 146]. To achieve this goal, fake news publishers commonly adopt two approaches. First, publishers post news with intense emotions, which triggers a high level of physiological arousal in the crowd. For example, in Figure 3.4a, the publisher uses rich emotional expressions (e.g., "Oh my god!") to make the information more impressive and striking. Second, publishers may present the news objectively to make it convincing, while its content is controversial and evokes intense emotions in the public, so that the news ultimately spreads widely. As another example (see Figure 3.4b), the publisher writes the post in an unemotional way, but the statement that China ranks second to last suddenly brings on tension in the crowd, and people express their feelings of anger (e.g., "most ridiculous"), shock, and doubt (e.g., "seriously?") in the comments.
Figure 3.4: Two fake news posts from Sina Weibo: (a) a post that contains emotions of astonishment and sadness in the news content, which easily arouses the audience; and (b) a post that contains no emotion in its content but raises emotions such as doubt and anger in user comments through controversial topics. Based on [47].
The end-to-end emotion-based fake news detection framework (see Figure 3.5) consists of three major components: (i) the content module mines information from the publisher, including semantics and emotions in the news content; (ii) the comment module captures semantic
and emotion information from users; and (iii) the fake news prediction component fuses the features from both news content and user comments and predicts whether the news is fake.

Figure 3.5: The proposed framework consists of three components: (1) the news content module; (2) the user comments module; and (3) the fake news prediction component. The first two modules model semantics and emotions from the publisher and the users, respectively, while the prediction component fuses the information of these two modules and makes the prediction. Three gates (Gate_N, Gate_C, and Gate_M) are used for multi-modal fusion at different layers.
Learning Emotion Embeddings Early studies primarily use hand-crafted features to represent the emotion of text, relying heavily on sentiment dictionaries. There are several widely used emotion dictionaries, such as WordNet [62], SlangSD [170], and MPQA [168] for English and HowNet (http://www.keenage.com/html/e_index.html) for Chinese. However, this approach may encounter problems of emotion migration and low coverage on social media, because of the differences in sentiment word usage between social media and the real world. In addition, some existing tools such as VADER [55] are designed to predict sentiment for general-purpose use on social media, which may not be specific to fake news detection, and their resulting numeric sentiment scores are not easily embedded into deep learning models.
Therefore, we adopt a deep learning emotion prediction model [4] to learn task-specific sentiment embeddings for both news contents and user comments. Inspired by recent advances in deep learning for emotion modeling [4], we train a recurrent neural network
(RNN) to learn the emotion embedding vectors. Following traditional settings [54], we first obtain a large-scale real-world dataset that contains emotions, use these emotions as the emotion labels, and then initialize each word with a one-hot vector. After initialization, all word vectors pass through an embedding layer, which projects each word from the original one-hot space into a low-dimensional space, and are then sequentially fed into a one-layer GRU model. Through back-propagation, the embedding layer is updated during training, producing an emotion embedding $\mathbf{e}_i$ for each word $w_i$.
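A minimal PyTorch sketch of this training setup is given below. The vocabulary size, number of emotion classes, and embedding dimension are placeholder values; only the structure (an embedding layer followed by a one-layer GRU classifier trained on emotion labels, with the learned embedding weights reused afterward) follows the description above.

```python
import torch
import torch.nn as nn

class EmotionEmbeddingModel(nn.Module):
    """Embedding layer + one-layer GRU trained on emotion-labeled text."""
    def __init__(self, vocab_size=50000, emb_dim=100, hidden_dim=128, n_emotions=7):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)     # replaces one-hot inputs
        self.gru = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, n_emotions)

    def forward(self, token_ids):                # token_ids: (batch, seq_len)
        x = self.embedding(token_ids)            # (batch, seq_len, emb_dim)
        _, h_n = self.gru(x)                     # h_n: (1, batch, hidden_dim)
        return self.classifier(h_n.squeeze(0))   # emotion logits

model = EmotionEmbeddingModel()
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())
# After training on (token_ids, emotion_label) pairs, model.embedding.weight
# provides the emotion embedding e_i for each word w_i in the vocabulary.
```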
Incorporating Emotion Representations We now introduce how to incorporate emotion embeddings into news contents and user comments to learn representations for fake news detection. We can learn the basic textual feature representations through a bidirectional GRU word encoder, as in Section 2.1.3. For each word $w_i$, the word embedding vector $\mathbf{w}_i$ is initialized with pre-trained word2vec [90]. The bidirectional GRU contains a forward GRU $\overrightarrow{f}$, which reads each sentence from word $w_0$ to $w_n$, and a backward GRU $\overleftarrow{f}$, which reads the sentence from word $w_n$ to $w_0$:
$$\overrightarrow{h}^{w}_i = \overrightarrow{\mathrm{GRU}}(\mathbf{w}_i), \quad i \in [0, n]; \qquad \overleftarrow{h}^{w}_i = \overleftarrow{\mathrm{GRU}}(\mathbf{w}_i), \quad i \in [0, n]. \qquad (3.15)$$
For a given word $w_i$, we obtain its word encoding vector $\mathbf{h}^{w}_i$ by concatenating the forward hidden state $\overrightarrow{h}^{w}_i$ and the backward hidden state $\overleftarrow{h}^{w}_i$, i.e., $\mathbf{h}^{w}_i = [\overrightarrow{h}^{w}_i; \overleftarrow{h}^{w}_i]$.
Similarly to the word encoder, we adopt a bidirectional GRU to model the emotion feature representations of the words. After obtaining the emotion embedding vectors $\mathbf{e}_i$, we can learn the emotion encoding $\mathbf{h}^{e}_i$ for word $w_i$:
$$\overrightarrow{h}^{e}_i = \overrightarrow{\mathrm{GRU}}(\mathbf{e}_i), \quad i \in [0, n]; \qquad \overleftarrow{h}^{e}_i = \overleftarrow{\mathrm{GRU}}(\mathbf{e}_i), \quad i \in [0, n]. \qquad (3.16)$$
For a given word $w_i$, we obtain its emotion encoding vector $\mathbf{h}^{e}_i$ by concatenating the forward hidden state $\overrightarrow{h}^{e}_i$ and the backward hidden state $\overleftarrow{h}^{e}_i$, i.e., $\mathbf{h}^{e}_i = [\overrightarrow{h}^{e}_i; \overleftarrow{h}^{e}_i]$.
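As a concrete illustration of the two encoders, the following PyTorch sketch runs a bidirectional GRU over word embeddings and over emotion embeddings and returns the concatenated forward and backward hidden states at every position; the dimensions and batch sizes are placeholder values rather than settings from the original framework.

```python
import torch
import torch.nn as nn

class BiGRUEncoder(nn.Module):
    """Bidirectional GRU; output at each step is [forward; backward] hidden state."""
    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.bigru = nn.GRU(in_dim, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, x):            # x: (batch, seq_len, in_dim)
        h, _ = self.bigru(x)         # h: (batch, seq_len, 2 * hidden_dim)
        return h

word_encoder = BiGRUEncoder(in_dim=300, hidden_dim=100)     # word2vec inputs w_i
emotion_encoder = BiGRUEncoder(in_dim=100, hidden_dim=100)  # emotion embeddings e_i

w = torch.randn(8, 40, 300)   # a batch of 8 sentences, 40 words each
e = torch.randn(8, 40, 100)
h_w = word_encoder(w)         # h^w_i for every word position
h_e = emotion_encoder(e)      # h^e_i for every word position
```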
The overall emotion information of the news content is also important when deciding how much information from the emotion embeddings should be absorbed for each word. For a given post $a$, we extract the emotion features included in [22] and also add some additional emotion features. There are 19 features covering emotion aspects of the news, including the numbers of positive/negative words, the sentiment score, etc. The news emotion features of $a$ are denoted as $\mathbf{se}$.
Gate_N is applied to learn information jointly from the word embeddings, emotion embeddings, and sentence emotion features, and yields a new representation for each word (see Figure 3.5). The units in Gate_N are motivated by the forget gate and input gate in an LSTM. In Gate_N, the two emotion inputs jointly decide the values of $\mathbf{r}_t$ and $\mathbf{u}_t$ through two sigmoid layers, which are used to manage how much semantic and emotion information is added into the new representation. Meanwhile, a dense layer transfers the emotion inputs into the same dimensional space as the word embedding. Mathematically, the relationship between the inputs and output of Gate_N is defined by the following formulas:
$$\begin{aligned}
\mathbf{r}_t &= \sigma(\mathbf{W}_r[\mathbf{se}; \mathbf{h}^{e}_t] + \mathbf{b}_r) \\
\mathbf{u}_t &= \sigma(\mathbf{W}_u[\mathbf{se}; \mathbf{h}^{e}_t] + \mathbf{b}_u) \\
\mathbf{c}^{e}_t &= \tanh(\mathbf{W}_c[\mathbf{se}; \mathbf{h}^{e}_t] + \mathbf{b}_c) \\
\mathbf{n}_t &= \mathbf{r}_t \odot \mathbf{h}^{w}_t + \mathbf{u}_t \odot \mathbf{c}^{e}_t.
\end{aligned} \qquad (3.17)$$
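A minimal PyTorch sketch of Gate_N is shown below; it follows Eq. (3.17) directly, with placeholder dimensions, and is not the authors' released implementation.

```python
import torch
import torch.nn as nn

class GateN(nn.Module):
    """Fuses word encoding h^w_t, emotion encoding h^e_t, and sentence
    emotion features se into a new word representation n_t (Eq. 3.17)."""
    def __init__(self, word_dim, emo_dim, se_dim):
        super().__init__()
        self.W_r = nn.Linear(se_dim + emo_dim, word_dim)
        self.W_u = nn.Linear(se_dim + emo_dim, word_dim)
        self.W_c = nn.Linear(se_dim + emo_dim, word_dim)

    def forward(self, h_w, h_e, se):
        # h_w: (batch, seq_len, word_dim), h_e: (batch, seq_len, emo_dim),
        # se: post-level emotion features repeated at every word position,
        #     shape (batch, seq_len, se_dim)
        x = torch.cat([se, h_e], dim=-1)      # [se; h^e_t]
        r = torch.sigmoid(self.W_r(x))        # how much semantic info to keep
        u = torch.sigmoid(self.W_u(x))        # how much emotion info to add
        c = torch.tanh(self.W_c(x))           # emotion candidate in word space
        return r * h_w + u * c                # n_t
```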
The comment module explores the semantic and emotion information from the users discussing the event. The architecture of the comment module is similar to that of the content module, except that: (1) all comments are first concatenated before being fed into the BiGRUs; (2) there are no sentence emotion features; and (3) Gate_C is used for fusion. Gate_C is introduced for fusion in the comment module. Different from Gate_N, there are only two modalities. We adopt the update gate of the GRU to control how information is updated in the fusion process (see Figure 3.5). The two inputs jointly yield an update gate vector $\mathbf{u}_t$ through a sigmoid layer. A dense layer creates a vector of new candidate values, $\mathbf{c}^{e}_t$, which has the same dimension as $\mathbf{h}^{w}_t$. The final output $\mathbf{n}_t$ is a linear interpolation between $\mathbf{h}^{w}_t$ and $\mathbf{c}^{e}_t$. Mathematically, the following formulas represent the process:
$$\begin{aligned}
\mathbf{u}_t &= \sigma(\mathbf{W}_u[\mathbf{h}^{w}_t; \mathbf{h}^{e}_t] + \mathbf{b}_u) \\
\mathbf{c}^{e}_t &= \tanh(\mathbf{W}_c \mathbf{h}^{e}_t + \mathbf{b}_c) \\
\mathbf{n}_t &= \mathbf{u}_t \odot \mathbf{h}^{w}_t + (1 - \mathbf{u}_t) \odot \mathbf{c}^{e}_t.
\end{aligned} \qquad (3.18)$$
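Gate_C can be sketched analogously to Gate_N; the snippet below implements Eq. (3.18) with placeholder dimensions and is likewise only an illustrative reading of the description.

```python
import torch
import torch.nn as nn

class GateC(nn.Module):
    """GRU-style update gate fusing h^w_t and h^e_t in the comment module (Eq. 3.18)."""
    def __init__(self, word_dim, emo_dim):
        super().__init__()
        self.W_u = nn.Linear(word_dim + emo_dim, word_dim)
        self.W_c = nn.Linear(emo_dim, word_dim)

    def forward(self, h_w, h_e):
        u = torch.sigmoid(self.W_u(torch.cat([h_w, h_e], dim=-1)))  # update gate u_t
        c = torch.tanh(self.W_c(h_e))         # candidate values c^e_t
        return u * h_w + (1 - u) * c          # n_t
```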
Emotion-Based Fake News Detection Here, Gate_M fuses the high-level representations of the content module and the comment module, denoted $\mathbf{con}$ and $\mathbf{com}$, and yields a fused representation vector $\mathbf{o}$ (see Figure 3.5). Mathematically, the following equations describe the internal relationship of Gate_M:
$$\begin{aligned}
\mathbf{r} &= \sigma(\mathbf{W}_u[\mathbf{con}; \mathbf{com}] + \mathbf{b}_u) \\
\mathbf{o} &= \mathbf{r} \odot \mathbf{con} + (1 - \mathbf{r}) \odot \mathbf{com}.
\end{aligned} \qquad (3.19)$$
We use a fully connected layer with softmax activation to project the fused vector $\mathbf{o}$ into the target space of two classes, fake news and real news, and obtain the probability distribution:
$$\hat{\mathbf{y}} = \mathrm{softmax}(\mathbf{W}_f \mathbf{o} + \mathbf{b}_f), \qquad (3.20)$$
where $\hat{\mathbf{y}} = [\hat{y}_0, \hat{y}_1]$ is the predicted probability vector, with $\hat{y}_0$ and $\hat{y}_1$ indicating the predicted probability of the label being 0 (real news) and 1 (fake news), respectively; $y \in \{0, 1\}$ denotes the ground-truth label of the news; and $\mathbf{b}_f \in \mathbb{R}^{1 \times 2}$ is the bias term. Thus, for each news piece, the goal is to minimize the cross-entropy loss function:
$$\mathcal{L}(\theta) = -y \log(\hat{y}_1) - (1 - y)\log(1 - \hat{y}_1), \qquad (3.21)$$
where $\theta$ denotes the parameters of the network.
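The final fusion and prediction steps (Eqs. 3.19-3.21) can be sketched as follows; the representation dimension and batch size are placeholders, and the loss is written out exactly as in Eq. (3.21).

```python
import torch
import torch.nn as nn

class GateMPredictor(nn.Module):
    """Fuses content and comment representations (Gate_M) and predicts fake/real."""
    def __init__(self, dim):
        super().__init__()
        self.W_u = nn.Linear(2 * dim, dim)
        self.W_f = nn.Linear(dim, 2)

    def forward(self, con, com):
        r = torch.sigmoid(self.W_u(torch.cat([con, com], dim=-1)))   # Eq. (3.19)
        o = r * con + (1 - r) * com
        return torch.softmax(self.W_f(o), dim=-1)                    # Eq. (3.20)

model = GateMPredictor(dim=256)
con, com = torch.randn(8, 256), torch.randn(8, 256)
y = torch.randint(0, 2, (8,)).float()         # ground-truth labels
y_hat = model(con, com)                       # [y_hat_0, y_hat_1] per news piece
loss = -(y * torch.log(y_hat[:, 1])
         + (1 - y) * torch.log(1 - y_hat[:, 1])).mean()              # Eq. (3.21)
```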
3.2.3 CREDIBILITY-PROPAGATED MODELING
Credibility-propagated models aim to infer the veracity of news pieces from the credibility of the related posts on social media through network propagation. The basic assumption is that the credibility of a given news event is highly related to the credibility of its relevant social media posts [59]. Since posts are correlated in terms of their viewpoints toward the news piece, we first collect all relevant social media posts and build a credibility network among them. Then, we can explore the correlations among posts to optimize their credibility values, which are averaged to obtain the score for predicting fake news.
Representing a Credibility Network
We first build a credibility network over all the posts $\mathcal{C} = \{c_1, \ldots, c_m\}$ about a news piece $a$ (see Figure 3.6). Credibility network initialization consists of two parts: node initialization and link initialization. First, we obtain the initial credibility score vector of the nodes, $\mathbf{T}^0$, from pre-trained classifiers with features extracted from external training data. The links are defined by mining viewpoint relations, i.e., the relations between each pair of viewpoints, such as contradicting or same. The basic idea is that posts with the same viewpoints form supporting relations, which raise their credibilities, while posts with contradicting viewpoints form opposing relations, which weaken their credibilities. Specifically, a social media post $c_i$ is modeled as a multinomial distribution $\theta_i$ over $K$ topics, and each topic $k$ of a post is modeled as a multinomial distribution $\psi_{ik}$ over $L$ viewpoints. The probability of a post $c_i$ over topic $k$ along with the $L$ viewpoints is denoted as $p_{ik} = \theta_i \psi_{ik}$. The distance between two posts $c_i$ and $c_j$ is measured using the Jensen-Shannon distance: $\mathrm{Dis}(c_i, c_j) = D_{JS}(p_{ik} \,\|\, p_{jk})$.
Figure 3.6: An illustration of leveraging post credibility to detect fake news.
The supporting or opposing relation indicator is determined as follows. It is assumed that each post contains one major topic-viewpoint, defined as the largest proportion of $p_{ik}$. If the major topic-viewpoints of two posts $c_i$ and $c_j$ are clustered together (i.e., they take the same viewpoint), then the two posts are mutually supporting; otherwise, they are mutually opposing. The similarity/dissimilarity measure of two posts is defined as:
$$f(c_i, c_j) = \frac{(-1)^{b}}{D_{JS}(p_{ik} \,\|\, p_{jk}) + 1}, \qquad (3.22)$$
where $b$ is the link type indicator: if $b = 0$, then $c_i$ and $c_j$ take the same viewpoint; otherwise, $b = 1$.
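A small sketch of the link-weight computation is shown below; it assumes each post is already represented by a major topic-viewpoint cluster index plus a probability vector over topic-viewpoints, and it uses SciPy's Jensen-Shannon distance for the $D_{JS}$ term in Eq. (3.22).

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def link_weight(p_i, p_j, major_i, major_j):
    """Eq. (3.22): signed viewpoint-correlation weight between two posts.

    p_i, p_j       : probability vectors over topic-viewpoints (each sums to 1)
    major_i/major_j: index of each post's major topic-viewpoint cluster
    """
    d_js = jensenshannon(p_i, p_j)          # D_JS(p_i || p_j)
    b = 0 if major_i == major_j else 1      # same viewpoint -> supporting link
    return ((-1) ** b) / (d_js + 1.0)

p1 = np.array([0.7, 0.2, 0.1])
p2 = np.array([0.6, 0.3, 0.1])
print(link_weight(p1, p2, major_i=0, major_j=0))   # positive (supporting)
```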
Propagating Credibility Values
The goal is to optimize the credibility value of each node (i.e., social media post) and then infer the credibility value of the corresponding news item [59]. Posts with supporting relations should have similar credibility values, while posts with opposing relations should have opposing credibility values. In the credibility network, there are: (i) a post credibility vector $\mathbf{T} = \{o(c_1), o(c_2), \ldots, o(c_n)\}$, with $o(c_i)$ denoting the credibility value of post $c_i$; and (ii) a matrix $\mathbf{W} \in \mathbb{R}^{n \times n}$ with $W_{ij} = f(c_i, c_j)$, which denotes the viewpoint correlation between posts $c_i$ and $c_j$, that is, whether the two posts take supporting or opposing positions. Therefore, the objective of propagating credibility scores can be defined as the following network optimization problem:
can be defined as a network optimization problem as below:
Q.T/ D
n
X
i;j D1
jW
ij
j
0
B
@
o.c
i
/
p
N
D
ii
e
ij
o.c
j
/
q
N
D
jj
1
C
A
2
C .1 /kT T
0
k
2
;
(3.23)
where $\bar{\mathbf{D}}$ is a diagonal matrix with $\bar{D}_{ii} = \sum_{k} |W_{ik}|$, and $e_{ij} = 1$ if $W_{ij} \ge 0$ and $e_{ij} = 0$ otherwise. The first component is the smoothness constraint, which enforces the two assumptions about supporting and opposing relations; the second component is the fitting constraint, which ensures that the variables do not change too much from their initial values; and $\mu$ is the regularization parameter that trades off the two.
Then the credibility propagation on the constructed network $G_C$ is formulated as the minimization of this loss function:
$$\mathbf{T}^{*} = \arg\min_{\mathbf{T}} Q(\mathbf{T}). \qquad (3.24)$$
The optimal solution can be obtained by updating $\mathbf{T}$ iteratively through the transition function $\mathbf{T}(t) = \mu \mathbf{H}\mathbf{T}(t-1) + (1 - \mu)\mathbf{T}^0$, where $\mathbf{H} = \bar{\mathbf{D}}^{-1/2}\mathbf{W}\bar{\mathbf{D}}^{-1/2}$. When the iteration converges, each post receives a final credibility value, and the average of these values serves as the final credibility evaluation result for the news.
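This propagation can be sketched in a few lines of NumPy, assuming the signed weight matrix $\mathbf{W}$ and the initial credibility vector $\mathbf{T}^0$ are given; the value of mu and the iteration count below are placeholder choices.

```python
import numpy as np

def propagate_credibility(W, T0, mu=0.5, n_iter=100):
    """Iterative solution of Eq. (3.24): T(t) = mu * H @ T(t-1) + (1 - mu) * T0,
    with H = D^{-1/2} W D^{-1/2} and D_ii = sum_k |W_ik|."""
    d_inv_sqrt = 1.0 / np.sqrt(np.abs(W).sum(axis=1))
    H = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    T = T0.copy()
    for _ in range(n_iter):
        T = mu * H @ T + (1 - mu) * T0
    return T

# The news credibility score is the average post credibility after convergence:
# news_score = propagate_credibility(W, T0).mean()
```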
3.3 NETWORK-BASED DETECTION
Recent advances in network representation learning, such as network embedding and deep neural networks, allow us to better capture the features of news from auxiliary information such