3.3 NETWORK-BASED DETECTION
Fake news dissemination on social media involves rich auxiliary network information, such as friendship networks, temporal user engagements, and interaction networks. Network-based fake news detection aims to leverage advanced network analysis and modeling methods to better predict fake news. We first introduce representative types of networks for detecting fake news.
3.3.1 REPRESENTATIVE NETWORK TYPES
We introduce several network structures that are commonly used to detect fake news (Figure 3.7).
Friendship Networks A user's friendship network is represented as a graph $G_F = (\mathcal{U}, \mathcal{E}_F)$, where $\mathcal{U}$ and $\mathcal{E}_F$ are the node and edge sets, respectively. A node $u \in \mathcal{U}$ represents a user, and an edge $(u_1, u_2) \in \mathcal{E}_F$ indicates whether a social relation exists between $u_1$ and $u_2$.
Homophily theory [87] suggests that users tend to form relationships with like-minded friends, rather than with users who have opposing preferences and interests. Likewise, social influence theory [85] predicts that users are more likely to share similar latent interests in news pieces. Thus, the friendship network provides the structure needed to understand the set of social relationships among users. The friendship network is the basic route for news spreading and can reveal community information.
Diffusion Networks A diffusion network is represented as a directed graph $G_D = (\mathcal{U}, \mathcal{E}_D, p, t)$, where $\mathcal{U}$ and $\mathcal{E}_D$ are the node and edge sets, respectively. A node $u \in \mathcal{U}$ represents an individual that can publish, receive, and diffuse information at time $t_i \in t$. Each directed edge $(u_1 \rightarrow u_2) \in \mathcal{E}_D$, between nodes $u_1, u_2 \in \mathcal{U}$, represents the direction of information diffusion and is assumed to be associated with an information diffusion probability $p(u_1 \rightarrow u_2) \in [0, 1]$.
The diffusion network is important for learning representations of the structural and temporal patterns that help identify fake news. By discovering the sources of fake news and the spreading paths among users, we can also try to mitigate the fake news problem.
Interaction Networks An interaction network $G_I = (\{\mathcal{P}, \mathcal{U}, \mathcal{A}\}, \mathcal{E}_I)$ consists of nodes representing publishers, users, and news pieces, and edges $\mathcal{E}_I$ indicating the interactions among them. For example, an edge $(p \rightarrow a)$ indicates that publisher $p$ publishes news item $a$, and an edge $(a \rightarrow u)$ indicates that news $a$ is spread by user $u$.
The interaction networks can represent the correlations among different types of entities, such as publishers, news, and social media posts, during the news dissemination process [141]. The characteristics of publishers and users, together with the publisher-news and news-user interactions, have the potential to help differentiate fake news.
Propagation Networks A propagation network $G_P = (\mathcal{C}, a)$ consists of a news piece $a$ and the corresponding social media posts $\mathcal{C}$ that propagate the news. Note that different types of posts can occur, such as reposts, replies, comments, and likes.
Figure 3.7: Representative network types during fake news dissemination: (a) friendship network, (b) diffusion network, (c) interaction network, and (d) propagation network.
We will show that a propagation network can be treated in a hierarchical view, consisting of two levels: macro-level and micro-level. The macro-level propagation network includes the news nodes, tweet nodes, and retweet nodes. The micro-level propagation network indicates the conversation tree represented by reply nodes.
Propagation networks contain rich information from different perspectives, such as temporal, linguistic, and structural, which provides auxiliary information for potentially improving the detection of fake news.
3.3.2 FRIENDSHIP NETWORK MODELING
The friendship network plays an important role in fake news diffusion. The fact that users are likely to form echo chambers strengthens the need to model user social representations and to explore their added value for fake news studies. Essentially, given the friendship network $G_F$, we want to learn latent representations of users while preserving the structural properties of the network, including first-order and higher-order structure, such as second-order structure and community structure. For example, DeepWalk [108] preserves the neighborhood structure of nodes by modeling a stream of random walks. In addition, LINE [150] preserves both first-order and second-order proximities. Specifically, we measure the first-order proximity by the joint probability distribution between users $u_i$ and $u_j$,
$$
p_1(u_i, u_j) = \frac{1}{1 + \exp\!\left(-\mathbf{u}_i^{T}\mathbf{u}_j\right)}, \qquad (3.25)
$$
where $\mathbf{u}_i$ ($\mathbf{u}_j$) is the social representation of user $u_i$ ($u_j$). We model the second-order proximity by the probability of the context user $u_j$ being generated by the user $u_i$, as follows:
$$
p_2(u_j \mid u_i) = \frac{\exp\!\left(\mathbf{u}_j^{T}\mathbf{u}_i\right)}{\sum_{k=1}^{|V|} \exp\!\left(\mathbf{u}_k^{T}\mathbf{u}_i\right)}, \qquad (3.26)
$$
where $|V|$ is the number of nodes (or "contexts") for user $u_i$. This conditional distribution implies that users with similar distributions over the contexts are similar to each other. The learning objective is to minimize the KL-divergence between each of these two distributions and its corresponding empirical distribution.
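As an illustration, the minimal sketch below computes the two proximity measures of Equations (3.25) and (3.26) for a small set of user embeddings; the variable names (`emb`, `ctx`) are ours, and the example assumes the representations have already been learned, e.g., by LINE-style stochastic gradient descent.

```python
import numpy as np

def first_order_proximity(u_i, u_j):
    """Joint probability of an edge (u_i, u_j), Eq. (3.25)."""
    return 1.0 / (1.0 + np.exp(-u_i @ u_j))

def second_order_proximity(emb, ctx, i):
    """Distribution over context users generated by user i, Eq. (3.26)."""
    logits = ctx @ emb[i]          # u_k^T u_i for all k
    logits -= logits.max()         # numerical stability
    p = np.exp(logits)
    return p / p.sum()

# toy example: 5 users with 3-dimensional representations
rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 3))      # "vertex" representations
ctx = rng.normal(size=(5, 3))      # "context" representations
print(first_order_proximity(emb[0], emb[1]))
print(second_order_proximity(emb, ctx, 0))
```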
Network communities may actually be a more important structural dimension because fake news spreaders are likely to form polarized groups [49, 136]. This requires the representation learning methods to be able to model community structures. For example, a community-preserving node representation learning method, Modularized Nonnegative Matrix Factorization (MNMF), has been proposed [164]. The overall objective is defined as follows:
$$
\min_{\mathbf{M}, \mathbf{U}, \mathbf{H}, \mathbf{C} \ge 0} \; \underbrace{\left\| \mathbf{S} - \mathbf{M}\mathbf{U}^{T} \right\|_F^2}_{\text{Proximity Mapping}} \;+\; \alpha \underbrace{\left\| \mathbf{H} - \mathbf{U}\mathbf{C}^{T} \right\|_F^2}_{\text{Community Mapping}} \;-\; \beta \underbrace{\operatorname{tr}\!\left(\mathbf{H}^{T} \mathbf{B} \mathbf{H}\right)}_{\text{Modularity Modeling}} \quad \text{s.t. } \operatorname{tr}\!\left(\mathbf{H}^{T}\mathbf{H}\right) = m \qquad (3.27)
$$
and comprises three major parts: proximity mapping, community mapping, and modularity modeling. In proximity mapping, $\mathbf{S} \in \mathbb{R}^{m \times m}$ is the user similarity matrix constructed from the user adjacency matrix (first-order proximity) and the neighborhood similarity matrix (second-order proximity), while $\mathbf{M} \in \mathbb{R}^{m \times k}$ and $\mathbf{U} \in \mathbb{R}^{m \times k}$ are the basis matrix and the user representations, respectively. For community mapping, $\mathbf{H} \in \mathbb{R}^{m \times l}$ is the user-community indicator matrix that we optimize to be reconstructed by the product of the user latent matrix $\mathbf{U}$ and the community latent matrix $\mathbf{C} \in \mathbb{R}^{l \times k}$. For modularity modeling, the objective is to maximize the modularity function [96], where $\mathbf{B} \in \mathbb{R}^{m \times m}$ is the modularity matrix.
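To make the roles of the three terms concrete, the following sketch simply evaluates the objective in Equation (3.27) for given factor matrices. It assumes the modularity matrix is built from the adjacency matrix in the standard way ($\mathbf{B} = \mathbf{A} - \mathbf{d}\mathbf{d}^{T}/2e$, with $\mathbf{d}$ the degree vector and $e$ the number of edges), and it only scores a candidate solution rather than running the full alternating optimization of [164].

```python
import numpy as np

def modularity_matrix(A):
    """B = A - d d^T / (2e) for an undirected adjacency matrix A."""
    d = A.sum(axis=1)
    two_e = d.sum()
    return A - np.outer(d, d) / two_e

def mnmf_objective(S, A, M, U, H, C, alpha=1.0, beta=1.0):
    """Value of the MNMF objective in Eq. (3.27) for given factors."""
    B = modularity_matrix(A)
    proximity = np.linalg.norm(S - M @ U.T, "fro") ** 2
    community = np.linalg.norm(H - U @ C.T, "fro") ** 2
    modularity = np.trace(H.T @ B @ H)
    return proximity + alpha * community - beta * modularity
```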
Tensor factorization can be applied to learn community-enhanced news representations to predict fake news [49]. The goal is to incorporate user community information to guide the learning process of news representations. We first build a three-mode news-user-community tensor $\mathcal{Y} \in \mathbb{R}^{N \times m \times J}$. Then we apply the CP/PARAFAC tensor factorization model to factorize $\mathcal{Y}$ as follows:
$$
\mathcal{Y} \approx [\![ \mathbf{F}, \mathbf{U}, \mathbf{H} ]\!] = \sum_{r=1}^{R} \lambda_r \, \mathbf{f}_r \circ \mathbf{u}_r \circ \mathbf{h}_r, \qquad (3.28)
$$
where $\circ$ denotes the outer product, $\mathbf{f}_r$ (and similarly $\mathbf{u}_r$ and $\mathbf{h}_r$) denotes the normalized $r$-th column of the non-negative factor matrix $\mathbf{F}$ (respectively $\mathbf{U}$ and $\mathbf{H}$), and $R$ is the rank. Each row of $\mathbf{F}$ denotes the representation of the corresponding news article in the embedding space.
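A minimal sketch of this factorization is shown below, assuming the non-negative news-user-community tensor has already been constructed (here filled with random values for illustration) and using the `tensorly` library's non-negative CP decomposition; the factor matrix returned for the first mode plays the role of $\mathbf{F}$ above.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import non_negative_parafac

# toy news-user-community tensor Y with N news, m users, J communities
N, m, J, R = 50, 200, 10, 8
Y = tl.tensor(np.random.rand(N, m, J))

# CP/PARAFAC factorization with non-negative factors, Eq. (3.28)
weights, (F, U, H) = non_negative_parafac(Y, rank=R)

# each row of F is the community-enhanced representation of a news article
print(F.shape)  # (N, R)
```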
3.3.3 DIFFUSION NETWORK TEMPORAL MODELING
The news diffusion process involves abundant temporal user engagements on social media [119, 136, 169]. The social news engagements are defined as a set of tuples that represent how news items spread over time among $m$ users in $\mathcal{U} = \{u_1, u_2, \ldots, u_m\}$. Each engagement $\{u_i, t_i, c_i\}$ indicates that a user $u_i$ spreads the news article at time $t_i$ by posting $c_i$. For example, a diffusion path between two users $u_i$ and $u_j$ exists if and only if (1) $u_j$ follows $u_i$, and (2) $u_j$ posts about a given news piece only after $u_i$ does so.
The goal of learning temporal representations is to capture users' patterns of temporal engagement with a news article $a_j$. Recent advances in deep neural networks, such as RNNs, have shown promising performance for learning representations. RNNs are powerful structures that use loops within the neural network to model sequential data. Given the diffusion network $G_D$, the key procedure is to construct meaningful features $\mathbf{x}_i$ for each engagement. The features are generally extracted from the content of $c_i$ and the attributes of $u_i$. For example, $\mathbf{x}_i$ consists of the following components: $\mathbf{x}_i = (\eta, \Delta t, \mathbf{u}_i, \mathbf{c}_i)$. The first two variables, $\eta$ and $\Delta t$, represent the number of total user engagements up to time $t$ and the time difference between engagements, respectively. These variables capture general measures of the frequency and time-interval distribution of user engagements with the news piece $a_j$. The content features $\mathbf{c}_i$ of users' posts are extracted from hand-crafted linguistic features, such as $n$-gram features, or by using word embedding methods such as doc2vec [76] or GloVe [106].
We extract the user features $\mathbf{u}_i$ by performing a singular value decomposition of the user-news interaction matrix $\mathbf{E} \in \{0,1\}^{m \times N}$, where $\mathbf{E}_{ij} = 1$ indicates that user $u_i$ has engaged in the process of spreading the news piece $a_j$; otherwise, $\mathbf{E}_{ij} = 0$.
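For instance, a truncated SVD of the binary engagement matrix yields low-dimensional user features that can be concatenated with the temporal and content components of each $\mathbf{x}_i$; the sketch below is one way to do this, with the dimensionality `k` chosen arbitrarily.

```python
import numpy as np

def user_features_from_engagements(E, k=16):
    """Left singular vectors of the user-news matrix E as user features."""
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    return U[:, :k] * s[:k]          # each row is a k-dimensional user vector

# toy binary engagement matrix: 100 users x 40 news pieces
E = (np.random.rand(100, 40) < 0.05).astype(float)
user_feats = user_features_from_engagements(E)
print(user_feats.shape)              # (100, 16)
```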
An RNN framework for learning news temporal representations is demonstrated in Figure 3.8. Since $\mathbf{x}_i$ includes features that come from different information spaces, such as temporal and content features, we do not suggest feeding $\mathbf{x}_i$ into the RNN as raw input. Thus, we add a fully connected embedding layer to convert the raw input $\mathbf{x}_i$ into standardized input features $\tilde{\mathbf{x}}_i$, in which the parameters are shared among all raw input features $\mathbf{x}_i, i = 1, \ldots, m$. The RNN then takes the sequence $\tilde{\mathbf{x}}_1, \tilde{\mathbf{x}}_2, \ldots, \tilde{\mathbf{x}}_m$ as input. At each time step $i$, the output of the previous step, $\mathbf{h}_{i-1}$, and the next feature input $\tilde{\mathbf{x}}_i$ are used to update the hidden state $\mathbf{h}_i$. The hidden state $\mathbf{h}_i$ is the feature representation of the sequence up to step $i$ of the input engagement sequence. The hidden state of the final step, $\mathbf{h}_m$, is passed through a fully connected layer to learn the resultant news representation, defined as $\mathbf{a}_j = \tanh(\mathbf{W}\mathbf{h}_m + \mathbf{b})$, where $\mathbf{b}$ is a bias vector. We can then use $\mathbf{a}_j$ to perform fake news detection and related tasks [119].
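The sketch below is one possible PyTorch realization of this pipeline, assuming the per-engagement features have already been assembled into a tensor; the layer sizes and the use of a vanilla RNN cell are our choices for illustration, not prescribed by [119].

```python
import torch
import torch.nn as nn

class NewsTemporalEncoder(nn.Module):
    """Embedding layer + RNN + fully connected output, as in Figure 3.8."""
    def __init__(self, raw_dim, embed_dim=64, hidden_dim=64, out_dim=32):
        super().__init__()
        self.embed = nn.Linear(raw_dim, embed_dim)     # shared embedding layer
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):
        # x: (batch, num_engagements, raw_dim) sequence of features x_i
        x_tilde = torch.tanh(self.embed(x))            # standardized inputs
        _, h_m = self.rnn(x_tilde)                     # final hidden state
        return torch.tanh(self.out(h_m.squeeze(0)))    # news representation a_j

# toy usage: 8 news pieces, 20 engagements each, 50 raw features per engagement
model = NewsTemporalEncoder(raw_dim=50)
a = model(torch.randn(8, 20, 50))
print(a.shape)   # torch.Size([8, 32])
```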
Figure 3.8: An RNN framework for learning news temporal representations.
3.3.4 INTERACTION NETWORK MODELING
Interaction networks describe the relationships among different entities, such as publishers, news pieces, and users. Given the interaction networks, the goal is to embed the different types of entities into the same latent space by modeling the interactions among them. We can leverage the resultant feature representations of news to perform fake news detection. The framework is shown in Figure 3.9 and mainly includes the following components: a news content embedding component, a user embedding component, a user-news interaction embedding component, a publisher-news relation embedding component, and a semi-supervised classification component. In general, the news content embedding component describes the mapping of news from bag-of-words features to a latent feature space; the user embedding component illustrates the extraction of user latent features from user social relations; and the user-news interaction embedding component learns the feature representations of news pieces guided by their partial labels and by user credibilities.
Figure 3.9: The framework of interaction network embedding for fake news detection, consisting of the interaction network, tri-relationship embedding, and fake news prediction.
The publisher-news relation embedding component regularizes the feature representations of news pieces through publisher partisan bias labels. The semi-supervised classification component learns a classification function to predict the labels of unlabeled news items.
News Embedding We can use news content to find clues to differentiate fake news from true news. Using the NMF formulation introduced in Section 2.4.1, we can project the document-word matrix into a joint latent semantic factor space with low dimensionality, such that the document-word relations are modeled as inner products in that space. In this way, we obtain the news representation matrix $\mathbf{F}$.
User Embedding On social media, people tend to form relationships with like-minded friends, rather than with users who have opposing preferences and interests [140]. Thus, connected users are more likely to share similar latent interests in news pieces. To obtain a standardized representation, we use nonnegative matrix factorization to learn the users' latent representations (we introduced other methods in Section 3.3.2). Specifically, given the user-user adjacency matrix $\mathbf{A} \in \{0,1\}^{m \times m}$, we learn the nonnegative matrix $\mathbf{U} \in \mathbb{R}_{+}^{m \times k}$ by solving the following optimization problem:
$$
\min_{\mathbf{U}, \mathbf{T} \ge 0} \; \left\| \mathbf{Y} \odot \left( \mathbf{A} - \mathbf{U}\mathbf{T}\mathbf{U}^{T} \right) \right\|_F^2, \qquad (3.29)
$$
where $\mathbf{U}$ is the user latent matrix, $\mathbf{T} \in \mathbb{R}_{+}^{k \times k}$ is the user-user correlation matrix, and $\mathbf{Y} \in \mathbb{R}^{m \times m}$ controls the contribution of $\mathbf{A}$. Since only positive samples are given in $\mathbf{A}$, we can first set $\mathbf{Y} = \operatorname{sign}(\mathbf{A})$ and then perform negative sampling, generating the same number of unobserved links (zero entries of $\mathbf{A}$) to be weighted in the reconstruction.
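A small sketch of this weighting scheme and of the objective in Equation (3.29) is given below, under the assumption that the sampled unobserved pairs receive weight 1 in $\mathbf{Y}$; the optimization itself (e.g., multiplicative updates or projected gradient) is omitted.

```python
import numpy as np

def build_weight_matrix(A, rng):
    """Y = sign(A) plus an equal number of negatively sampled zero entries."""
    Y = np.sign(A).astype(float)
    pos = np.argwhere(A > 0)
    zeros = np.argwhere(A == 0)
    neg = zeros[rng.choice(len(zeros), size=len(pos), replace=False)]
    Y[neg[:, 0], neg[:, 1]] = 1.0        # assumed weight for sampled non-links
    return Y

def user_embedding_objective(A, Y, U, T):
    """Weighted reconstruction error of Eq. (3.29)."""
    return np.linalg.norm(Y * (A - U @ T @ U.T), "fro") ** 2

rng = np.random.default_rng(1)
A = (rng.random((30, 30)) < 0.1).astype(float)
Y = build_weight_matrix(A, rng)
U, T = rng.random((30, 8)), rng.random((8, 8))
print(user_embedding_objective(A, Y, U, T))
```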
User-News Embedding The user-news interactions are often modeled by considering the relationships between the user representations and the news veracity values $\mathbf{y}_{Lj}$. Intuitively, users with low credibility are more likely to spread fake news, while users with high credibility scores are less likely to do so. Each user has a credibility score that we can infer from his/her published posts [1], and we use $\mathbf{s} = \{s_1, s_2, \ldots, s_m\}$ to denote the credibility score vector, where a larger $s_i \in [0, 1]$ indicates that user $u_i$ has higher credibility. The user-news engaging matrix is represented as $\mathbf{E} \in \{0,1\}^{m \times N}$, where $\mathbf{E}_{ij} = 1$ indicates that user $u_i$ has engaged in the spreading process of the news piece $a_j$; otherwise, $\mathbf{E}_{ij} = 0$. The objective function is as follows:
$$
\min \; \underbrace{\sum_{i=1}^{m}\sum_{j=1}^{r} \mathbf{E}_{ij}\, s_i\, \frac{1+\mathbf{y}_{Lj}}{2}\, \left\| \mathbf{U}_i - \mathbf{F}_j \right\|_2^2}_{\text{True news}} \;+\; \underbrace{\sum_{i=1}^{m}\sum_{j=1}^{r} \mathbf{E}_{ij}\, (1-s_i)\, \frac{1-\mathbf{y}_{Lj}}{2}\, \left\| \mathbf{U}_i - \mathbf{F}_j \right\|_2^2}_{\text{Fake news}}, \qquad (3.30)
$$
where $\mathbf{y}_L \in \mathbb{R}^{r \times 1}$ is the label vector of all partially labeled news. The objective considers two situations: (i) for true news, i.e., $\mathbf{y}_{Lj} = 1$, the first term ensures that the distance between the latent features of high-credibility users and those of true news is small; and (ii) for fake news, i.e., $\mathbf{y}_{Lj} = -1$, the second term ensures that the distance between the latent features of low-credibility users and the latent representations of fake news is small.
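The sketch below evaluates this credibility-weighted distance loss for given user and news latent matrices, following the label convention above ($+1$ for true news, $-1$ for fake news); it is a scoring function only, with the joint optimization over $\mathbf{U}$ and $\mathbf{F}$ left out.

```python
import numpy as np

def user_news_embedding_loss(E, s, y_L, U, F):
    """Credibility-weighted user-news distance loss of Eq. (3.30)."""
    # squared distances between every user and news representation
    d2 = ((U[:, None, :] - F[None, :, :]) ** 2).sum(-1)    # shape (m, r)
    true_w = E * s[:, None] * (1 + y_L)[None, :] / 2        # high credibility, true news
    fake_w = E * (1 - s)[:, None] * (1 - y_L)[None, :] / 2  # low credibility, fake news
    return float(((true_w + fake_w) * d2).sum())

# toy example: 30 users, 10 labeled news pieces, 8-dimensional latent space
rng = np.random.default_rng(2)
E = (rng.random((30, 10)) < 0.2).astype(float)
s = rng.random(30)                      # user credibility scores in [0, 1]
y_L = rng.choice([-1.0, 1.0], size=10)  # +1 true, -1 fake
U, F = rng.random((30, 8)), rng.random((10, 8))
print(user_news_embedding_loss(E, s, y_L, U, F))
```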
Publisher-News Embedding The publisher-news interactions are modeled by incorporating the characteristics of the publisher and the news veracity values. Fake news is often written to convey opinions or claims that support the partisan bias of its publisher. Publishers with a high degree of political bias are more likely to publish fake news [141]. Thus, a useful news representation should be good for predicting the partisan bias score of its publisher. The partisan bias scores are collected from fact-checking websites and are represented as a vector $\mathbf{o}$. We utilize the publisher partisan label vector $\mathbf{o} \in \mathbb{R}^{l \times 1}$ and the publisher-news matrix $\mathbf{B} \in \mathbb{R}^{l \times N}$ to optimize the news feature representation learning as follows:
$$
\min \; \left\| \bar{\mathbf{B}} \mathbf{F} \mathbf{Q} - \mathbf{o} \right\|_2^2, \qquad (3.31)
$$
where the latent features of a news publisher are represented by the features of all the news pieces he/she has published, i.e., $\bar{\mathbf{B}}\mathbf{F}$. Here $\bar{\mathbf{B}}$ is the normalized publisher-news publishing relation matrix, i.e., $\bar{\mathbf{B}}_{kj} = \frac{\mathbf{B}_{kj}}{\sum_{j=1}^{N} \mathbf{B}_{kj}}$, and $\mathbf{Q} \in \mathbb{R}^{k \times 1}$ is the weighting matrix that maps the news publishers' latent features to the corresponding partisan label vector $\mathbf{o}$.
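Given fixed news representations, the weighting vector $\mathbf{Q}$ in Equation (3.31) can be fit by ordinary least squares, as in the sketch below; in the full model, $\mathbf{F}$ and $\mathbf{Q}$ are optimized jointly with the other components, so this is only an illustration of the term itself.

```python
import numpy as np

def fit_partisan_mapping(B, F, o):
    """Least-squares solution of min ||B_bar F Q - o||_2^2, Eq. (3.31)."""
    B_bar = B / B.sum(axis=1, keepdims=True)   # row-normalize publisher-news matrix
    P = B_bar @ F                              # publisher latent features
    Q, *_ = np.linalg.lstsq(P, o, rcond=None)
    return Q

rng = np.random.default_rng(3)
B = (rng.random((5, 40)) < 0.3).astype(float)  # 5 publishers, 40 news pieces
F = rng.random((40, 8))                        # news representations
o = rng.choice([-1.0, 0.0, 1.0], size=5)       # partisan bias labels
print(fit_partisan_mapping(B, F, o))
```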
The finalized model combines all the previous components into a coherent model. In this way, we can obtain the latent representations of news items $\mathbf{F}$ and of users $\mathbf{U}$ through the network embedding procedure, which are utilized to perform fake news classification.
3.3.5 PROPAGATION NETWORK DEEP-GEOMETRIC MODELING
In [34, 92], the authors propose to use geometric deep learning (e.g., graph convolutional neural networks) to learn the structure of propagation networks for fake news detection. Geometric deep learning naturally deals with heterogeneous graph data and has the potential to unify textual and structural signals in the propagation networks. Geometric deep learning generally refers to non-Euclidean deep learning approaches [20]. In general, graph CNNs replace the classical convolution operation on grids with a local permutation-invariant aggregation over the neighborhood of a vertex in a graph. Specifically, the convolution works with a spectral representation of the graph $G_P$ and learns spatially localized filters by approximating convolutions defined in the graph Fourier domain. Mathematically, the normalized graph Laplacian $\mathbf{L}$ is defined as $\mathbf{L} = \mathbf{I}_N - \mathbf{D}^{-\frac{1}{2}}\mathbf{A}\mathbf{D}^{-\frac{1}{2}} = \mathbf{E}\boldsymbol{\Lambda}\mathbf{E}^{T}$, where $\mathbf{D}$ is the degree matrix of the adjacency matrix $\mathbf{A}$ of the propagation network ($\mathbf{D}_{ii} = \sum_j \mathbf{A}_{ij}$), $\boldsymbol{\Lambda}$ is the diagonal matrix of its eigenvalues, and $\mathbf{E}$ is the matrix of the eigenvector basis. Given a node feature vector $\mathbf{c}$, $\mathbf{E}^{T}\mathbf{c}$ is the graph Fourier transform of $\mathbf{c}$. The convolutional operation on this node signal is defined as:
$$
\mathbf{g}_{\theta} \star \mathbf{c} = \mathbf{E}\, \mathbf{g}_{\theta}\, \mathbf{E}^{T} \mathbf{c}, \qquad (3.32)
$$
where $\mathbf{g}_{\theta} = \operatorname{diag}(\theta)$, parameterized by $\theta$, is a function of the eigenvalues of $\mathbf{L}$, i.e., $\mathbf{g}_{\theta}(\boldsymbol{\Lambda})$. However, the convolution in Equation (3.32) is computationally expensive due to the multiplication with the high-dimensional matrix $\mathbf{E}$, and the filter is not spatially localized. To solve this problem, it has been suggested to use Chebyshev polynomials $T_k$ up to the $K$-th order as a truncated expansion to approximate $\mathbf{g}_{\theta}$. Equation (3.32) is thus reformulated as:
$$
\mathbf{g}_{\theta} \star \mathbf{c} \approx \sum_{k=0}^{K} \theta_k \, T_k\!\left(\tilde{\mathbf{L}}\right) \mathbf{c}. \qquad (3.33)
$$
Here $\tilde{\mathbf{L}} = \frac{2}{\lambda_{\max}}\mathbf{L} - \mathbf{I}_N$, and $\lambda_{\max}$ is the largest eigenvalue of $\mathbf{L}$. Now the $\theta_k$ become the Chebyshev coefficients. If we limit $K = 1$ and approximate $\lambda_{\max} \approx 2$, with the normalization tricks and weak constraints used in [68], Equation (3.33) simplifies to:
$$
\mathbf{g}_{\theta} \star \mathbf{c} \approx \theta\, \tilde{\mathbf{D}}^{-\frac{1}{2}} \tilde{\mathbf{A}} \tilde{\mathbf{D}}^{-\frac{1}{2}} \mathbf{c}, \qquad (3.34)
$$
where $\tilde{\mathbf{A}} = \mathbf{A} + \mathbf{I}_N$ and $\tilde{\mathbf{D}}$ is the degree matrix of $\tilde{\mathbf{A}}$. For the whole network, we turn Equation (3.34) into the matrix multiplication form:
$$
\mathbf{C}' = \sigma\!\left( \tilde{\mathbf{D}}^{-\frac{1}{2}} \tilde{\mathbf{A}} \tilde{\mathbf{D}}^{-\frac{1}{2}} \mathbf{C} \mathbf{W} \right). \qquad (3.35)
$$
The above equation describes a spectral graph convolution layer, which is analogous to a one-hop aggregation of node information. In Equation (3.35), $\mathbf{C}'$ is the graph-convolved signal, and the filter parameter matrix $\mathbf{W}$ is learned through the back-propagation of deep models.
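As an illustration, the propagation rule of Equation (3.35) can be written in a few lines; the sketch below applies one graph-convolution step to random node features, with the nonlinearity chosen as ReLU for concreteness.

```python
import numpy as np

def gcn_layer(A, C, W):
    """One graph convolution step, Eq. (3.35): C' = relu(D~^-1/2 A~ D~^-1/2 C W)."""
    A_tilde = A + np.eye(A.shape[0])              # add self-loops
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt     # normalized adjacency
    return np.maximum(A_hat @ C @ W, 0.0)

rng = np.random.default_rng(4)
A = (rng.random((6, 6)) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T                    # symmetric, no self-loops
C = rng.random((6, 10))                           # node features
W = rng.random((10, 4))                           # filter parameters
print(gcn_layer(A, C, W).shape)                   # (6, 4)
```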
The model uses a four-layer graph CNN with two convolutional layers (each producing 64-dimensional output feature maps) and two fully connected layers (producing 32- and 2-dimensional outputs, respectively) to predict the fake/real class probabilities. Figure 3.10 depicts a block diagram of the model. One head of graph attention is used in every convolutional layer to implement the filters, together with mean pooling for dimensionality reduction. The Scaled Exponential Linear Unit (SELU) [70] is used as the nonlinearity throughout the entire network. Hinge loss is employed to train the neural network, and no regularization is used with the model.
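One way this architecture could be assembled with PyTorch Geometric is sketched below; the use of `GATConv` for the single-head graph-attention filters and `global_mean_pool` for the pooling step reflects our reading of the description, the layer dimensions follow the text, and everything else (optimizer, data loading, training loop) is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GATConv, global_mean_pool

class PropagationGCN(nn.Module):
    """Two graph-attention convolutions + two fully connected layers."""
    def __init__(self, in_dim):
        super().__init__()
        self.conv1 = GATConv(in_dim, 64, heads=1)   # one attention head per layer
        self.conv2 = GATConv(64, 64, heads=1)
        self.fc1 = nn.Linear(64, 32)
        self.fc2 = nn.Linear(32, 2)                  # fake/real scores

    def forward(self, x, edge_index, batch):
        x = F.selu(self.conv1(x, edge_index))
        x = F.selu(self.conv2(x, edge_index))
        x = global_mean_pool(x, batch)               # one vector per propagation graph
        x = F.selu(self.fc1(x))
        return self.fc2(x)

# hinge-style training objective (multi-class margin loss as a stand-in)
criterion = nn.MultiMarginLoss()
```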
Figure 3.10: The architecture of the graph convolutional network (GCN) framework for modeling propagation networks for fake news detection. GC = graph convolution, MP = mean pooling, FC = fully connected, SM = softmax layer.
3.3.6 HIERARCHICAL PROPAGATION NETWORK MODELING
In the real world, news pieces spread through networks on social media. The propagation networks have a hierarchical structure (see Figure 3.11), including macro-level and micro-level propagation networks [135]. On one hand, macro-level propagation networks capture the spreading path from the news to the social media posts sharing it and to the reposts of those posts. Macro-level networks of fake news are shown to be deeper and wider and to include more social bots than those of real news [127, 159], which provides clues for detecting fake news. On the other hand, micro-level propagation networks capture the user conversations under the posts or reposts, such as replies and comments. Micro-level networks contain user discussions about news pieces, which bring auxiliary cues such as sentiment polarities [45] and stance signals [59] to differentiate fake news. Studying macro-level and micro-level propagation networks provides fine-grained social signals to understand fake news and can facilitate fake news detection.
Figure 3.11: An example of the hierarchical propagation network of a fake news piece. It consists of two levels: macro-level and micro-level. The macro-level propagation network includes the news nodes, tweet nodes, and retweet nodes. The micro-level propagation network indicates the conversation tree represented by reply nodes.
Macro-Level Propagation Network The macro-level propagation network encompasses information on tweet-posting patterns and information-sharing patterns. We analyze the macro-level propagation network from structural and temporal perspectives. Since the same textual information related to a news article is shared across the macro-level network, linguistic analysis is not applicable.
Structural analysis of macro-level networks helps us understand the global spreading pattern of news pieces. Existing work has shown that learning latent features from macro-level propagation paths can help improve fake news detection, while lacking an in-depth understanding of why and how it is helpful [80, 169]. Thus, we characterize and compare the macro-level propagation networks using the following network features (a computational sketch follows the list).
($S_1$) Tree depth: The depth of the macro-level propagation network, capturing how far the information is spread/retweeted by users on social media.
($S_2$) Number of nodes: The number of nodes in a macro-level network indicates the number of users who share the news article and can be a signal for understanding the spreading pattern.
($S_3$) Maximum outdegree: The maximum outdegree in the macro-level network can reveal the tweet/retweet with the most influence in the propagation process.
($S_4$) Number of cascades: The number of original tweets posting the original news article.
($S_5$) Depth of the node with maximum outdegree: The depth at which the node with maximum outdegree occurs. This indicates how many propagation steps it takes for a news piece to be spread by an influential node whose post is retweeted by more users than any other post.
($S_6$) Number of cascades with retweets: The number of cascades (tweets) that were retweeted at least once.
($S_7$) Fraction of cascades with retweets: The fraction of tweets with retweets among all the cascades.
($S_8$) Number of bot users retweeting: This feature captures the number of bot users who retweet the corresponding news piece.
($S_9$) Fraction of bot users retweeting: The ratio of bot users among all the users who tweet and retweet a news piece. This feature can show whether news pieces are more likely to be disseminated by bots or by real humans.
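Several of these structural features can be computed directly from a macro-level propagation graph; the sketch below uses `networkx` and assumes the graph is a tree/DAG rooted at the news node, with bot labels supplied externally (e.g., by a bot-detection tool), so $S_8$ and $S_9$ are omitted.

```python
import networkx as nx

def macro_structural_features(G, news_node):
    """Subset of the structural features S1-S7 from a macro-level DiGraph."""
    depths = nx.single_source_shortest_path_length(G, news_node)
    out_deg = dict(G.out_degree())
    cascades = list(G.successors(news_node))        # original tweets (S4)
    max_node = max((n for n in G.nodes if n != news_node), key=out_deg.get)
    with_retweets = [c for c in cascades if out_deg[c] > 0]
    return {
        "tree_depth": max(depths.values()),                                  # S1
        "num_nodes": G.number_of_nodes(),                                    # S2
        "max_outdegree": out_deg[max_node],                                  # S3
        "num_cascades": len(cascades),                                       # S4
        "depth_of_max_outdegree_node": depths[max_node],                     # S5
        "num_cascades_with_retweets": len(with_retweets),                    # S6
        "frac_cascades_with_retweets": len(with_retweets) / max(len(cascades), 1),  # S7
    }

# toy macro network: news -> two tweets, one of which is retweeted twice
G = nx.DiGraph([("news", "t1"), ("news", "t2"), ("t1", "r1"), ("t1", "r2")])
print(macro_structural_features(G, "news"))
```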
Temporal analysis of the macro-level network reveals the frequency and intensity of the news dissemination process. The frequency distribution of user posting over time can be encoded in recurrent neural networks to learn features for detecting fake news [119, 133]. However, the learned features are not interpretable, and the explanation of why they help remains unclear. Here, we instead extract several temporal features from macro-level propagation networks explicitly, for better explainability, and analyze whether these features differ between fake and real news. The following are the features we extract from the macro-level propagation network (a small sketch of how such features can be computed from timestamps follows the list).
($T_1$) Average time difference between adjacent retweet nodes: It indicates how quickly tweets are retweeted in the news dissemination process.
($T_2$) Time difference between the first tweet and the last retweet: It captures the lifespan of the news spreading process.
($T_3$) Time difference between the first tweet and the tweet with maximum outdegree: The tweet with maximum outdegree in the propagation network represents the most influential node. This feature shows how long it took for a news article to be retweeted by the most influential node.
($T_4$) Time difference between the first and the last tweet posting the news: This indicates over how long a period tweets related to a news article are posted on Twitter.
($T_5$) Time difference between the tweet posting the news and the last retweet node in the deepest cascade: The deepest cascade represents the most propagated cascade in the entire propagation network. This time difference indicates the lifespan of the news in the deepest cascade and can show whether the news grows in a bursty or slow manner.
($T_6$) Average time difference between adjacent retweet nodes in the deepest cascade: This feature indicates how frequently a news article is retweeted in the deepest cascade.
($T_7$) Average time between tweets posting the news: This indicates whether tweets related to a news article are posted at short intervals.
($T_8$) Average time difference between the tweet post time and the first retweet time: The average time difference between the first tweet and the first retweet node in each cascade can indicate how soon tweets are retweeted.
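As an example of how such temporal features can be derived, the sketch below computes $T_1$, $T_2$, and $T_4$ from lists of tweet and retweet timestamps; the input format (Unix timestamps in seconds, with retweets grouped per cascade) is an assumption made for illustration.

```python
import numpy as np

def macro_temporal_features(tweet_times, retweet_times_per_cascade):
    """Compute T1, T2, and T4 from Unix timestamps (seconds)."""
    all_retweets = sorted(t for ts in retweet_times_per_cascade for t in ts)
    gaps = np.diff(all_retweets) if len(all_retweets) > 1 else np.array([0.0])
    return {
        "T1_avg_retweet_gap": float(gaps.mean()),
        "T2_first_tweet_to_last_retweet": (all_retweets[-1] - min(tweet_times)) if all_retweets else 0.0,
        "T4_first_to_last_tweet": max(tweet_times) - min(tweet_times),
    }

tweets = [0.0, 120.0, 900.0]                 # tweet posting times
retweets = [[60.0, 300.0, 400.0], [1000.0]]  # retweet times per cascade
print(macro_temporal_features(tweets, retweets))
```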
Micro-Level Propagation Network Micro-level propagation networks involve user conversations about news pieces on social media over time. They contain rich information about user opinions toward news pieces. Next, we introduce how to extract features from micro-level propagation networks from structural, temporal, and linguistic perspectives.
Structural analysis of the micro-level network involves identifying structural patterns in the conversation threads of users who express their viewpoints on tweets posted about news articles.
($S_{10}$) Tree depth: The depth of the micro-level propagation network captures how far the conversation tree extends for the tweets/retweets spreading a news piece.
($S_{11}$) Number of nodes: The number of nodes in the micro-level propagation network indicates the number of comments that are involved. It can measure how popular the root tweet is.
($S_{12}$) Maximum outdegree: In the micro-level network, the maximum outdegree indicates the maximum number of new comments in the chain starting from a particular reply node.
($S_{13}$) Number of cascades with micro-level networks: This feature indicates the number of cascades that have at least one reply.
($S_{14}$) Fraction of cascades with micro-level networks: This feature indicates the fraction of cascades that have at least one reply among all cascades.
Temporal analysis of the micro-level propagation network depicts users' opinions and emotions through a chain of replies over time. The temporal features extracted from the micro-level network can help us understand the exchange of opinions over time. The following are some of the features extracted from the micro-level propagation network.
($T_9$) Average time difference between adjacent replies in a cascade: It indicates how frequently users reply to one another.
($T_{10}$) Time difference between the first tweet posting the news and the first reply node: It indicates how soon the first reply is posted in response to a tweet posting the news.
($T_{11}$) Time difference between the first tweet posting the news and the last reply node in the micro-level network: It indicates how long a conversation tree lasts, starting from the tweet/retweet posting a news piece.
($T_{12}$) Average time difference between replies in the deepest cascade: It indicates how frequently users reply to one another in the deepest cascade.
($T_{13}$) Time difference between the first tweet posting the news and the last reply node in the deepest cascade: It indicates the lifespan of the conversation thread in the deepest cascade of the micro-level network.
Linguistic analysis: People express their emotions and opinions toward fake news through social media posts, such as skeptical opinions and sensational reactions. This textual information has been shown to be related to the content of the original news pieces. Thus, it is necessary to extract linguistic features to help find potential fake news via the reactions of the general public, as expressed in comments from the micro-level propagation network. Next, we present the sentiment features extracted from the comment posts as representative linguistic features. We utilize the widely used pre-trained model VADER [45] to predict the sentiment score of each user reply and extract a set of sentiment-related features as follows (a small sketch of the per-reply scoring follows the list).
($L_1$) Sentiment ratio: We consider the ratio of the number of replies with positive sentiment to the number of replies with negative sentiment as a feature for each news article, because it helps us understand whether fake news receives more positive or more negative comments.
($L_2$) Average sentiment: The average sentiment score of the nodes in the micro-level propagation network. The sentiment ratio does not capture the relative differences in sentiment scores, and hence the average sentiment is also used.
($L_3$) Average sentiment of first-level replies: This indicates whether people post positive or negative comments on the immediate tweets sharing fake and real news.
($L_4$) Average sentiment of replies in the deepest cascade: The deepest cascade generally contains the most propagated nodes in the entire propagation network. The average sentiment of the replies in the deepest cascade captures the emotion of user comments in the most influential information cascade.
($L_5$) Sentiment of the first-level reply in the deepest cascade: The sentiment of the first-level reply indicates users' emotions toward the most influential information cascade.
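The per-reply scoring that underlies these features might look like the sketch below, which uses the `vaderSentiment` package's compound score and a simple threshold to label replies as positive or negative; the threshold of ±0.05 is the convention suggested by the VADER authors, and the aggregation into $L_1$ and $L_2$ is ours.

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

def sentiment_features(replies):
    """Compute the sentiment ratio (L1) and average sentiment (L2) of replies."""
    analyzer = SentimentIntensityAnalyzer()
    scores = [analyzer.polarity_scores(text)["compound"] for text in replies]
    pos = sum(s > 0.05 for s in scores)
    neg = sum(s < -0.05 for s in scores)
    return {
        "sentiment_ratio": pos / max(neg, 1),                     # L1
        "average_sentiment": sum(scores) / max(len(scores), 1),   # L2
    }

replies = ["This is clearly a hoax.", "Wow, amazing news!", "I doubt this is true."]
print(sentiment_features(replies))
```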
We can compare all the aforementioned features for fake and real news pieces and observe that most of the feature distributions are different. In [135], we build different learning algorithms using the extracted features to detect fake news and evaluate the effectiveness of the extracted features against several existing baselines. The experiments show that: (1) these features can make significant contributions to fake news detection; (2) these features are overall robust across different learning algorithms; and (3) temporal features are more discriminative than linguistic and structural features, and macro-level and micro-level features are complementary.