WALNUT: Benchmark on Semi-weakly Supervised Learning for NLU

This benchmark provides a publicly accessible framework for advocating and facilitating research on weak supervision for NLU. We expect WALNUT to stimulate further research on methodologies that leverage weak supervision more effectively. The benchmark and baseline code are available at Website.
  title={WALNUT: A Benchmark on Semi-weakly Supervised Learning for Natural Language Understanding},
  author={Zheng, Guoqing and Karamanolakis, Giannis and Shu, Kai and Awadallah, Ahmed Hassan},
  booktitle={Proceedings of 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics},
  organization={ACL} }

Graph Neural Networks for Fake News Detection

This repository offers a publicly accessible platform and benchmark for a series of Graph Neural Network (GNN) based fake news detection models. We welcome contributions of results for existing models and of SOTA results for new models based on our dataset. You can check the benchmark hosted by Papers with Code for SOTA models and their performance. Benchmark, Github
  title={User Preference-aware Fake News Detection},
  author={Dou, Yingtong and Shu, Kai and Xia, Congying and Yu, Philip S. and Sun, Lichao},
  booktitle={Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  organization={ACM} }

COVID-19 Data Repository

This repository offers a publicly accessible platform to gather and curate COVID-19 datasets from multiple disciplines, including spatial-temporal epidemic data, fact-checked content covering different types of disinformation (e.g., fraud URLs, false news), social media content and network data from Twitter, and scholarly articles. The repository also encourages data donation from the research community and promotes collaboration. Github

dEFEND: Explainable Fake News Detection

In recent years, computational detection of fake news has been studied as a way to mitigate the problem, producing some promising early results. We argue, however, that a critical missing piece is the explainability of such detection, i.e., why a particular piece of news is detected as fake. In this paper, we therefore study the explainable detection of fake news. We develop a sentence-comment co-attention sub-network that exploits both news content and user comments to jointly capture the top-k explainable check-worthy sentences and user comments for fake news detection. We conduct extensive experiments on real-world datasets and demonstrate that the proposed method significantly outperforms several state-of-the-art fake news detection methods. Code and Results.
  title={dEFEND: Explainable Fake News Detection},
  author={Shu, Kai and Cui, Limeng and Wang, Suhang and Lee, Dongwon and Liu, Huan},
  booktitle={Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
  organization={ACM} }
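
The co-attention idea described above can be illustrated with a toy sketch. This is a simplified, parameter-free variant written for illustration only, not the dEFEND implementation: the function names, the tanh affinity, and the mean pooling are all assumptions, and a trained model would use learned encoders and weight matrices rather than raw dot products.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def co_attention(sentences, comments):
    """Toy co-attention over sentence and comment embeddings.

    sentences, comments: lists of equal-length embedding vectors
    (hypothetical inputs; a real system would produce them with encoders).
    Returns (sentence_weights, comment_weights), each summing to 1.
    """
    # Affinity between every sentence i and every comment j.
    affinity = [[math.tanh(dot(s, c)) for c in comments] for s in sentences]
    # A sentence scores highly when comments align with it on average,
    # and a comment scores highly when sentences align with it on average.
    sent_scores = [sum(row) / len(row) for row in affinity]
    comm_scores = [
        sum(affinity[i][j] for i in range(len(sentences))) / len(sentences)
        for j in range(len(comments))
    ]
    return softmax(sent_scores), softmax(comm_scores)


def top_k(weights, k):
    """Indices of the k largest attention weights, highest first."""
    return sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:k]


sentences = [[1.0, 0.0], [0.0, 1.0], [0.1, 0.1]]
comments = [[1.0, 0.0], [0.9, 0.1]]
sent_w, comm_w = co_attention(sentences, comments)
print(top_k(sent_w, 2), top_k(comm_w, 1))
```

The highest-weighted indices play the role of the top-k sentences and comments that would be shown to a user as an explanation.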

Unsupervised Fake News Detection

Most existing fake news detection methods are supervised and require extensive time and labor to build a reliably annotated dataset. As an alternative, in this paper we investigate whether fake news can be detected in an unsupervised manner. We treat the truth of each news item and the credibility of each user as latent random variables, and exploit users' engagements on social media to identify their opinions about the authenticity of news. Code
  title={Unsupervised fake news detection on social media: A generative approach},
  author={Yang, Shuo and Shu, Kai and Wang, Suhang and Gu, Renjie and Wu, Fan and Liu, Huan},
  booktitle={Proceedings of the AAAI conference on artificial intelligence},
  year={2019},
  organization={ACM} }
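
To make the latent-variable idea concrete, here is a minimal sketch that alternates between estimating news truth and user credibility from user opinions alone. It is a simple iterative truth-discovery heuristic, swapped in for illustration, and not the generative approach of the paper; the function name, the stance encoding, and the smoothing constants are assumptions.

```python
def infer_truth(opinions, n_iters=20):
    """Jointly estimate news truth and user credibility without labels.

    opinions: list of (user, news, stance), where stance is +1 if the user
    treats the news as real and -1 if the user treats it as fake
    (a hypothetical encoding of social media engagements).
    Returns (truth, credibility), both dicts with values in [0, 1].
    """
    users = sorted({u for u, _, _ in opinions})
    news_items = sorted({n for _, n, _ in opinions})
    credibility = {u: 0.5 for u in users}  # uninformative starting point
    truth = {n: 0.5 for n in news_items}

    for _ in range(n_iters):
        # Step 1: truth of a news item is a credibility-weighted vote.
        for n in news_items:
            votes = [(u, s) for u, m, s in opinions if m == n]
            support = sum(credibility[u] for u, s in votes if s > 0)
            total = sum(credibility[u] for u, _ in votes)
            truth[n] = support / total
        # Step 2: a user's credibility is their smoothed agreement
        # with the current truth estimates (Laplace smoothing keeps it > 0).
        for u in users:
            own = [(n, s) for v, n, s in opinions if v == u]
            agree = sum(truth[n] if s > 0 else 1.0 - truth[n] for n, s in own)
            credibility[u] = (agree + 1.0) / (len(own) + 2.0)
    return truth, credibility


opinions = [
    ("a", "n1", +1), ("a", "n2", -1), ("a", "n3", +1),
    ("b", "n1", +1), ("b", "n2", -1), ("b", "n3", +1),
    ("c", "n1", +1), ("c", "n2", -1), ("c", "n3", -1),
    ("d", "n1", -1), ("d", "n2", +1), ("d", "n3", -1),
]
truth, credibility = infer_truth(opinions)
```

In this toy data the vote on `n3` is tied two against two, and the tie is broken only because user `d` disagrees with everyone else on `n1` and `n2` and so ends up with lower estimated credibility.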

Fake News Detection Data Repository

We released FakeNewsTracker, a tool for collecting, analyzing, and visualizing fake news and its dissemination on social media.
The latest dataset paper, with a detailed analysis of the dataset, can be found at FakeNewsNet.
FakeNewsNet is a benchmark data repository for fake news detection. It contains news content, social context, and spatiotemporal information for studying fake news on social media. Data and APIs are available at Github.
If you use this dataset, please consider citing the following papers:
  title={FakeNewsNet: A Data Repository with News Content, Social Context and Dynamic Information for Studying Fake News on Social Media},
  author={Shu, Kai and Mahudeswaran, Deepak and Wang, Suhang and Lee, Dongwon and Liu, Huan},
  journal={arXiv preprint arXiv:1809.01286},
  year={2018} }
title={Fake News Detection on Social Media: A Data Mining Perspective},
  author={Shu, Kai and Sliva, Amy and Wang, Suhang and Tang, Jiliang and Liu, Huan},
  journal={ACM SIGKDD Explorations Newsletter},
  publisher={ACM} }

Sarcasm Detection with Emojis

Sarcasm detection on social media is important for users to understand the underlying messages. Most existing sarcasm detection algorithms focus on text, while emotional cues such as emojis are ignored. We release a new dataset for our SBP paper on sarcasm detection on social media with emoji information. Data and code are available at Github.
If you use this dataset, please cite the following paper:
  title={Exploiting Emojis for Sarcasm Detection},
  author={Subramanian, Jayashree and Sridharan, Varun and Shu, Kai and Liu, Huan},
  booktitle={International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation},
  organization={Springer} }

Cross Media Friend & Item Recommendations

Friend and item recommendation on a social media site is an important task that brings convenience to users and benefits platform providers. However, recommendation for newly launched social media sites is challenging because they often lack historical user data and suffer from data sparsity and cold-start problems. It is therefore important to exploit auxiliary information to improve recommendation performance on these sites. We construct a new dataset that ensures both source and target sites have the following information: user-item interactions, user-user relations, and item features. Raw data are available at Book, Movie.
If you use this dataset, please cite the following paper:
@inproceedings{shu2018crossfire,
  title={Crossfire: Cross media joint friend and item recommendations},
  author={Shu, Kai and Wang, Suhang and Tang, Jiliang and Wang, Yilin and Liu, Huan},
  booktitle={Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining},
  organization={ACM} }
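
As a rough illustration of the three kinds of information listed above, the sketch below models one site's data and checks that a source/target pair is usable. The class, field names, and value layouts are hypothetical and do not describe the released file format.

```python
from dataclasses import dataclass, field


@dataclass
class SiteData:
    """Hypothetical in-memory container for one site's data."""
    # (user, item) -> rating or interaction strength
    interactions: dict = field(default_factory=dict)
    # undirected or directed (user, user) pairs
    relations: set = field(default_factory=set)
    # item -> feature vector
    item_features: dict = field(default_factory=dict)

    def is_complete(self):
        """True only if all three kinds of information are present."""
        return bool(self.interactions) and bool(self.relations) and bool(self.item_features)


def usable_pair(source, target):
    """A source/target pair is usable only when both sites are complete."""
    return source.is_complete() and target.is_complete()


source = SiteData(
    interactions={("u1", "book1"): 5.0},
    relations={("u1", "u2")},
    item_features={"book1": [0.2, 0.8]},
)
target = SiteData(
    interactions={("v1", "movie1"): 4.0},
    item_features={"movie1": [0.5, 0.5]},
)  # no user-user relations yet
print(usable_pair(source, target))
```

Adding at least one pair to `target.relations` would make the pair usable under this check.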