Advancements in Social Media Analysis and Computational Linguistics

Recent publications in social media analysis and computational linguistics show a shift toward advanced machine learning models and multimodal datasets for problems such as misinformation, toxicity, and event detection. Innovations increasingly focus on integrating Large Language Models (LLMs) with domain-specific knowledge graphs and on applying sentiment and toxicity analysis to improve content moderation and the understanding of public discourse. Another notable trend is the development of comprehensive datasets and tools for scraping and analyzing social media data, which let researchers examine the dynamics of online communities and their impact on societal issues in greater depth.

Noteworthy papers include:

  • A study on enhancing LLM-based toxicity detection with a meta-toxic knowledge graph, reporting fewer false positives alongside improved detection performance.
  • The introduction of the TikTok 2024 U.S. Presidential Election Dataset, offering a multimodal view of election-related content and insights into TikTok's role in shaping electoral discourse.
  • Research on the impact of content moderation strategies on online eating disorder communities, revealing how moderation practices influence the development of toxic echo chambers.
  • A novel approach to content moderation using generative LLMs to rephrase toxic content, aiming to preserve discourse integrity while reducing toxicity.
  • The development of the Community Sentiment and Engagement Index (CSEI), a tool designed to capture nuanced public sentiment and engagement variations on social media in response to major events.

Sources

An Incremental Clustering Baseline for Event Detection on Twitter

Enhancing LLM-based Hatred and Toxicity Detection with Meta-Toxic Knowledge Graph

Tracking the 2024 US Presidential Election Chatter on TikTok: A Public Multimodal Dataset

Safe Spaces or Toxic Places? Content Moderation and Social Dynamics of Online Eating Disorder Communities

The Content Moderator's Dilemma: Removal of Toxic Content and Distortions to Online Discourse

Research on Violent Text Detection System Based on BERT-fasttext Model

Evaluating the Performance of Large Language Models in Scientific Claim Detection and Classification

TelegramScrap: A comprehensive tool for scraping Telegram data

Quantifying Public Response to COVID-19 Events: Introducing the Community Sentiment and Engagement Index

COVID-19 on YouTube: A Data-Driven Analysis of Sentiment, Toxicity, and Content Recommendations

Leveraging Sentiment for Offensive Text Classification

Faces speak louder than words: Emotions versus textual sentiment in the 2024 USA Presidential Election
