The International Workshop on Computational Methods for Online Discourse Analysis (BeyondFacts), co-located with the 31st ACM International Web Conference, aims to further research that uses computational methods to discover the content and sources that spread information on the Internet. The workshop features strong interdisciplinary research from fields that can assist these computational methods, such as communication studies and computational linguistics. It aims to strengthen relationships between disparate scientific disciplines so that web mining can be further improved by interdisciplinary research. The workshop covers the following topics:

  • Discourse analysis
  • Social web mining
  • Argumentation mining
  • Computational fact-checking
  • Mis- and dis-information spread
  • Bias and controversy detection and analysis
  • Stance / viewpoint detection and representation
  • Opinion mining
  • Rumour, propaganda and hate-speech detection
  • Computational journalism

From 13 to 17 May 2024, the 4th BeyondFacts workshop will be held in Singapore. There, Mayor Inna Gurung, Md Monoraual Islam Bhuiyan, Ahmed Al-Taweel, and Dr. Nitin Agarwal will present their research, titled “Decoding YouTube’s Recommendation System: A Comparative Study of Metadata and GPT-4 Extracted Narratives.” This research uses AI and natural language processing (NLP) to examine videos on YouTube about the South China Sea dispute.

“The study reveals several interesting facets. First, it shows narratives are more meaningful to analyze than just the video titles or descriptions. Second, the study sheds light on the inherent bias in the YouTube recommendation algorithm that is more visible when narratives are analyzed as compared to just the video title and description. These findings are helpful for developing efforts to de-bias the algorithms and assist in understanding strategic manipulation of algorithmic bias,” Dr. Agarwal said.

Notably, they found that recommended videos shifted from neutral to positive sentiment, from negative to positive emotions, and from more to less toxicity. Beyond establishing the nature of the videos themselves, these findings seem to indicate that the deeper the user’s engagement, the more their toxicity, sentiment, and emotional expression are affected. The findings also illustrate that evaluating drift solely on metadata such as video titles can lead to incomplete or inaccurate characterizations of the video content. As Mayor Inna explains, “We discovered that relying solely on titles and metadata may not always provide an accurate representation of the content within a video.” Video titles showed little change in sentiment, emotion, and toxicity, while video transcripts showed dramatic changes and trends in all three. “Our findings emphasize the importance of delving deeper into the actual video content rather than solely relying on surface-level metadata,” says Mayor Inna.
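
To make the idea of drift concrete, the short Python sketch below compares how much an average score changes between the first and last recommendation depth when it is computed from titles versus transcripts. It is only an illustration, not the researchers' method: the field names (`depth`, `title_sentiment`, `transcript_sentiment`), the helper functions, and every score are hypothetical toy values chosen for the example, and the study's actual sentiment, emotion, and toxicity models are not described here.

```python
from statistics import mean

def mean_by_depth(records, field):
    """Average a score for each recommendation depth (0 = seed videos)."""
    by_depth = {}
    for rec in records:
        by_depth.setdefault(rec["depth"], []).append(rec[field])
    return {depth: mean(scores) for depth, scores in sorted(by_depth.items())}

def drift(records, field):
    """Change in the average score from the shallowest to the deepest depth."""
    means = mean_by_depth(records, field)
    depths = sorted(means)
    return means[depths[-1]] - means[depths[0]]

# Hypothetical toy scores in [-1, 1]; these are NOT data from the study.
videos = [
    {"depth": 0, "title_sentiment": 0.0,  "transcript_sentiment": -0.25},
    {"depth": 0, "title_sentiment": 0.0,  "transcript_sentiment": -0.25},
    {"depth": 1, "title_sentiment": 0.0,  "transcript_sentiment": 0.0},
    {"depth": 1, "title_sentiment": 0.25, "transcript_sentiment": 0.25},
    {"depth": 2, "title_sentiment": 0.0,  "transcript_sentiment": 0.5},
    {"depth": 2, "title_sentiment": 0.25, "transcript_sentiment": 0.5},
]

title_drift = drift(videos, "title_sentiment")
transcript_drift = drift(videos, "transcript_sentiment")

print(f"title drift:      {title_drift:+.3f}")
print(f"transcript drift: {transcript_drift:+.3f}")
```

In this toy data the title-based drift is small while the transcript-based drift is large, which mirrors the kind of gap the researchers describe between surface-level metadata and the actual video content.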

“Narrative analysis plays a crucial role in computationally extracting narratives,” explains Mayor Inna. “With the utilization of large language models like GPT, we’ve unlocked the ability to extract narratives from a wide range of content, and an excellent example of the need for such an advanced approach is the South China dispute, a pressing and continually evolving issue.”