The International Conference on Smart Multimedia hosts research and oral presentations on a variety of topics in multimedia applications. The conference covers a broad range of subjects, such as:

  • Speech, audio, image, video, and text media processing
  • Multimedia communication
  • Haptic intelligence
  • Security and content protection
  • Multimedia trends
  • Multi-modal integration

At the end of March 2024, the 4th Smart Media conference was held in Los Angeles, California. Niloofar Yousefi, Mainuddin Shaik, and Dr. Nitin Agarwal presented their study, “Characterizing Multimedia Information Environment leveraging Multi-modal Clustering of YouTube Videos,” virtually at Smart Media 2024. Their study falls under the last topic listed above, multi-modal integration: they characterize video content by combining the clustering patterns found in the audio, visual, and text modalities of YouTube videos on the South China Sea.

In their study, the researchers identified content characteristics common across South China Sea videos. Analysis of the text modality revealed recurring themes of geopolitical relations and global security, while analysis of the audio and visual modalities revealed distinct patterns corresponding to kinds of videos, such as news reporting, educational content, and interviews. These findings were made possible by novel methods developed for the study. First, AI was used to generate transcripts of the videos. Second, barcodes were generated to represent how the colors of each video's visuals change over its length, capturing the video's color profile. Finally, an audio profile was created for each video from its acoustic signals using advanced signal processing techniques. Each of these kinds of data was then compared across videos, revealing clusters of videos with similar characteristics. By comparing clusters, the researchers identified common themes in the videos posted on YouTube about the South China Sea dispute.

“We found instances of content repurposing by utilizing video barcode analysis, which identified highly similar segments between pairs of videos,” Niloofar explained. “By further examining audio waveforms, we discovered identical or highly similar audio segments, such as introductions and closing remarks in videos posted by the same channel for branding purposes. Together, these two signals help identify potentially cloned content.” She added, “The three clusters for visuals and barcode showed the characteristics of each video, such as educational content, interviews and explanatory segments, and on-site reporting.”
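The article does not describe the authors' exact toolchain, but the pipeline's steps can be made concrete with short sketches. For the first step, automatic transcription, one common choice is OpenAI's open-source Whisper model; both the tool and the file name below are assumptions for illustration, not the authors' confirmed setup:

```python
# Minimal sketch, assuming OpenAI's open-source Whisper library.
# "scs_video.mp4" is a hypothetical file name.
import whisper

model = whisper.load_model("base")          # small general-purpose model
result = model.transcribe("scs_video.mp4")  # extracts audio and transcribes it
print(result["text"])                       # full transcript as one string
```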
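The barcode step can be sketched similarly. The idea, as described above, is to reduce a video to a strip of average colors over its length, then compare strips across videos to flag repurposed content. A minimal version using OpenCV and NumPy (a sketch of the general technique, not the authors' published implementation) might look like:

```python
import cv2
import numpy as np

def video_barcode(path: str, n_stripes: int = 256) -> np.ndarray:
    """Summarize a video as an (n_stripes, 3) array of average BGR colors,
    one stripe per evenly spaced frame sample across the video's length."""
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    stripes = []
    for i in range(n_stripes):
        cap.set(cv2.CAP_PROP_POS_FRAMES, i * total // n_stripes)
        ok, frame = cap.read()
        if not ok:
            break
        stripes.append(frame.mean(axis=(0, 1)))  # average color of this frame
    cap.release()
    return np.asarray(stripes)

def barcode_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two flattened barcodes; values near 1.0 flag
    visually similar, possibly repurposed, content."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
```

Sampling a fixed number of stripes, rather than every frame, keeps barcodes the same length regardless of video duration, so any two barcodes can be compared directly.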
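Finally, the audio profiling and clustering steps could be approximated with standard libraries. The sketch below assumes librosa for MFCC features and scikit-learn for k-means; both are stand-ins, since the article does not name the authors' techniques, and the file list is a placeholder. It builds one acoustic fingerprint per video and groups the fingerprints into three clusters, mirroring the three audio/visual clusters the study reports:

```python
import librosa
import numpy as np
from sklearn.cluster import KMeans

def audio_profile(path: str) -> np.ndarray:
    """One fixed-length acoustic fingerprint per video: the mean and standard
    deviation of 13 MFCCs over time, a standard signal-processing summary."""
    y, sr = librosa.load(path, sr=16000, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hypothetical file list; in practice this would be the audio tracks of the
# collected YouTube videos.
paths = ["video_a.wav", "video_b.wav", "video_c.wav"]
X = np.stack([audio_profile(p) for p in paths])
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels)  # cluster assignment for each video
```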

Dr. Agarwal said, “Studies like these are essential to advance the state of social computing, especially as our information environment is increasingly becoming multimedia-centric. Our study demonstrates that combining various modalities (text, audio, and video) in novel ways enhances the characterization of the information environment for better sense-making. We are thrilled about our publication at Smart Media 2024!”