Cosmos-DB Post Processing

Author: Joseph Kready

Generates sentiment and toxicity scores for the YouTube, Blog, and Twitter databases on COSMOS-DB. Any future databases should also use this repo.

Info

Location: 

  • COSMOS-Crawler, 144.167.35.49
  • C:\COSMOS\PostProcessing

Dashboard: http://cosmos-starmap.host.ualr.edu/d/6CO4j–Gz/sentiment-and-toxicity?orgId=1&refresh=10s 

Schedule: runs daily

Database account: ‘post_processing’

Design

This script uses Python's asynchronous processing to speed up reading from and updating the database.
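The async read/update pattern described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the repo's actual code: FAKE_DB, fetch_row, update_row, placeholder_score, and the concurrency limit are all hypothetical stand-ins, and the sleeps only simulate database latency.

```python
import asyncio

# Hypothetical in-memory stand-in for a database table (id -> row).
FAKE_DB = {
    1: {"text": "great video", "score": None},
    2: {"text": "this is awful", "score": None},
    3: {"text": "ok", "score": None},
}


async def fetch_row(row_id):
    """Simulated async read; a real script would query the database here."""
    await asyncio.sleep(0.01)
    return dict(FAKE_DB[row_id])


async def update_row(row_id, score):
    """Simulated async write of the computed score."""
    await asyncio.sleep(0.01)
    FAKE_DB[row_id]["score"] = score


def placeholder_score(text):
    """Placeholder only (word count); not a real sentiment or toxicity model."""
    return float(len(text.split()))


async def process_row(row_id, limit):
    # The semaphore caps how many rows are in flight at once.
    async with limit:
        row = await fetch_row(row_id)
        score = placeholder_score(row["text"])
        await update_row(row_id, score)
        return row_id, score


async def process_all(row_ids, max_concurrency=10):
    limit = asyncio.Semaphore(max_concurrency)
    # gather runs the coroutines concurrently and keeps the input order.
    return await asyncio.gather(*(process_row(r, limit) for r in row_ids))


if __name__ == "__main__":
    print(asyncio.run(process_all(list(FAKE_DB))))
```

Because each row spends most of its time waiting on the database, overlapping the waits is where the speed-up comes from; the semaphore keeps the number of simultaneous connections bounded.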

Sentiment Analysis: https://textblob.readthedocs.io/en/dev/quickstart.html

Toxicity Analysis: https://github.com/unitaryai/detoxify (multilingual model)
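As a rough sketch of how the two libraries linked above are typically called (the function names sentiment_score and toxicity_score are hypothetical, and the repo may wrap the libraries differently): TextBlob exposes a polarity value between -1 and 1, and Detoxify's predict returns a dictionary that includes a "toxicity" entry. Both packages must be installed, and Detoxify downloads its model weights on first use.

```python
def sentiment_score(text):
    """Return TextBlob polarity for the text, in the range [-1.0, 1.0]."""
    from textblob import TextBlob  # imported lazily; third-party package

    return TextBlob(text).sentiment.polarity


def toxicity_score(text, model=None):
    """Return the 'toxicity' value from a Detoxify-style model.

    Pass an already loaded model to avoid reloading it for every text.
    """
    if model is None:
        from detoxify import Detoxify  # imported lazily; third-party package

        model = Detoxify("multilingual")
    return float(model.predict(text)["toxicity"])


if __name__ == "__main__":
    from detoxify import Detoxify

    tox_model = Detoxify("multilingual")  # load once, reuse for all texts
    sample = "Thanks for sharing, this was really helpful."
    print("sentiment:", sentiment_score(sample))
    print("toxicity:", toxicity_score(sample, tox_model))
```

Loading the toxicity model once and passing it in matters in practice, since constructing it is far slower than scoring a single text.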

Setup

  1. Install Python 3.7 or later.
  2. Install the dependencies listed in requirements.txt (pip install -r requirements.txt).
  3. Run main.py.
    1. On Windows, you can run the ‘Tox-Sent.bat’ file instead. It is mainly used by Task Scheduler to run this job daily.
    2. There is one command-line argument: -p (bool). If True, the progress bars are printed in a pretty format; leave it False to have the progress bars saved to the log file.
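For illustration, a boolean -p argument like the one described in the steps above could be parsed along these lines. This is a sketch under assumptions, not the code in main.py: the str_to_bool helper, the "pretty" attribute name, the accepted spellings, and the default of False are all hypothetical.

```python
import argparse


def str_to_bool(value):
    """Convert a command-line string such as 'True' or 'false' to a bool."""
    normalized = value.strip().lower()
    if normalized in ("true", "1", "yes", "y"):
        return True
    if normalized in ("false", "0", "no", "n"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Sentiment and toxicity post-processing")
    parser.add_argument(
        "-p",
        dest="pretty",
        type=str_to_bool,
        default=False,  # assumed default: progress goes to the log file
        help="True prints pretty progress bars; False saves progress to the log file",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print("pretty progress bars:", args.pretty)
```

With this sketch, an interactive run would look like `python main.py -p True`, while a scheduled run could omit the flag.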

© 2024 Collaboratorium for Social Media and Online Behavioral Studies