Cosmos-Starmap
Author: Joseph Kready
Introduction
Prometheus is a metrics monitoring tool: https://prometheus.io/
- Node Exporter is used to collect system metrics: https://github.com/prometheus/node_exporter
Grafana is an open source dashboard tool: https://grafana.com/
Loki is a log aggregation tool inspired by Prometheus: https://grafana.com/oss/loki/
- Promtail ships logs to Loki (the log counterpart to Node Exporter): https://grafana.com/docs/loki/latest/clients/promtail/
Access
Cosmos-Starmap VM
IP Address: 144.167.34.179 (VM located on cosmos-3)
Username: cosmosadmin
Password: Theualrcosmosadminaccount1
Grafana
http://144.167.34.179/ or http://cosmos-starmap.host.ualr.edu
Username: admin
Password: Theualrcosmosadminaccount1
Prometheus
http://144.167.34.179:9090
You can check the status of different nodes at http://144.167.34.179:9090/targets
Paths
Prometheus config: /etc/prometheus/prometheus.yml
Loki config: /etc/loki/config-loki.yml
Data Pipeline Reference Table
Hostname | IP address (Instance) | Project | Port | Notes |
Cosmos-Joseph | 144.167.35.211 | Transcript Example | 8006 | |
Cosmos-Crawler | 144.167.35.49 | Sentiment & Toxicity | 8001 | |
Cosmos-Crawler | 144.167.35.49 | Blog Transfer | 8002 | |
Cosmos-Crawler | 144.167.35.49 | YT Daily Crawler | 8003 | |
Cosmos-Crawler | 144.167.35.49 | Twitter Crawler | 8004 | |
BT-dev1 | 144.167.35.125 | Narrative Ingestion | 8001 | |
BT-dev1 | 144.167.35.125 | Blogger Post Processing | 8002 | |
BT-dev1 | 144.167.35.125 | Terms Post-processing | 8003 | |
BT-dev1 | 144.167.35.125 | Blogsites Post-processing | 8004 | |
BT-dev1 | 144.167.35.125 | Entity Sentiment Post-processing | 8005 | |
BT-dev1 | 144.167.35.125 | Language Post-processing | 8006 | |
BT-dev1 | 144.167.35.125 | Sentiment Post-processing | 8007 | |
BT-dev1 | 144.167.35.125 | Toxicity Post-processing | 8008 | |
BT-dev1 | 144.167.35.125 | Clustering Post-processing | 8009 | |
Cosmos-ytdl-1 | 144.167.34.58 | YouTube video/audio downloader | 8001 | |
LIWC (COSMOS-4) | 144.167.35.31 | LIWC | 8001 | |
Adding jobs to Prometheus
- Log in to cosmos-starmap (ssh cosmosadmin@144.167.34.179)
- sudo nano /etc/prometheus/prometheus.yml
- Follow the examples in the file to add a job name and its targets
- Save and exit with Ctrl+X, then Y, then Enter
- sudo service prometheus restart
- sudo service prometheus status
- If all goes well, the status should show ‘active (running)’ in green.
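If you prefer not to hand-type the YAML, here is a minimal Python sketch that renders one scrape_configs entry in Prometheus' standard static_configs layout. It assumes prometheus.yml already has a top-level scrape_configs: key indented with two spaces. The job name in the example is hypothetical. If promtool is installed alongside Prometheus, running promtool check config /etc/prometheus/prometheus.yml before the restart will catch indentation mistakes.

```python
from typing import List


def render_scrape_job(job_name: str, targets: List[str], indent: int = 2) -> str:
    """Render one entry for the scrape_configs list in prometheus.yml.

    Uses the static_configs layout; paste the result under the existing
    'scrape_configs:' key and match the file's indentation.
    """
    pad = " " * indent
    quoted = ", ".join(f"'{t}'" for t in targets)
    return (
        f"{pad}- job_name: '{job_name}'\n"
        f"{pad}  static_configs:\n"
        f"{pad}    - targets: [{quoted}]\n"
    )


if __name__ == "__main__":
    # Hypothetical job name; the targets are two BT-dev1 rows from the table above.
    print(render_scrape_job("bt_dev1_pipelines",
                            ["144.167.35.125:8001", "144.167.35.125:8002"]))
```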
Getting Logs to Loki using Promtail
Windows:
- Download the latest release of promtail from https://github.com/grafana/loki/releases/
- Pick the Windows build (promtail-windows-amd64)
- Extract the download to wherever you want to run Promtail from (I make a folder called Promtail, e.g. C:\COSMOS\Promtail as used below)
- Create a subdirectory there called ‘tmp’
- Create a promtail-local-config.yaml file in the same directory (example https://raw.githubusercontent.com/grafana/loki/master/cmd/promtail/promtail-local-config.yaml)
- Set the positions filename to the tmp directory you created earlier
- Like this: C:\COSMOS\Promtail\tmp\positions.yaml
- For ‘clients: - url:’ set it to: http://144.167.34.179:3100/loki/api/v1/push
- Name your project after ‘job:’
- Set ‘__path__’ to the path where your log files are stored
- Make sure you have ‘\*log’ at the end
- *If you have more services on that machine that you want logs from, copy the section of the YAML file from ‘targets’ down and repeat the steps above for each one.
- Make Promtail start when your system starts
- On Windows, I create a .bat file that changes into the Promtail directory and starts Promtail:
- cd C:\COSMOS\Promtail
- start promtail-windows-amd64 --config.file=promtail-local-config.yaml
- Then use Task Scheduler to run that .bat file at startup with administrator rights
- This guide will get you close: https://www.howtogeek.com/138159/how-to-enable-programs-and-custom-scripts-to-run-at-boot/
- Make sure to use ‘Run whether user is logged on or not’ and ‘Run with highest privileges’
- For Triggers, set ‘Begin the task’ to ‘At startup’
- Under Settings, make sure ‘Stop the task if it runs longer than’ is unchecked
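The Windows config edits above can be sketched in Python as a small template renderer. This is a sketch, not the official example file: the template mirrors the structure of Grafana's promtail-local-config.yaml (server, positions, clients, scrape_configs), and the job name and log folder in the usage lines are hypothetical placeholders.

```python
from pathlib import PureWindowsPath

# Loki push endpoint on cosmos-starmap (from the steps above).
LOKI_PUSH_URL = "http://144.167.34.179:3100/loki/api/v1/push"

# Mirrors the layout of the upstream promtail-local-config.yaml example.
TEMPLATE = """server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: {positions}

clients:
  - url: {loki_url}

scrape_configs:
- job_name: system
  static_configs:
  - targets:
      - localhost
    labels:
      job: {job}
      __path__: {log_glob}
"""


def render_promtail_config(install_dir: str, job: str, log_dir: str) -> str:
    """Fill in the positions file, Loki URL, job label and log glob."""
    install = PureWindowsPath(install_dir)
    # positions.yaml lives in the 'tmp' folder created earlier.
    positions = install / "tmp" / "positions.yaml"
    # Windows glob: every file in log_dir whose name ends in 'log'.
    log_glob = PureWindowsPath(log_dir) / "*log"
    return TEMPLATE.format(
        positions=str(positions),
        loki_url=LOKI_PUSH_URL,
        job=job,
        log_glob=str(log_glob),
    )


if __name__ == "__main__":
    # Hypothetical job name and log folder; replace with your own.
    print(render_promtail_config(r"C:\COSMOS\Promtail", "my_project",
                                 r"C:\COSMOS\my_project\logs"))
```

Redirect the printed output into promtail-local-config.yaml, or paste the relevant lines into your existing file.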
Linux:
I am using this guide: https://sbcode.net/grafana/install-promtail-service/. Make sure you replace the version they use with the most recent version found at https://github.com/grafana/loki/releases/
- Follow the guide's instructions until you have created config-promtail.yml.
- Edit the file as follows:
- For ‘clients: - url:’ set it to: http://144.167.34.179:3100/loki/api/v1/push
- Replace everything under ‘scrape_configs’ with the scrape_configs section found here: https://raw.githubusercontent.com/grafana/loki/master/cmd/promtail/promtail-local-config.yaml
- Name your project after ‘job:’
- Set ‘__path__’ to the path where your log files are stored
- Make sure you have ‘/*log’ at the end
- *If you have more services on that machine that you want logs from, copy the section of the YAML file from ‘targets’ down and repeat the steps above for each one.
- Follow the rest of the instructions in the guide to configure Promtail as a service
- If the Promtail status shows msg="error creating promtail" error="open /tmp/positions.yaml: permission denied", run ‘sudo chown promtail:promtail /tmp/positions.yaml’
- Skip the last 2 steps
- Make sure to run this command at the end: sudo systemctl enable promtail.service
- We don’t need to configure the firewall, so stop at that point in the guide. Your logs should now be searchable in Grafana under the job name you gave them.
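If no logs show up, it can help to test Loki separately from Promtail. Below is a minimal Python sketch that pushes one line through Loki's HTTP push API (POST /loki/api/v1/push with a JSON "streams" body); run it from the machine you are setting up. The label job="promtail_smoke_test" is a hypothetical name, chosen so it won't mix with real project logs.

```python
import json
import time
import urllib.request
from typing import Dict, Optional

LOKI_PUSH_URL = "http://144.167.34.179:3100/loki/api/v1/push"


def build_push_payload(labels: Dict[str, str], line: str,
                       ts_ns: Optional[int] = None) -> bytes:
    """Encode one log line in Loki's JSON push format."""
    if ts_ns is None:
        # Loki expects Unix timestamps in nanoseconds, sent as strings.
        ts_ns = time.time_ns()
    body = {"streams": [{"stream": labels, "values": [[str(ts_ns), line]]}]}
    return json.dumps(body).encode("utf-8")


def push_test_line(labels: Dict[str, str], line: str,
                   url: str = LOKI_PUSH_URL, timeout: float = 5.0) -> int:
    """POST the line to Loki and return the HTTP status (204 means accepted)."""
    request = urllib.request.Request(
        url,
        data=build_push_payload(labels, line),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Usage: push_test_line({"job": "promtail_smoke_test"}, "hello") should return 204, and the line should then appear in Grafana under that job label. If this works but your real logs don't, the problem is in the Promtail config rather than the network.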
*Note:
By default, Promtail won’t pick up tqdm progress bars: Promtail reads the log file line by line, and tqdm redraws the bar in place with carriage returns instead of writing a new line for each update. The cheap solution is to add postfix="\n" to the progress bar. You will also need to set file=sys.stdout in the progress bar so it writes to the same stream as the rest of your output (tqdm writes to stderr by default).
Example: for x in tqdm(video_ids, desc="Downloading Trans", file=sys.stdout, postfix="\n"):
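If you don't need the visual bar, a swapped-in alternative (not the tqdm fix above) is to report progress through Python's standard logging module. Every call writes one complete, newline-terminated record, which is what Promtail tails. This is a minimal sketch; the log file name, message format and the every throttle are assumptions.

```python
import logging
from typing import Sequence


def make_progress_logger(log_path: str, name: str = "progress") -> logging.Logger:
    """Logger that appends one line per record to a .log file Promtail watches."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate lines if called twice
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def download_all(video_ids: Sequence[str], logger: logging.Logger, every: int = 1) -> None:
    """Log 'i/total' progress every `every` items and on the last item."""
    total = len(video_ids)
    for i, video_id in enumerate(video_ids, start=1):
        # ... the real download work for video_id would go here ...
        if i % every == 0 or i == total:
            logger.info("Downloading Trans: %d/%d (%s)", i, total, video_id)
```

Raising every (e.g. to 100) keeps the log small on long runs while still giving Loki one line per update.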
Python and Prometheus+Grafana
- Follow the instructions laid out in the github repo: https://github.com/prometheus/client_python to add prometheus to your code
- To use the same metrics across many modules in your project, I recommend creating a ‘prometheus.py’ file that defines your metrics as global variables, then importing those into each module (reference: https://stackoverflow.com/questions/15959534/visibility-of-global-variables-in-imported-modules)
- If you want to capture logs, make sure to save those log files to a folder using the .log extension.
- Follow the Getting Logs to Loki using Promtail guide
- Expose the ports Prometheus will scrape
- To open ports on Ubuntu, just follow https://stackoverflow.com/questions/30251889/how-to-open-some-ports-on-ubuntu
- On Windows, open ‘Windows Defender Firewall with Advanced Security’
- Go to Inbound Rules -> New Rule…
- Port
- TCP, {the port you passed to start_http_server()}
- Allow the connection
- All profiles selected
- Name it Prometheus
- Description: “Ports exposed for monitoring scripts”
- Double click on the new rule that was just created
- Go to ‘Scope’
- For ‘Local IP address’ select ‘Any IP Address’. For ‘Remote IP Address’ select ‘These IP addresses’ and add 144.167.34.179
- It might take a minute or two for Prometheus to pick up the new firewall rules. You can check the status at http://144.167.34.179:9090/targets
- This should now only expose that port to cosmos-starmap
- If you have multiple scripts that need to expose prometheus on this computer, you can add them to this firewall rule under ‘Protocols and Ports’
- Click Apply, then OK
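Once the rule is in place, a quick way to confirm the port is reachable is a plain TCP connect. This is a minimal Python sketch. Run it from cosmos-starmap: the rule above only admits 144.167.34.179, so a check from any other machine is expected to fail. The host and port in the usage line are examples taken from the reference table.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection handles DNS lookup and raises OSError on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Usage: port_is_open("144.167.35.49", 8001) should return True once the script calling start_http_server() is running and the firewall rule has been applied.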
Capturing Windows Metrics with Windows Exporter
You can follow the instructions here: https://github.com/prometheus-community/windows_exporter
These instructions pair well with this dashboard: https://grafana.com/grafana/dashboards/6593
- Download the latest release of Windows Exporter (the .exe file) from https://github.com/prometheus-community/windows_exporter/releases
- Put the exe in a folder. Create a folder called ‘tmp’ in that same place
- I like to use C:\COSMOS\node_exporter
- Open PowerShell as an administrator
- Run: New-Service -name Windows_exporter -displayName Windows_Exporter -binaryPathName "`"C:\COSMOS\node_exporter\windows_exporter-0.15.0-amd64.exe`" --collectors.enabled=`"cpu,cs,logical_disk,net,os,service,system,textfile,tcp,process`" --collector.textfile.directory=`"C:\COSMOS\node_exporter\tmp`""
- Make sure to update the ‘binaryPathName’ to the location of your exe file (don’t forget to add the extension)
- Update the ‘collector.textfile.directory’ value to the location of the tmp folder you created above
- Expose the port Windows Exporter uses (9182 by default)
- Windows Exporter already creates a firewall rule, but we need to change it. Open ‘Windows Defender Firewall with Advanced Security’
- Search for the firewall rule ‘windows_exporter’, open it
- Go to ‘Scope’
- For ‘Local IP address’ select ‘Any IP Address’. For ‘Remote IP Address’ select ‘These IP addresses’ and add 144.167.34.179
- It might take a minute or two for Prometheus to pick up the new firewall rules. You can check the status at http://144.167.34.179:9090/targets
- Go to ‘Programs and Services’ and click the ‘Browse’ button next to ‘This program’
- Navigate to the Windows Exporter .exe file you downloaded and select it (C:\COSMOS\node_exporter\windows_exporter-0.15.0-amd64.exe)
- Click Apply, OK, then close
- Add the new target to cosmos-starmap by following the ‘Adding jobs to Prometheus’ guide above. Put the new target under the ‘windows_server’ section.
Capturing Linux Metrics with Node Exporter
- Steps to follow: https://devopscube.com/monitor-linux-servers-prometheus-node-exporter/
- You will need to update the version of node exporter used in the guide
- You’ll most likely need the ‘linux-amd64’ build
- In vi, type ‘:wq’ to save and exit
- Expose the port: sudo ufw allow 9100
- Add the new target to cosmos-starmap by following the ‘Adding jobs to Prometheus’ guide above. Put the new target under the ‘linux_servers’ section.
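To check an exporter before adding it to prometheus.yml, you can fetch its /metrics page and list the metric names it exposes. This is a minimal Python sketch using only the standard library. It assumes Node Exporter's default port 9100. Windows Exporter defaults to 9182 and prefixes its metrics with windows_ instead of node_. Run it from a machine the firewall lets through.

```python
import re
import urllib.request
from typing import Set

# Metric names in the Prometheus text format: [a-zA-Z_:][a-zA-Z0-9_:]*
_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def sample_names(exposition: str) -> Set[str]:
    """Return the distinct metric names in Prometheus text-format output."""
    names = set()
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = _NAME.match(line)
        if match:
            names.add(match.group(1))
    return names


def fetch_metrics(host: str, port: int = 9100, timeout: float = 5.0) -> str:
    """Download the raw /metrics page from an exporter."""
    with urllib.request.urlopen(f"http://{host}:{port}/metrics", timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Usage: names = sample_names(fetch_metrics("your-server-ip")), then look for names starting with node_ (or windows_ with port=9182). An empty result or a connection error means Prometheus won't be able to scrape it either.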
Alerting
Great introduction video: https://www.youtube.com/watch?v=n6yZuRr36uI
*Alerting only works with line graphs!
I like to create a separate panel for alerts on each dashboard, where I configure the alerts and the Slack channels they post to.
Errors
- If an alert returns the error error:"1:35: parse error: missing unit character in duration:", it is because you are using $__interval in your main query. Replace it with a fixed interval, like ‘1d’.
Extras
- Use ‘locale format’ as the unit type to give numbers commas https://community.grafana.com/t/format-large-numbers-with-commas/1213/5
Debugging
- Sometimes starmap might act strangely. Check the Prometheus targets page (http://144.167.34.179:9090/targets) and look at the ‘Last Scrape’ time for each target. It should be within the last few minutes; if it is 10+ minutes old, something is probably wrong.
- To fix it, log into the starmap VM (credentials are in the Access section at the top of this doc)
- sudo service prometheus restart
- sudo service prometheus status
- That should get it working again.
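The same check can be scripted against Prometheus' HTTP API: GET /api/v1/targets returns every active target with its health and lastScrape timestamp. This is a minimal Python sketch. The 10-minute threshold comes from the rule of thumb above. The timestamp parsing assumes the RFC 3339 format Prometheus returns, which can carry nanoseconds, more precision than older Python versions' datetime.fromisoformat accepts.

```python
import json
import re
import urllib.request
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

TARGETS_API = "http://144.167.34.179:9090/api/v1/targets"
_TS = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?(Z|[+-]\d{2}:\d{2})$")


def parse_prom_time(ts: str) -> datetime:
    """Parse an RFC 3339 timestamp, truncating fractions to microseconds."""
    match = _TS.match(ts)
    if not match:
        raise ValueError(f"unrecognised timestamp: {ts!r}")
    base, frac, tz = match.groups()
    micros = (frac or "").ljust(6, "0")[:6]
    tz = "+00:00" if tz == "Z" else tz
    return datetime.fromisoformat(f"{base}.{micros}{tz}")


def stale_targets(payload: dict, now: datetime,
                  max_age: timedelta = timedelta(minutes=10)) -> List[Tuple[str, str, str]]:
    """Return (job, instance, health) for targets that are down or not scraped recently."""
    stale = []
    for target in payload["data"]["activeTargets"]:
        age = now - parse_prom_time(target["lastScrape"])
        if target["health"] != "up" or age > max_age:
            labels = target["labels"]
            stale.append((labels.get("job", "?"), labels.get("instance", "?"), target["health"]))
    return stale


def fetch_targets(url: str = TARGETS_API, timeout: float = 10.0) -> dict:
    """Download the /api/v1/targets JSON from Prometheus."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Usage: stale_targets(fetch_targets(), datetime.now(timezone.utc)) returns an empty list when everything is healthy. If it lists targets, restart Prometheus as described above.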