Prometheus and Grafana

The Bifrost node and relayer also support system metrics monitoring. This guide walks you through setting up Prometheus and Grafana to monitor your node and relayer.

Enable Prometheus

Prometheus metrics must be enabled manually on both the Bifrost node and the relayer. To enable the node's Prometheus exporter, add the following CLI flags and restart the node.

--prometheus-external : This exposes the Prometheus exporter on all interfaces.
--prometheus-port <PORT> : Sets the exporter port. The default is 9615; provide this flag only if you need a different port.

If you operate a full node, enable the relayer's Prometheus exporter by updating the following parameters in the relayer's configuration YAML file, then restart the relayer.

prometheus_config:
  is_enabled: true
  is_external: true
  port: 8000

After a successful restart, each service prints a log line like the following at startup (shown here for the node):

2023-07-14 18:24:54 〽️ Prometheus exporter started at 0.0.0.0:9615
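Before wiring up Prometheus, you can confirm that an exporter is actually serving data by fetching its metrics page. The sketch below assumes curl is installed and that both services expose the usual /metrics path; the helper name check_exporter is hypothetical.

```shell
#!/usr/bin/env bash
# check_exporter [HOST] [PORT]
# Hypothetical helper: fetches http://HOST:PORT/metrics and prints the first
# few lines. Returns non-zero if nothing answers on that port.
check_exporter() {
  local host="${1:-localhost}"
  local port="${2:-9615}"
  local body
  if ! body="$(curl -fsS --max-time 5 "http://${host}:${port}/metrics")"; then
    echo "No metrics served at ${host}:${port}" >&2
    return 1
  fi
  # Print only a short sample so the output stays readable
  printf '%s\n' "$body" | head -n 5
}
```

For example, check_exporter localhost 9615 checks the node, and check_exporter localhost 8000 checks the relayer configured as above.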

Using Systemd

This section describes how to install and set up Prometheus and Grafana as systemd services.

Installing Prometheus

First, create the directories for the configuration files and the time-series data.

sudo mkdir /etc/prometheus
sudo mkdir /var/lib/prometheus

Then update your OS and download Prometheus. The commands below use v2.45.0; newer versions are listed on the releases page of the Prometheus GitHub repository.

sudo apt-get update && sudo apt-get upgrade
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
tar xfz prometheus-*.tar.gz
cd prometheus-2.45.0.linux-amd64
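The Prometheus, NodeExporter and AlertManager download URLs in this guide all follow the same pattern, so you may find it convenient to keep each version in one place. The helpers below are a hypothetical convenience that assumes the linux-amd64 builds used here; the commented usage also checks the tarball against the sha256sums.txt file published with each release.

```shell
#!/usr/bin/env bash
# release_url COMPONENT VERSION [ARCH]
# Hypothetical helper: prints the GitHub release tarball URL for a Prometheus
# project such as prometheus, node_exporter or alertmanager.
release_url() {
  local component="$1"
  local version="$2"
  local arch="${3:-linux-amd64}"
  echo "https://github.com/prometheus/${component}/releases/download/v${version}/${component}-${version}.${arch}.tar.gz"
}

# checksums_url COMPONENT VERSION
# Prints the URL of the sha256sums.txt file published with the same release.
checksums_url() {
  echo "https://github.com/prometheus/$1/releases/download/v$2/sha256sums.txt"
}

# Example usage:
#   PROM_VERSION=2.45.0
#   wget "$(release_url prometheus "$PROM_VERSION")" "$(checksums_url prometheus "$PROM_VERSION")"
#   sha256sum --ignore-missing -c sha256sums.txt
```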

Copy the executable files to the /usr/local/bin/ directory.

sudo cp ./prometheus /usr/local/bin/
sudo cp ./promtool /usr/local/bin/

Copy the console files to the /etc/prometheus directory.

sudo cp -r ./consoles /etc/prometheus
sudo cp -r ./console_libraries /etc/prometheus

Once everything is copied, remove the downloaded archive and the extracted directory.

cd .. && rm -rf prometheus*

Installing NodeExporter

Now, install NodeExporter. The commands below use v1.6.0; newer versions are listed on the releases page of its GitHub repository.

wget https://github.com/prometheus/node_exporter/releases/download/v1.6.0/node_exporter-1.6.0.linux-amd64.tar.gz
tar xvf node_exporter-*.tar.gz
sudo cp ./node_exporter-*.linux-amd64/node_exporter /usr/local/bin/
rm -rf ./node_exporter*

Installing AlertManager

First, create the directory that will hold the AlertManager configuration.

sudo mkdir /etc/alertmanager

Next, install AlertManager. The commands below use v0.25.0; newer versions are listed on the releases page of its GitHub repository.

wget https://github.com/prometheus/alertmanager/releases/download/v0.25.0/alertmanager-0.25.0.linux-amd64.tar.gz
tar xvf alertmanager-*.tar.gz
sudo cp ./alertmanager-*.linux-amd64/alertmanager /usr/local/bin/
sudo cp ./alertmanager-*.linux-amd64/amtool /usr/local/bin/
rm -rf ./alertmanager*
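At this point all five binaries should be on your PATH. The hypothetical helper below is one way to confirm that, relying only on the fact that each of these tools accepts a --version flag.

```shell
#!/usr/bin/env bash
# check_binaries NAME...
# Hypothetical helper: reports whether each binary is on PATH and prints the
# first line of its version output. Returns non-zero if any is missing.
check_binaries() {
  local missing=0
  local bin
  for bin in "$@"; do
    if command -v "$bin" >/dev/null 2>&1; then
      # Some tools print their version on stderr, so merge both streams
      echo "${bin}: $("$bin" --version 2>&1 | head -n 1)"
    else
      echo "${bin}: not found on PATH" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For this guide, run check_binaries prometheus promtool node_exporter alertmanager amtool.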

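Once Grafana is installed (see Installing Grafana below), install the AlertManager datasource plugin required by the Grafana dashboard, then restart the grafana-server service.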

sudo grafana-cli plugins install camptocamp-prometheus-alertmanager-datasource

Configure Alert Rules

Create the rules.yml file, which defines the alerting rules that Prometheus evaluates. Alerts that fire are sent to AlertManager.

sudo vi /etc/prometheus/rules.yml

We will create two basic rules: one fires when an instance has been down for 5 minutes, and one fires when CPU usage exceeds 80%. Add the following lines and save the file.

groups:
  - name: alert_rules
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "[{{ $labels.instance }}] of job [{{ $labels.job }}] has been down for more than 5 minutes."

      - alert: HostHighCpuLoad
        expr: 100 - (avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) > 80
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "Host high CPU load (instance {{ $labels.instance }})"
          description: "CPU load is > 80%\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
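Before restarting anything, you can let promtool parse the rule file and report syntax errors in the rules and their expressions. The wrapper below is a hypothetical convenience around promtool check rules, using the path from this guide as its default.

```shell
#!/usr/bin/env bash
# validate_rules [FILE]
# Hypothetical wrapper around "promtool check rules". It fails early if the
# file cannot be read, otherwise returns promtool's exit status.
validate_rules() {
  local file="${1:-/etc/prometheus/rules.yml}"
  if [ ! -r "$file" ]; then
    echo "Cannot read ${file}" >&2
    return 1
  fi
  promtool check rules "$file"
}
```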

The alertmanager.yml file defines the external service that is notified when an alert fires. This guide uses Gmail notifications.

For Gmail notifications, you need to generate an app password. We recommend using a dedicated email address for your alerts. To generate an app password, follow Google's instructions at https://support.google.com/mail/answer/185833.

Create the file in the following path.

sudo vi /etc/alertmanager/alertmanager.yml

Add the Gmail configuration as below and save the file.

global:
  resolve_timeout: 1m

route:
  receiver: 'gmail-notifications'

receivers:
  - name: 'gmail-notifications'
    email_configs:
      - to: '' # receiver email
        from: '' # sender(monitoring system) gmail
        smarthost: 'smtp.gmail.com:587'
        auth_username: '' # sender(monitoring system) gmail
        auth_identity: '' # sender(monitoring system) gmail
        auth_password: '' # sender(monitoring system) gmail's app password <https://support.google.com/mail/answer/185833?hl=en>
        send_resolved: true

Example

global:
  resolve_timeout: 1m

route:
  receiver: 'gmail-notifications'

receivers:
  - name: 'gmail-notifications'
    email_configs:
      - to: 'you@example.com'
        from: 'your-alerts-account@gmail.com'
        smarthost: 'smtp.gmail.com:587'
        auth_username: 'your-alerts-account@gmail.com'
        auth_identity: 'your-alerts-account@gmail.com'
        auth_password: 'my-auth-password'
        send_resolved: true
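A common mistake is leaving one of the '' placeholders from the template empty. The hypothetical helper below checks for that first and then runs amtool check-config, which parses the file and lists the receivers it found.

```shell
#!/usr/bin/env bash
# validate_alertmanager_config [FILE]
# Hypothetical helper: fails if any value is still an empty '' placeholder,
# then runs "amtool check-config" on the file.
validate_alertmanager_config() {
  local file="${1:-/etc/alertmanager/alertmanager.yml}"
  if [ ! -r "$file" ]; then
    echo "Cannot read ${file}" >&2
    return 1
  fi
  # Matches lines such as:   - to: '' # receiver email
  if grep -nE ":[[:space:]]*''" "$file" >&2; then
    echo "Fill in the empty '' values listed above" >&2
    return 1
  fi
  amtool check-config "$file"
}
```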

Configure Prometheus

Prometheus needs a configuration file before it can start. Create it at the following path.

sudo vi /etc/prometheus/prometheus.yml

The configuration file should look as below.

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "rules.yml"
  
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - "localhost:9093"

scrape_configs:
  - job_name: "prometheus"
    scrape_interval: 5s
    static_configs:
      - targets: [ "localhost:9090" ]
  - job_name: "bifrost_node"
    scrape_interval: 5s
    static_configs:
      - targets: [ "localhost:9615" ]
  - job_name: "node_exporter"
    scrape_interval: 5s
    static_configs:
      - targets: [ "localhost:9100" ]
  - job_name: "bifrost_relayer"
    scrape_interval: 5s
    static_configs:
      - targets: [ "localhost:8000" ]

Starting Prometheus

Next, create a systemd unit file for Prometheus at the following path.

sudo vi /etc/systemd/system/prometheus.service

The configuration file should look as below.

[Unit]
Description=Prometheus Monitoring
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
ExecStart=/usr/local/bin/prometheus \
  --config.file /etc/prometheus/prometheus.yml \
  --storage.tsdb.path /var/lib/prometheus/ \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries
ExecReload=/bin/kill -HUP $MAINPID

[Install]
WantedBy=multi-user.target
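The ExecReload line lets systemctl reload prometheus send SIGHUP so Prometheus rereads its configuration without a restart. Once the service is running, a hypothetical helper like the one below can validate prometheus.yml first (promtool check config also checks the files listed under rule_files) and only reload when the check passes.

```shell
#!/usr/bin/env bash
# reload_prometheus [CONFIG]
# Hypothetical helper: validates the config and, only if it is valid, asks
# systemd to run the unit's ExecReload command (kill -HUP).
reload_prometheus() {
  local config="${1:-/etc/prometheus/prometheus.yml}"
  if ! promtool check config "$config"; then
    echo "Config invalid; not reloading" >&2
    return 1
  fi
  sudo systemctl reload prometheus
}
```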

Now, enable and start the service.

sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus

To check that everything worked, open http://YOUR_SERVER_IP_ADDRESS:9090 in your browser. If the Prometheus web UI appears, Prometheus is running.
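The Status > Targets page of the Prometheus UI shows whether each job in prometheus.yml is being scraped. The same information is available from the /api/v1/targets endpoint; the sketch below counts targets per health state with grep instead of a JSON parser such as jq. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# summarize_targets: reads a Prometheus /api/v1/targets JSON response on
# stdin and prints how many targets report each health state.
summarize_targets() {
  local json state count
  json="$(cat)"
  for state in up down unknown; do
    count="$(printf '%s\n' "$json" \
      | grep -oE "\"health\"[[:space:]]*:[[:space:]]*\"${state}\"" \
      | wc -l | tr -d ' ')"
    echo "${state}: ${count}"
  done
}

# check_targets [PROMETHEUS_URL]: fetches the targets from a running server
check_targets() {
  local url="${1:-http://localhost:9090}"
  local json
  json="$(curl -fsS --max-time 5 "${url}/api/v1/targets")" || return 1
  printf '%s\n' "$json" | summarize_targets
}
```

With all four jobs from the configuration above healthy, check_targets should report four targets as up.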

Starting NodeExporter

Next, create a systemd unit file for NodeExporter at the following path.

sudo vi /etc/systemd/system/node_exporter.service

The configuration file should look as below.

[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target

Now, enable and start the service.

sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter

Starting AlertManager

Next, create a systemd unit file for AlertManager at the following path.

sudo vi /etc/systemd/system/alertmanager.service

The configuration file should look as below.

[Unit]
Description=AlertManager Server Service
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
ExecStart=/usr/local/bin/alertmanager \
  --config.file /etc/alertmanager/alertmanager.yml \
  --storage.path /var/lib/alertmanager \
  --web.external-url=http://localhost:9093 \
  --cluster.advertise-address='0.0.0.0:9093'

[Install]
WantedBy=multi-user.target

Now, enable and start the service.

sudo systemctl daemon-reload
sudo systemctl enable alertmanager
sudo systemctl start alertmanager
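With all three units started, you can check their state in one go. The hypothetical helper below uses systemctl is-active, which prints the unit's state, and fails if any unit is not active.

```shell
#!/usr/bin/env bash
# check_services UNIT...
# Hypothetical helper: prints the systemd state of each unit and returns
# non-zero if any of them is not "active".
check_services() {
  local failed=0
  local unit state
  for unit in "$@"; do
    state="$(systemctl is-active "$unit" 2>/dev/null)"
    echo "${unit}: ${state:-unknown}"
    [ "$state" = "active" ] || failed=1
  done
  return "$failed"
}
```

For this guide, run check_services prometheus node_exporter alertmanager, and use journalctl -u UNIT to inspect any unit that is not active.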

Installing Grafana

To visualize your Prometheus metrics, install Grafana, which queries the Prometheus server. The commands below install Grafana Enterprise 10.0.1 and its dependencies; newer releases are listed on the Grafana download page.

sudo apt-get install -y adduser libfontconfig1
wget https://dl.grafana.com/enterprise/release/grafana-enterprise_10.0.1_amd64.deb
sudo dpkg -i grafana-enterprise_10.0.1_amd64.deb

Starting Grafana

Then enable and start the service with its default configuration.

sudo systemctl daemon-reload
sudo systemctl enable grafana-server
sudo systemctl start grafana-server
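Grafana exposes an unauthenticated /api/health endpoint that returns a small JSON document including a "database" field. The sketch below checks it with a whitespace-tolerant grep, so jq is not required; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# grafana_db_ok: reads a Grafana /api/health response on stdin and succeeds
# if it reports "database": "ok".
grafana_db_ok() {
  grep -Eq '"database"[[:space:]]*:[[:space:]]*"ok"'
}

# check_grafana [GRAFANA_URL]: queries a running Grafana and prints the reply
check_grafana() {
  local url="${1:-http://localhost:3000}"
  local body
  body="$(curl -fsS --max-time 5 "${url}/api/health")" || return 1
  printf '%s\n' "$body"
  printf '%s\n' "$body" | grafana_db_ok
}
```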

You can now access Grafana at http://YOUR_SERVER_IP_ADDRESS:3000/login. The default username and password are admin/admin, and Grafana asks you to change the password after the first login.

Using Docker

This section describes how to install and set up Prometheus and Grafana with Docker.

Requirements

First, install Docker and Docker Compose on your server. The docker-compose.yml file used here is provided in the maintenance directory of the Bifrost node GitHub repository, so clone the repository and move into that directory.

git clone https://github.com/bifrost-platform/bifrost-node.git
cd bifrost-node/maintenance

Configure AlertManager

The alert rules are predefined in maintenance/prometheus/rules.yml. The file contains two basic rules: one fires when an instance has been down for 5 minutes, and one fires when CPU usage exceeds 80%. Its contents are shown below.

groups:
  - name: alert_rules
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "[{{ $labels.instance }}] of job [{{ $labels.job }}] has been down for more than 5 minutes."

      - alert: HostHighCpuLoad
        expr: 100 - (avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) > 80
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "Host high CPU load (instance {{ $labels.instance }})"
          description: "CPU load is > 80%\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

The alertmanager.yml file defines the external service that is notified when an alert fires. This guide uses Gmail notifications.

For Gmail notifications, you need to generate an app password. We recommend using a dedicated email address for your alerts. To generate an app password, follow Google's instructions at https://support.google.com/mail/answer/185833.

The file is located at maintenance/alertmanager/alertmanager.yml. Add the Gmail configuration to it as below and save the file.

global:
  resolve_timeout: 1m

route:
  receiver: 'gmail-notifications'

receivers:
  - name: 'gmail-notifications'
    email_configs:
      - to: '' # receiver email
        from: '' # sender(monitoring system) gmail
        smarthost: 'smtp.gmail.com:587'
        auth_username: '' # sender(monitoring system) gmail
        auth_identity: '' # sender(monitoring system) gmail
        auth_password: '' # sender(monitoring system) gmail's app password <https://support.google.com/mail/answer/185833?hl=en>
        send_resolved: true

Example

global:
  resolve_timeout: 1m

route:
  receiver: 'gmail-notifications'

receivers:
  - name: 'gmail-notifications'
    email_configs:
      - to: 'you@example.com'
        from: 'your-alerts-account@gmail.com'
        smarthost: 'smtp.gmail.com:587'
        auth_username: 'your-alerts-account@gmail.com'
        auth_identity: 'your-alerts-account@gmail.com'
        auth_password: 'my-auth-password'
        send_resolved: true

Configure Prometheus

Prometheus needs a configuration file before it can start. The file is located at maintenance/prometheus/prometheus.yml. Full-node operators who run both the node and the relayer should uncomment the bifrost_relayer job below.

global:
  scrape_interval: 3s
  evaluation_interval: 3s

rule_files:
  - "rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - "alertmanager:9093"

scrape_configs:
  - job_name: "prometheus"
    scrape_interval: 3s
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "bifrost_node"
    scrape_interval: 3s
    static_configs:
      - targets: ["host.docker.internal:9615"]
  - job_name: "node_exporter"
    scrape_interval: 3s
    static_configs:
      - targets: ["node-exporter:9100"]
  # - job_name: "bifrost_relayer"
  #   scrape_interval: 3s
  #   static_configs:
  #     - targets: ["host.docker.internal:8000"]
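The Docker setup does not install promtool on the host, but you can still validate the edited configuration by running promtool from a Prometheus image. The sketch below assumes the official prom/prometheus image (which ships promtool alongside prometheus) and that it is run from the maintenance directory; if your compose file pins an image tag, use the same tag. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# check_compose_prometheus_config [DIR]
# Hypothetical helper: mounts DIR (default ./prometheus) read-only into a
# throwaway prom/prometheus container and runs "promtool check config".
# rule_files paths such as "rules.yml" resolve relative to the mounted dir.
check_compose_prometheus_config() {
  local dir="${1:-./prometheus}"
  if [ ! -f "${dir}/prometheus.yml" ]; then
    echo "No prometheus.yml found in ${dir}" >&2
    return 1
  fi
  # docker -v needs an absolute host path
  dir="$(cd "$dir" && pwd)"
  docker run --rm --entrypoint promtool \
    -v "${dir}:/etc/prometheus:ro" \
    prom/prometheus check config /etc/prometheus/prometheus.yml
}
```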

Run Docker Containers

Once you have completed the steps above, return to the maintenance directory and run the following command.

docker compose up -d
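Containers can take a few seconds to start listening after docker compose up -d returns. The hypothetical helper below polls a URL until it answers; Prometheus provides a /-/ready endpoint and Grafana provides /api/health for this purpose. It assumes the compose file publishes ports 9090 and 3000 on the host.

```shell
#!/usr/bin/env bash
# wait_for_http URL [TIMEOUT_SECONDS]
# Hypothetical helper: polls URL once per second until it returns a 2xx
# status, or gives up after TIMEOUT_SECONDS (default 60).
wait_for_http() {
  local url="$1"
  local timeout="${2:-60}"
  local waited=0
  until curl -fsS --max-time 2 -o /dev/null "$url" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Timed out waiting for ${url}" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "${url} is up"
}
```

For example: wait_for_http http://localhost:9090/-/ready && wait_for_http http://localhost:3000/api/health. If either times out, docker compose ps and docker compose logs show what the containers are doing.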

You can now access Grafana at http://YOUR_SERVER_IP_ADDRESS:3000/login. The default username and password are admin/admin.

Datasource Configuration

Once everything is running, add a new Prometheus datasource in Grafana, enter http://localhost:9090 as the URL, and click "Save & Test".

For Docker users, the URL should be set to http://prometheus:9090.

Then add a new Prometheus AlertManager datasource, enter http://localhost:9093 as the URL, and click "Save & Test".

For Docker users, the URL should be set to http://alertmanager:9093.
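If you prefer scripting over the UI, Grafana's HTTP API accepts new datasources via POST /api/datasources. The sketch below assumes the default admin:admin credentials (pass your own if you changed the password) and uses the plugin ID installed earlier, camptocamp-prometheus-alertmanager-datasource, as the AlertManager datasource type. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# datasource_json NAME TYPE URL
# Prints the JSON body for Grafana's POST /api/datasources endpoint.
# Values are inserted verbatim, so they must not contain double quotes.
datasource_json() {
  printf '{"name":"%s","type":"%s","url":"%s","access":"proxy"}' "$1" "$2" "$3"
}

# create_datasource NAME TYPE URL [GRAFANA_URL] [USER:PASSWORD]
# Hypothetical helper that registers a datasource through the HTTP API.
create_datasource() {
  local grafana="${4:-http://localhost:3000}"
  local auth="${5:-admin:admin}"
  curl -fsS -u "$auth" -H 'Content-Type: application/json' \
    -X POST "${grafana}/api/datasources" \
    -d "$(datasource_json "$1" "$2" "$3")"
}

# Example usage (systemd setup; Docker users use prometheus:9090 and alertmanager:9093):
#   create_datasource Prometheus prometheus http://localhost:9090
#   create_datasource AlertManager camptocamp-prometheus-alertmanager-datasource http://localhost:9093
```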

Next, import the dashboard. Open the "Dashboards" page, click "New", and select "Import".

In the "Import via grafana.com" field, enter the dashboard ID 19207 and click "Load".

Once it has loaded, select the Prometheus and AlertManager datasources you just created, then click "Import".

If your node and relayer are running, the dashboard will now display their collected metrics.
