Add monitoring of NiFi using Prometheus/Grafana

Description

As a developer/sys admin of Casper, I want visibility into how the pipeline is working so that I can identify when something stops working.

 

We think we want:

  • Grafana for visualization and alerting

  • Prometheus for data scraping and querying

    • We might want something like InfluxDB if the component being monitored supports pushing its data there
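
The difference between the two options above is mostly pull versus push: Prometheus scrapes a text page that the monitored component exposes, while InfluxDB receives points that the component sends. A minimal sketch of the two wire formats follows. The metric name `nifi_queued_flowfiles` and the `instance` label are assumptions made up for illustration, not names taken from Casper or NiFi, and escaping of special characters is omitted.

```python
def prometheus_line(name, labels, value):
    """Render one sample in the Prometheus text exposition format (pull model)."""
    # Label values are quoted; escaping of quotes/backslashes is omitted here.
    label_text = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{label_text}}} {value}"


def influx_line(measurement, tags, fields, timestamp_ns):
    """Render one point in InfluxDB line protocol (push model)."""
    # Field values are written as given; floats are assumed in this sketch.
    tag_text = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_text = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_text} {field_text} {timestamp_ns}"


# Hypothetical sample: 42 flow files queued on an instance called "nifi-1".
print(prometheus_line("nifi_queued_flowfiles", {"instance": "nifi-1"}, 42))
print(influx_line("nifi", {"instance": "nifi-1"},
                  {"queued_flowfiles": 42.0}, 1565000000000000000))
```

In the pull model the first line would be served over HTTP for Prometheus to scrape; in the push model the second line would be sent to InfluxDB by the component itself.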

 

Some functionality we’d like:

  • To know when something stops responding / is down.

  • To have visibility of the compute resources (CPU load, memory, etc.) of the servers

  • To be able to chart the time it takes a message to enter the pipeline, move through “stages” and finally be synced.

  • To monitor vital stats on Kafka topics: the number of topics and the size of each
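
Two of the items above (detecting that something is down, and charting time per stage) can be sketched as plain calculations, independent of how the numbers are later exported. The stage names `entered`, `transformed` and `synced`, the `StageEvent` type, and the timeout value are hypothetical and only illustrate the idea; the real pipeline stages are not specified in this ticket.

```python
from dataclasses import dataclass


@dataclass
class StageEvent:
    message_id: str
    stage: str        # hypothetical stage name
    timestamp: float  # seconds since the epoch


def stage_durations(events, stage_order):
    """Return {(from_stage, to_stage): seconds} for one message's events."""
    seen = {event.stage: event.timestamp for event in events}
    durations = {}
    # Pair each stage with the one that follows it in the expected order.
    for start, end in zip(stage_order, stage_order[1:]):
        if start in seen and end in seen:
            durations[(start, end)] = seen[end] - seen[start]
    return durations


def is_down(last_seen, now, timeout_seconds):
    """Treat a component as down if it has not reported within the timeout."""
    return now - last_seen > timeout_seconds


order = ["entered", "transformed", "synced"]
events = [
    StageEvent("msg-1", "entered", 100.0),
    StageEvent("msg-1", "transformed", 102.5),
    StageEvent("msg-1", "synced", 110.0),
]
print(stage_durations(events, order))
print(is_down(last_seen=100.0, now=200.0, timeout_seconds=60.0))
```

The per-stage durations are the values one would chart in Grafana, and the `is_down` check corresponds to an alert on a missing heartbeat or failed scrape.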

 

Activity

Artur Lebiedziński
updated the Workflow
February 8, 2023 at 1:26 PM
Workflow for Project OLMIS v7-19-2016
OLMIS-Core v.1 2023
Philip Garrison
changed the Status
August 22, 2019 at 11:12 PM
QA
Done
Philip Garrison
updated the Resolution
August 22, 2019 at 11:12 PM
None
Done
Philip Garrison
changed the Status
August 20, 2019 at 1:02 AM
In Progress
QA
Philip Garrison
changed the Status
August 19, 2019 at 11:50 PM
To Do
In Progress
Chongsun Ahn
changed the Assignee
August 1, 2019 at 12:49 AM
Chongsun Ahn
Philip Garrison
Chongsun Ahn
changed the Assignee
August 1, 2019 at 12:49 AM
Unassigned
Chongsun Ahn
Chongsun Ahn
changed the Status
August 1, 2019 at 12:49 AM
RoadMap
To Do
Philip Garrison
updated the Description
July 29, 2019 at 9:37 PM
Josh Zamor
updated the Link
July 24, 2019 at 7:49 PM
None
This issue is duplicated by OLMIS-6468
Josh Zamor
updated the Description
July 24, 2019 at 7:47 PM
Chongsun Ahn
created the Task
July 24, 2019 at 10:56 AM
Done

Details

Created July 24, 2019 at 10:56 AM
Updated August 22, 2019 at 11:12 PM
Resolved August 22, 2019 at 11:12 PM