OpenLMIS v3 System Monitoring Tools

This page documents several system monitoring tools that implementations can use. System monitoring tools allow DevOps teams and site reliability engineers to monitor the activity, capacity and health of servers, hosts and applications in local and cloud deployments. These tools continuously check the health of a deployment and can often automatically correct problems, restart services, or escalate issues to the support team.

System monitoring tools are different from Consul. Consul is the service registry that keeps track of the microservices in the OpenLMIS v3 architecture: every time a microservice starts, it registers with Consul so that other services can discover and interact with it. Consul can also health-check registered microservices and flag failed instances so that traffic is no longer routed to them; starting replacement instances is typically left to the container orchestration layer. The system monitoring tools discussed here are a broader category that covers the health of the entire platform.

Implementers need to choose the monitoring tools that best fit their team and environment. OpenLMIS v3 is currently integrated with Scalyr, and the OpenLMIS teams will continue investing in that integration. This page also presents several alternative solutions that can be used to monitor an implementation, including do-it-yourself options.

Hosted Services

Scalyr

Scalyr is a log management, monitoring and analytics service that collects server logs from OpenLMIS v3. All logs are sent to the Scalyr cloud service, which performs real-time analytics and provides dashboards that show the health of the installation. Scalyr can alert support staff when configured thresholds are reached. Scalyr is a paid service (open source discounts may be available) and is used by the core OpenLMIS development team for its online services, as well as by the Malawi and Mozambique deployments of OpenLMIS v2. Scalyr is already integrated with OpenLMIS v3.

Amazon CloudWatch

"Amazon CloudWatch is a monitoring service for AWS cloud resources and the applications you run on AWS. You can use Amazon CloudWatch to collect and track metrics, collect and monitor log files, set alarms, and automatically react to changes in your AWS resources." Amazon CloudWatch is used in Malawi to monitor the health of its cloud-hosted environment. CloudWatch is a paid service available to deployments hosted on Amazon Web Services.

New Relic

New Relic is a cloud platform for managing multiple areas of an enterprise from a single dashboard. The New Relic DevOps platform provides modern dashboards and tool sets to monitor applications, infrastructure, transactions across services, infrastructure changes, and utilization. New Relic appears to be a paid solution and is not yet integrated with OpenLMIS v3.

Do-It-Yourself

Prometheus

"Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud." Prometheus monitors the internals of services to show how they are performing. Using it with OpenLMIS v3 would require updating the microservices to expose more granular metrics than those currently captured with Scalyr. Prometheus is fully open source and must be hosted by the implementation team.
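To give a sense of what exposing such metrics involves, here is a minimal sketch, using only the Python standard library, that renders a request counter in the Prometheus text exposition format, the plain-text format a Prometheus server scrapes over HTTP. The metric name `openlmis_requests_total`, its labels, and the service names are hypothetical; OpenLMIS does not currently expose them, and a Java microservice would more likely use an official Prometheus client library than hand-rolled formatting.

```python
from collections import Counter

# Hypothetical request counter, keyed by (service name, HTTP status code).
requests_total = Counter()


def record_request(service: str, status: int) -> None:
    """Increment the counter for one handled request."""
    requests_total[(service, status)] += 1


def render_metrics() -> str:
    """Render the counter in the Prometheus text exposition format.

    Label values are assumed to be simple names that need no escaping.
    """
    lines = [
        "# HELP openlmis_requests_total Handled HTTP requests (hypothetical metric).",
        "# TYPE openlmis_requests_total counter",
    ]
    # Sort so the output is stable between scrapes.
    for (service, status), count in sorted(requests_total.items()):
        lines.append(
            f'openlmis_requests_total{{service="{service}",status="{status}"}} {count}'
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    record_request("requisition", 200)
    record_request("requisition", 200)
    record_request("referencedata", 500)
    print(render_metrics(), end="")
```

In practice this text would be served from an HTTP endpoint (conventionally `/metrics`) that the Prometheus server scrapes on a schedule.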

Graylog

Graylog is an open source application that provides enterprise log management, similar to Scalyr. It lets teams collect and process log data, search and analyze it, drill down into and visualize results, and alert support staff. Graylog must be run by the implementation team and is not currently integrated with OpenLMIS v3.

Graphite

"Graphite is an enterprise-ready monitoring tool that runs equally well on cheap hardware or Cloud infrastructure. Teams use Graphite to track the performance of their websites, applications, business services, and networked servers. It marked the start of a new generation of monitoring tools, making it easier than ever to store, retrieve, share, and visualize time-series data." Graphite is usually paired with Grafana, which is a visualization tool for data collected by Graphite. Graphite is not currently integrated with OpenLMIS v3.
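Graphite's Carbon daemon accepts a simple plaintext protocol: one `<metric path> <value> <timestamp>` line per data point, sent over TCP (port 2003 by default). The sketch below, assuming Python 3 and a reachable Carbon listener, formats and sends such lines; the metric path `openlmis.requisition.response_ms` and the host `graphite.example.org` are hypothetical examples, not anything OpenLMIS emits today.

```python
import socket
import time
from typing import List, Optional

CARBON_PLAINTEXT_PORT = 2003  # Carbon's default plaintext listener port


def format_metric(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Build one plaintext-protocol line: '<path> <value> <timestamp>' plus a newline."""
    if timestamp is None:
        timestamp = int(time.time())  # Unix epoch seconds
    return f"{path} {value} {timestamp}\n"


def send_metrics(lines: List[str], host: str, port: int = CARBON_PLAINTEXT_PORT) -> None:
    """Open a TCP connection to Carbon and send all lines in one write."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))


if __name__ == "__main__":
    line = format_metric("openlmis.requisition.response_ms", 87.5, 1700000000)
    print(line, end="")  # prints: openlmis.requisition.response_ms 87.5 1700000000
    # send_metrics([line], "graphite.example.org")  # hypothetical Carbon host
```

Grafana can then chart the stored series by querying Graphite as a data source.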

Grafana

Grafana is open source software for analyzing time-series data. "Grafana allows you to query, visualize, alert on and understand your metrics no matter where they are stored. Create, explore, and share dashboards with your team and foster a data driven culture." It integrates with other monitoring tools, such as Graphite and Prometheus, so their data can be displayed and alerted on from a single place. Grafana is not currently integrated with OpenLMIS v3.

Nagios

Nagios is an open source system for monitoring IT infrastructure, including networks, servers and applications. The Nagios core is extensible with plugins that monitor many different types of applications, servers and network equipment. Nagios is not currently integrated with OpenLMIS v3.
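Nagios plugins are ordinary executables that report their result through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and a one-line status message, optionally followed by `|` and performance data. A minimal sketch of such a plugin for an HTTP health endpoint might look like the following; the health URL passed on the command line and the 1 s / 5 s response-time thresholds are assumptions for illustration, not OpenLMIS defaults.

```python
import sys
import time
import urllib.error
import urllib.request
from typing import Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(http_status: int, elapsed: float,
             warn: float = 1.0, crit: float = 5.0) -> Tuple[int, str]:
    """Map an HTTP status and response time (seconds) to an exit code and status line."""
    perfdata = f"time={elapsed:.3f}s;{warn};{crit}"
    if http_status != 200:
        return CRITICAL, f"CRITICAL - HTTP {http_status} | {perfdata}"
    if elapsed >= crit:
        return CRITICAL, f"CRITICAL - slow response {elapsed:.3f}s | {perfdata}"
    if elapsed >= warn:
        return WARNING, f"WARNING - slow response {elapsed:.3f}s | {perfdata}"
    return OK, f"OK - HTTP 200 in {elapsed:.3f}s | {perfdata}"


def check(url: str) -> int:
    """Request the URL, print the Nagios status line, and return the exit code."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:  # non-2xx responses still carry a status
        status = err.code
    except OSError as err:  # URLError and socket timeouts are OSError subclasses
        print(f"UNKNOWN - {err}")
        return UNKNOWN
    code, line = evaluate(status, time.monotonic() - start)
    print(line)
    return code


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(check(sys.argv[1]))  # e.g. a hypothetical https://openlmis.example.org/health
```

Nagios would run a script like this through a command definition and raise alerts based on the exit code it returns.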
