- Data ingestion using Apache NiFi;
- Data warehousing in the Hadoop File System (HDFS);
- Stream processing using Apache Kafka;
- OLAP database storage using Druid;
- Data storage and querying using PostgreSQL; and
- Visualization using Apache Superset.
Data Ingestion and Coordination Using NiFi
To coordinate the task of retrieving data from OpenLMIS and performing basic transformations, we will use Apache NiFi, an open-source tool that makes it easy to reliably process and distribute data.
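NiFi flows are configured in its visual interface rather than in code, but the fetch-and-transform step it coordinates can be sketched in Python. The payload shape and field names below are illustrative assumptions, not the actual OpenLMIS API schema:

```python
import json

def transform_facility_record(record):
    """Flatten a hypothetical OpenLMIS facility payload into a flat
    row suitable for loading into the warehouse. Field names are
    illustrative, not the real OpenLMIS API schema."""
    return {
        "facility_id": record["id"],
        "facility_name": record["name"].strip(),
        "district": record.get("geographicZone", {}).get("name"),
        "active": bool(record.get("active", False)),
    }

# Simulated API response; in the real flow, a NiFi HTTP processor
# would fetch this JSON from an OpenLMIS endpoint.
raw = json.loads('{"id": "f-001", "name": " Kigali Clinic ", '
                 '"geographicZone": {"name": "Kigali"}, "active": true}')
row = transform_facility_record(raw)
print(row)
```

In NiFi itself, the equivalent flow would chain a fetch processor to a transform processor, with the transformation logic above expressed as flow configuration rather than application code.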
Real-Time Stream Processing Using Kafka
Apache Kafka is a real-time stream processor built on the publish-subscribe messaging pattern. We will use Kafka to receive incoming data and publish it to topic-based queues; Druid, our data warehouse, will subscribe to these topics and insert the records into its database. We will also use Kafka to store data collected from OpenLMIS in a Hadoop data store (HDFS) for long-term retention, facilitating the generation of historical insights.
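The key property of the pattern is that one published record can reach multiple independent consumers, which is how a single stream can feed both Druid and HDFS. A toy in-memory sketch of topic-based publish-subscribe (this is not the Kafka API, just the pattern it provides):

```python
from collections import defaultdict

class MiniBroker:
    """Toy in-memory broker illustrating topic-based publish-subscribe.
    Kafka adds durable, partitioned, replayable logs on top of this idea."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Every subscriber to the topic receives its own copy.
        for handler in self.subscribers[topic]:
            handler(message)

broker = MiniBroker()
warehouse_rows, archive_rows = [], []

# A Druid-like consumer and an HDFS-like archiver both subscribe to
# the same (hypothetical) topic, so one record reaches both sinks.
broker.subscribe("openlmis.stock-events", warehouse_rows.append)
broker.subscribe("openlmis.stock-events", archive_rows.append)

broker.publish("openlmis.stock-events", {"facility": "f-001", "soh": 120})
print(warehouse_rows, archive_rows)
```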
High-Performance Queries with Druid
Data Storage and Querying Using PostgreSQL
NiFi will write all data from OpenLMIS APIs into a PostgreSQL database. We opted for PostgreSQL due to its usability and ability to perform well at the scale of a typical OpenLMIS implementation.
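The write-then-aggregate pattern this enables can be sketched as follows; sqlite3 stands in for PostgreSQL so the example is self-contained, and the table and column names are illustrative assumptions, not the actual schema NiFi would produce:

```python
import sqlite3

# sqlite3 stands in for PostgreSQL here; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE requisition_line_items (
        facility_name      TEXT,
        product_code       TEXT,
        quantity_requested INTEGER
    )
""")

# Rows of the kind NiFi would write from the OpenLMIS APIs.
conn.executemany(
    "INSERT INTO requisition_line_items VALUES (?, ?, ?)",
    [("Kigali Clinic", "ORS", 40),
     ("Kigali Clinic", "ACT", 25),
     ("Huye Clinic",   "ORS", 10)],
)

# The kind of aggregate a reporting dashboard would run.
rows = conn.execute("""
    SELECT product_code, SUM(quantity_requested)
    FROM requisition_line_items
    GROUP BY product_code
    ORDER BY product_code
""").fetchall()
print(rows)  # → [('ACT', 25), ('ORS', 50)]
```

The same SQL runs unchanged against PostgreSQL, which is what Superset would query in the deployed stack.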
Visualizing Data in Superset
Below is the initial proposed scope of this reporting engagement; it will be continually reviewed, re-prioritized, and adjusted based on guidance from the OpenLMIS Product Owner during the engagement. Note that the text below references systems that are no longer in the current reporting stack, specifically Kafka, HDFS, and Druid. HDFS and Druid have been replaced by PostgreSQL, and Kafka was not required for the requisitions microservice.
Phase 1 - Data warehouse infrastructure setup
Once we have demonstrated the ability to generate and visualize DISC indicators from static data, we will begin work to support ingesting data directly from the OpenLMIS APIs. We will initially support 2-3 OpenLMIS API endpoints, ingesting their data into the data warehouse infrastructure using NiFi. In the process, we will help map out what data transformations and joins are needed; this is a potentially complex task that will require support from the OpenLMIS team. The OpenLMIS team will be responsible for providing a running instance of OpenLMIS and for ensuring it is populated with relevant data that can be used for reporting purposes. If time allows, we will continue to support additional API endpoints. The goal of this final initial phase is to demonstrate an end-to-end working reporting solution.
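A hypothetical example of the kind of join work the ingestion flow must map out: requisition line items reference facilities by id, while reports need facility names and districts. All field names here are assumptions for illustration, not the OpenLMIS schema:

```python
# Hypothetical reference data and transactional data, as two sources
# that must be joined before the warehouse can answer district-level
# reporting questions.
facilities = {
    "f-001": {"name": "Kigali Clinic", "district": "Kigali"},
    "f-002": {"name": "Huye Clinic", "district": "Huye"},
}
line_items = [
    {"facility_id": "f-001", "product": "ORS", "qty": 40},
    {"facility_id": "f-002", "product": "ORS", "qty": 10},
]

def join_for_report(items, facility_lookup):
    """Denormalize line items with facility attributes so a single
    warehouse table can serve the report directly."""
    return [
        {**item, **facility_lookup[item["facility_id"]]}
        for item in items
    ]

report_rows = join_for_report(line_items, facilities)
print(report_rows[0]["district"])  # → Kigali
```

Mapping out which endpoints supply the reference data and which supply the transactional data, and where each join happens, is the analysis work described above.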