Reporting Stack Vision Notes
Presenter: Josh Zamor
- Presentation: link to be added
Where We Are
- NiFi handles batch ingestion, transform, and sink
- Superset sits on top of a SQL reporting database
- NiFi is central and plays a big role
- What would be required in this setup to surface an extra data field from a requisition in a report? A lot of coordination and knowledge of NiFi (see the sketch after this list)
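For orientation, below is a minimal sketch (in plain Java rather than NiFi) of what the current batch flow effectively does: pull a page of requisitions from a service API, flatten it, and write rows to the reporting database. The endpoint, token handling, JSON field names, and reporting table are illustrative placeholders, not the actual OpenLMIS API or reporting schema. In the real setup this logic is spread across several NiFi processors, which is why exposing one extra field means coordinated changes to the extract, the transform, and the sink.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class BatchRequisitionPull {
    public static void main(String[] args) throws Exception {
        // 1. Extract: pull a page of requisitions from a (placeholder) service API.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://openlmis.example.org/api/requisitions/search?page=0&size=500"))
                .header("Authorization", "Bearer " + System.getenv("OLMIS_TOKEN"))
                .build();
        String body = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString()).body();

        // 2. Transform and 3. Load: flatten each requisition into a reporting row.
        JsonNode page = new ObjectMapper().readTree(body);
        try (Connection db = DriverManager.getConnection(
                     "jdbc:postgresql://reporting-db:5432/reporting", "reports", "secret");
             PreparedStatement insert = db.prepareStatement(
                     "INSERT INTO requisition_report (id, facility, status) VALUES (?, ?, ?)")) {
            for (JsonNode req : page.get("content")) {
                insert.setString(1, req.get("id").asText());
                insert.setString(2, req.get("facilityId").asText());
                insert.setString(3, req.get("status").asText());
                // Adding one more report field means changing this mapping,
                // the reporting table, and every downstream view.
                insert.addBatch();
            }
            insert.executeBatch();
        }
    }
}
```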
Where We Want To Go
Position the Reporting Stack as central for empowering analysis, assisting with integrations, and enabling smart workflows.
- Analysis, integrations, smarter workflows
- Tools, infrastructure, community practices for all things reports
- Open standards for integrations
- Smarter workflows, e.g. resupply that knows the fridge state (via RTM), how much is needed and by when, sourced from an HMIS (e.g. DHIS2)
- Match OpenLMIS for upgradeability, reliability, community contribution
- Upgradeability: out-of-the-box report upgrades, as OpenLMIS does
- Reliability: the stack "just works" when starting, stopping, upgrading, etc.
- An implementation should have clear expectations for how it can build, modify, and share reports with the community
What's In Our Way
- Batch oriented
- API
- Requires correct configuration (user, roles)
- APIs have had to be modified to support mass data ingestion
- Finicky scheduling
- A consequence of working in batch operations
- Write operations need coordination, which will become more complex as we add more data management
- Schema Management
How We're Going To Get There
- More boxes, separate out responsibilities
- NiFi no longer pulls from the reference data and requisition services
- Services stream their changes to Kafka and Kafka Connect, using data pumps, pub/sub, or CDC (e.g. Debezium); see the connector sketch after this list
- NiFi processors transform messages from an input topic to an output topic (see the transform sketch after this list)
- Moving away from authentication
- Moving away from hitting APIs
- A Connect sink writes to the reporting db (which now uses Flyway)
- Schema Registry for schema management (see the Avro producer sketch after this list)
- Monitoring (Prometheus and Grafana): how long the pipeline takes, from data entering the requisition service to data being available for reporting (see the metrics sketch after this list)
- Services may now access the reporting db for smarter workflows (like advanced forecasting)
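A minimal sketch of the CDC piece above: registering a Debezium PostgreSQL connector with a Kafka Connect worker so requisition changes stream into Kafka without going through service APIs. The worker URL (connect:8083), database host, credentials, table list, and topic prefix are placeholders, and exact property names vary by Debezium version.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterRequisitionCdcConnector {
    public static void main(String[] args) throws Exception {
        // Connector definition: Debezium tails the requisition service's
        // Postgres WAL and publishes change events to Kafka topics.
        String connector = """
            {
              "name": "requisition-cdc",
              "config": {
                "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
                "plugin.name": "pgoutput",
                "database.hostname": "requisition-db",
                "database.port": "5432",
                "database.user": "debezium",
                "database.password": "secret",
                "database.dbname": "open_lmis",
                "topic.prefix": "requisition",
                "table.include.list": "requisition.requisitions,requisition.requisition_line_items"
              }
            }
            """;

        // Kafka Connect's REST API accepts new connector definitions via POST /connectors.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://connect:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(connector))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Once registered, Connect manages the connector's lifecycle, so the pipeline no longer needs to authenticate against the service APIs to read data.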
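The transform step keeps a simple topic-in/topic-out contract; whether it runs as a NiFi processor or as plain Kafka client code is an implementation detail. A sketch of that contract using the plain Kafka consumer/producer API, with placeholder topic names and a stubbed-out mapping:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class RequisitionTransform {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "kafka:9092");
        consumerProps.put("group.id", "requisition-transform");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "kafka:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(List.of("requisition.raw"));
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    // Reshape the change event into the reporting model,
                    // then publish to the output topic consumed by the sink.
                    String reportingRow = toReportingShape(record.value());
                    producer.send(new ProducerRecord<>("requisition.reporting", record.key(), reportingRow));
                }
            }
        }
    }

    // Placeholder mapping; in the proposed design this logic lives in a NiFi processor.
    private static String toReportingShape(String changeEvent) {
        return changeEvent;
    }
}
```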
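With a Schema Registry, the message schema becomes the managed contract between producers and the reporting sink. A sketch of a producer publishing Avro records through Confluent's Avro serializer; the registry URL, topic, and record fields are placeholders:

```java
import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroRequisitionProducer {
    // Placeholder schema for a flattened requisition reporting event.
    private static final Schema SCHEMA = new Schema.Parser().parse("""
        {"type": "record", "name": "RequisitionReportingEvent",
         "fields": [{"name": "id", "type": "string"},
                    {"name": "status", "type": "string"}]}
        """);

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Confluent's Avro serializer registers/validates the schema with the registry,
        // so schema changes are checked for compatibility instead of breaking the sink.
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://schema-registry:8081");

        GenericRecord event = new GenericData.Record(SCHEMA);
        event.put("id", "req-123");
        event.put("status", "APPROVED");

        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("requisition.reporting", (String) event.get("id"), event));
        }
    }
}
```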
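For the monitoring point, a sketch of the end-to-end lag metric (time from a change landing in the requisition service to the corresponding row being available for reporting), exposed for Prometheus and graphed in Grafana. It uses the Prometheus Java simpleclient; the port, metric name, and buckets are placeholder choices:

```java
import io.prometheus.client.Histogram;
import io.prometheus.client.exporter.HTTPServer;

public class PipelineLagMetrics {
    // End-to-end lag: time between a change being written in the requisition
    // service and the corresponding row landing in the reporting db.
    private static final Histogram PIPELINE_LAG_SECONDS = Histogram.build()
            .name("reporting_pipeline_lag_seconds")
            .help("Delay from source change event to reporting db availability")
            .buckets(1, 5, 15, 60, 300, 900)
            .register();

    public static void main(String[] args) throws Exception {
        // Expose /metrics for Prometheus to scrape; Grafana dashboards read from Prometheus.
        HTTPServer metricsEndpoint = new HTTPServer(9404);
        // Demo observation: an event whose source timestamp was 42 seconds ago.
        recordLag(System.currentTimeMillis() - 42_000L);
    }

    // Called by the sink step for each record it writes, using the source event timestamp.
    static void recordLag(long sourceTimestampMillis) {
        double lagSeconds = (System.currentTimeMillis() - sourceTimestampMillis) / 1000.0;
        PIPELINE_LAG_SECONDS.observe(lagSeconds);
    }
}
```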
Q&A
- The reporting db currently has materialized views; in this new paradigm, who would be in charge of scheduling aggregations?
- NiFi may still handle the scheduling piece, or it could be driven from a Kafka topic; these are details to work out
- We don't expect to change things wholesale; we will likely take an incremental approach to changing the architecture
- How much will reporting stack be prioritized going forward?
- Likely high on the priority list
- Reporting has always been a point of variability in implementations
- Any changes with Superset? No
- Why Prometheus and Grafana for monitoring and not Scalyr?
- They do different things; Scalyr is more for searching logs in the cloud, while Prometheus and Grafana are more for metrics