Presentation: Josh Zamor (TODO: add link to presentation)
Where We Are
NiFi: batch ingestion, transform, sink
Superset on SQL
NiFi is central, plays a big role
What would be required in this setup to surface an extra data field from requisition in a report? A lot of coordination and NiFi knowledge
Where We Want To Go
Position the Reporting Stack as central for empowering analysis, assisting with integrations, and enabling smart workflows.
Analysis, integrations, smarter workflows
Tools, infrastructure, community practices for all things reports
Open standards for integrations
Smarter workflows, e.g. resupply that knows the fridge status (because of RTM), how much is needed and by when, sourced from an HMIS (e.g. DHIS2)
Match OpenLMIS for upgradeability, reliability, community contribution
Upgradeability: out-of-the-box report upgrades, as OpenLMIS does
Reliability: the stack "just works" when starting, stopping, upgrading, etc.
An implementation should have clear expectations for how they can build, modify, and share reports with the community
What's In Our Way
Batch oriented
API
Requires correct configuration (users, roles)
APIs have had to be modified to support mass data ingestion
Finicky Scheduling
A consequence of working in batch operations
Write operations need coordination, which will become more complex as we add more data management
Schema Management
How We're Going To Get There
More boxes, separate out responsibilities
NiFi no longer pulls from the reference data and requisition services
Services stream their changes to Kafka and Connect, using data pumps, pub/sub, and CDC (Debezium)
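As a sketch of the CDC piece, a Debezium source connector registered with Kafka Connect could look like the following; the hostnames, credentials, database name, and table list here are illustrative, not the actual OpenLMIS deployment:

```json
{
  "name": "requisition-cdc",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "requisition-db",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "change-me",
    "database.dbname": "requisition",
    "database.server.name": "requisition",
    "table.include.list": "requisition.requisitions"
  }
}
```

Each committed row change then appears as an event on a Kafka topic, with no ingestion API (and no service user/role configuration) needed on the source side.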
NiFi processors transform messages from an input topic to an output topic
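A minimal sketch of the kind of per-record, topic-to-topic transform involved, written here as a plain Python function; the field names are illustrative and not the real OpenLMIS message shape:

```python
def flatten_requisition(message: dict) -> dict:
    """Flatten a nested requisition change event into a flat
    reporting row, ready for a sink to the reporting db.
    Field names are illustrative."""
    return {
        "requisition_id": message["id"],
        "facility": message["facility"]["name"],
        "program": message["program"]["code"],
        "status": message["status"],
    }

row = flatten_requisition({
    "id": "r1",
    "facility": {"name": "Clinic A"},
    "program": {"code": "EM"},
    "status": "SUBMITTED",
})
```

The point is that the transform is a pure message-in, message-out step: it can live in a NiFi processor, a Kafka Streams app, or anything else that reads one topic and writes another.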
Moving away from authentication
Moving away from hitting APIs
Kafka Connect sinks to the reporting db (which now has Flyway for schema migrations)
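The sink side could be a Kafka Connect JDBC sink along these lines; note `auto.create` is off because Flyway, not the connector, owns the reporting schema (connection details and topic names are illustrative):

```json
{
  "name": "reporting-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "connection.url": "jdbc:postgresql://reporting-db:5432/reporting",
    "topics": "requisition-flat",
    "auto.create": "false",
    "insert.mode": "upsert",
    "pk.mode": "record_key"
  }
}
```

Upsert mode makes reprocessed or duplicated events idempotent on the reporting side, which matters once the pipeline is stream-oriented rather than batch.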
Schema Registry for schema management
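With a Schema Registry, the motivating question from above (an extra data field found in requisition) becomes a compatible schema evolution rather than a NiFi coordination exercise. For example, adding an optional field with a default keeps backward compatibility, so existing reports keep working; the record and field names here are illustrative:

```json
{
  "type": "record",
  "name": "RequisitionChanged",
  "namespace": "org.openlmis.reporting",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "status", "type": "string"},
    {"name": "emergency", "type": ["null", "boolean"], "default": null}
  ]
}
```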
Monitoring (Prometheus and Grafana): how long is the pipeline taking, from data going into requisition to data being available for reporting?
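The headline metric is that end-to-end lag. A trivial stdlib-only sketch of computing it from a source timestamp and a sink timestamp (in practice this would feed a Prometheus histogram and a Grafana panel; the timestamps are illustrative):

```python
from datetime import datetime, timezone

def pipeline_lag_seconds(written_at: datetime, available_at: datetime) -> float:
    """Seconds between a change landing in the requisition service
    and the corresponding row being queryable in the reporting db."""
    return (available_at - written_at).total_seconds()

# Illustrative timestamps, e.g. a CDC event's source timestamp vs.
# the sink connector's commit time.
lag = pipeline_lag_seconds(
    datetime(2020, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
    datetime(2020, 1, 1, 12, 0, 42, tzinfo=timezone.utc),
)
```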
Services may now access the reporting db for smarter workflows (like advanced forecasting)
Q&A
The reporting db currently has materialized views; in this new paradigm, who would be in charge of scheduling aggregations?
NiFi may still handle the scheduling piece, or it could be driven by a Kafka topic; these are details
We don't expect to change things wholesale, will likely have an incremental approach to changing architecture
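On the aggregation-scheduling question: whichever component owns the schedule, the operation it triggers can stay the same, e.g. a periodic refresh in the reporting db (assuming PostgreSQL; the view name is illustrative):

```sql
-- Triggered on a schedule by NiFi, cron, or a consumer of a
-- dedicated "refresh" topic; CONCURRENTLY keeps the view readable
-- while it is rebuilt (requires a unique index on the view).
REFRESH MATERIALIZED VIEW CONCURRENTLY reporting.stock_status_summary;
```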
How much will reporting stack be prioritized going forward?
Likely high up in priority
Reporting has always been a point of variability in implementations
Any changes with Superset? No
Why Prometheus and Grafana for monitoring and not Scalyr?
They do different things; Scalyr is more for searching logs in the cloud, while Prometheus and Grafana are more for metrics