...
- Overview of the project and its goals
- Overview of thoughts on the technical vision
- Identify where the highest-priority questions and risks are
- Questions
- AOB
Project and goals (incl technical)
Give TZ an upgrade path to v3
- Get v3 reports working in country, where the source system is v2.
- Deliver, in-code (repeatable and testable), an ETL pipeline that can pull data from v2 and get it into the v3 reporting stack.
- Achieve a goal of zero modifications to the v2 source system
- The source system sees no discernible impact
- Deliver a working system that has zero bespoke modifications to v3, so that the v3 components can stay on the continual upgrade path
- Deliver a streaming system, so that the
...
- , and that the pipeline's transformation stages may be repeated at-will.
...
https://docs.microsoft.com/en-us/azure/architecture/patterns/strangler
Technical vision
- Leverage the "reporting stack" to form the basis of this data migration pipeline.
- Dockerized Nifi, Postgres, Kafka, Zookeeper, Debezium
- To reduce hosting costs, we hope to use the same containers for migration as for reporting. However, depending on network topology and IT security, we may need to run some containers in separate instances
- Move the reporting stack back toward a streaming (Lambda/Kappa) architecture (w/ Kafka yet again)
- Introduce Debezium for Change Data Capture, also to move back toward a streaming architecture, and to (nearly) eliminate load on the source v2 system.
- Enable replication in eLMIS postgres (load module and change security)
- Debezium needs privileged network access to Postgres replication
- opt 1: Debezium straight to source production eLMIS postgres
- opt 2: Debezium to replicated eLMIS postgres
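The two Debezium options above differ mainly in which Postgres host the connector points at. A minimal sketch of the connector registration body follows, assuming a Kafka Connect deployment and a Debezium 2.x Postgres connector. The host names, connector name, slot name, and credentials are hypothetical placeholders, not values from this project.

```python
import json

# Hypothetical hosts for the two options discussed above.
HOSTS = {
    "opt1": "elmis-prod.example.org",     # straight to source production Postgres
    "opt2": "elmis-replica.example.org",  # a replicated copy of the source
}


def connector_for(option, dbname="elmis", user="debezium", password="changeme"):
    """Build the JSON body that would be POSTed to Kafka Connect's /connectors."""
    return {
        "name": f"elmis-cdc-{option}",  # hypothetical connector name
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "database.hostname": HOSTS[option],
            "database.port": "5432",
            "database.user": user,          # needs replication privileges
            "database.password": password,
            "database.dbname": dbname,
            # Logical decoding plugin. "pgoutput" ships with Postgres 10+;
            # "decoderbufs" must be installed and loaded as a module.
            "plugin.name": "pgoutput",
            "slot.name": "elmis_migration",  # hypothetical replication slot
            # Debezium 2.x name; 1.x releases use "database.server.name".
            "topic.prefix": "elmis",
        },
    }


if __name__ == "__main__":
    print(json.dumps(connector_for("opt2"), indent=2))
```

On the Postgres side this assumes `wal_level = logical` and a replication entry in `pg_hba.conf` for the Debezium user, which is roughly what "load module and change security" covers.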
Reporting original: Nifi (Extract) → Kafka (store) → Nifi (Transform) → Kafka (store) → Druid / Postgres (reporting snapshot)
Reporting got: Nifi (extract) → Postgres (reporting snapshot)
Move towards: Debezium (CDC streaming) → Kafka (store) → Nifi (transform) → Kafka (store) → Postgres (reporting snapshot)
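To illustrate why the "move towards" flow allows transformation stages to be repeated at will, here is a small in-memory sketch. A Python list stands in for the retained Kafka topic, and the events follow Debezium's change-event envelope (`op`, `before`, `after`). The row fields (`id`, `code`, `qty`) and the transform itself are made up for illustration.

```python
def transform(event):
    """Map one change event to an action on the reporting snapshot."""
    if event["op"] == "d":  # delete: only "before" is populated
        return ("delete", event["before"]["id"], None)
    row = event["after"]    # create ("c"), update ("u"), or snapshot read ("r")
    return ("upsert", row["id"],
            {"id": row["id"], "product": row["code"].upper(), "qty": row["qty"]})


def rebuild_snapshot(log):
    """Replay the whole retained log from the start to rebuild the snapshot."""
    snapshot = {}
    for event in log:
        action, key, row = transform(event)
        if action == "delete":
            snapshot.pop(key, None)
        else:
            snapshot[key] = row
    return snapshot


# Stand-in for the raw Kafka topic that Debezium would write to.
raw_log = [
    {"op": "c", "before": None, "after": {"id": 1, "code": "abc", "qty": 10}},
    {"op": "c", "before": None, "after": {"id": 2, "code": "xyz", "qty": 5}},
    {"op": "u", "before": {"id": 1, "code": "abc", "qty": 10},
     "after": {"id": 1, "code": "abc", "qty": 7}},
    {"op": "d", "before": {"id": 2, "code": "xyz", "qty": 5}, "after": None},
]

if __name__ == "__main__":
    print(rebuild_snapshot(raw_log))
```

Because the raw events stay in the log, a corrected `transform` can be re-run over the same input without touching the v2 source again.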
Risks
- Reporting stack learning curve
- Access to v2 production data, access to discern semantics behind structure
- Affecting the availability of the eLMIS source system would be very, very bad
- Re-introducing Kafka (as part of the learning curve?)
- Introducing Debezium
- Solve for aggregate root problem
- Stream join
- Aggregate root table
- The reporting stack may not reach the level of robustness we need
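The aggregate root problem listed above arises because CDC emits one event per table row, while a report usually needs the whole aggregate (a parent row plus its children). A minimal sketch of the "stream join" approach follows, assuming two hypothetical tables, `requisitions` (root) and `requisition_line_items` (children), and Debezium-style events. It is an in-memory illustration, not a proposal for how the join would actually run in NiFi or Kafka.

```python
from collections import defaultdict


class AggregateJoiner:
    """Joins row-level change events into one aggregate per root id."""

    def __init__(self):
        self.roots = {}                    # root id -> root row
        self.children = defaultdict(dict)  # root id -> {child id -> child row}

    def apply(self, table, event):
        """Apply one event; return the refreshed aggregate, or None if no root yet."""
        row = event["after"] or event["before"]
        deleted = event["op"] == "d"
        if table == "requisitions":
            root_id = row["id"]
            if deleted:
                self.roots.pop(root_id, None)
                self.children.pop(root_id, None)
                return None
            self.roots[root_id] = row
        elif table == "requisition_line_items":
            root_id = row["requisition_id"]
            if deleted:
                self.children[root_id].pop(row["id"], None)
            else:
                self.children[root_id][row["id"]] = row
        else:
            raise ValueError(f"unexpected table: {table}")
        return self.aggregate(root_id)

    def aggregate(self, root_id):
        root = self.roots.get(root_id)
        if root is None:  # children arrived before their root
            return None
        items = self.children[root_id]
        return {**root, "line_items": [items[k] for k in sorted(items)]}
```

Child events can arrive before their root, so the joiner buffers them and only emits an aggregate once the root is known:

```python
joiner = AggregateJoiner()
first = joiner.apply("requisition_line_items",
                     {"op": "c", "before": None,
                      "after": {"id": 10, "requisition_id": 1, "qty": 5}})
second = joiner.apply("requisitions",
                      {"op": "c", "before": None,
                       "after": {"id": 1, "status": "SUBMITTED"}})
# first is None; second holds the root row plus its one line item
```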
Next steps:
- Josh Zamor to create a wiki project page
- Josh Zamor to re-word the streaming pros for laypeople (the priorities in the technical goals section)
- Josh Zamor to add links on Kappa, Debezium aggregate roots, do a once over on doc for links
- Wesley Brown to add links to google docs (Josh Zamor to find what he can in meantime)
- Wesley Brown to followup with project goals, schedule, roles and responsibilities, travel?, etc. Logistics of project!
- Wesley Brown to identify who the decision makers are (essentially roles and responsibilities), e.g. specifically on streaming architecture, on data access, on processes
- Chongsun Ahn to start on a learning plan for the reporting stack, Kafka, Kappa architecture, Debezium, and the challenges we need to solve for