2019-02-14 Ghost Migration Notes
Attendees
@Josh Zamor
@Chongsun Ahn
Goal: Kick off the technical vision, ask questions.
Agenda:
Overview of project and goals
Overview of thoughts on technical vision
Identify where the highest risks are
Questions
AOB
Project and goals (incl technical)
Give TZ an upgrade path to v3
Get v3 reports working in country, where the source system is v2.
Deliver, in code (repeatable and testable), an ETL pipeline that pulls data from v2 and loads it into the v3 reporting stack.
Achieve the goal of zero modifications to the v2 source system
The pipeline has no discernible impact on the source system
Deliver a working system that has zero bespoke modifications to v3, so that the v3 components can stay on the continual upgrade path
Deliver a streaming system, so that the pipeline's transformation stages can be repeated at will.
https://docs.microsoft.com/en-us/azure/architecture/patterns/strangler
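To make the "repeated at will" goal concrete, here is a minimal sketch of replaying a stored change log through a transformation. It is only an illustration: `ChangeLog` is a hypothetical in-memory stand-in for a Kafka topic, and the event shape (`op`, `id`, `row`) and function names are assumptions made for this example, not the project's actual schema.

```python
from typing import Any, Callable, Dict, List

# Hypothetical change event: {"op": "c" | "u" | "d", "id": <key>, "row": <dict or None>}
Event = Dict[str, Any]


class ChangeLog:
    """Append-only, in-memory stand-in for a Kafka topic (assumption for this sketch)."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def read_from(self, offset: int = 0) -> List[Event]:
        # Reading never removes events, so a consumer can always start over.
        return self._events[offset:]


def build_snapshot(log: ChangeLog, transform: Callable[[dict], dict]) -> Dict[Any, dict]:
    """Replay the whole log through `transform` and return the resulting snapshot."""
    snapshot: Dict[Any, dict] = {}
    for event in log.read_from(0):
        if event["op"] == "d":
            # Deletes drop the row from the snapshot; no transform is needed.
            snapshot.pop(event["id"], None)
        else:
            # Creates and updates both overwrite the row with its latest state.
            snapshot[event["id"]] = transform(event["row"])
    return snapshot


def upper_name(row: dict) -> dict:
    """Example of a revised transformation applied on a later replay."""
    return {**row, "name": row["name"].upper()}
```

Because the log is never consumed destructively, `build_snapshot(log, dict)` and `build_snapshot(log, upper_name)` can both be run against the same stored events. Changing a transformation then means replaying the log rather than re-extracting from v2.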
Technical vision
Leverage the "reporting stack" to form the basis of this data migration pipeline
Dockerized Nifi, Postgres, Kafka, Zookeeper, Debezium
To reduce hosting costs we hope to use the same containers for migration as for reporting. Depending on network topology and IT security, however, we may need to run some containers in separate instances
Move the reporting stack back toward a streaming (kappa) architecture (w/ Kafka again)
Introduce Debezium for Change Data Capture, both to move back toward a streaming architecture and to (nearly) eliminate load on the source v2 system.
Enable replication in eLMIS postgres (load module and change security)
Debezium needs privileged network access to Postgres replication
opt 1: Debezium straight to source production eLMIS postgres
opt 2: Debezium to replicated eLMIS postgres
Reporting as originally designed: Nifi (extract) → Kafka (store) → Nifi (transform) → Kafka (store) → Druid / Postgres (reporting snapshot)
Reporting as built: Nifi (extract) → Postgres (reporting snapshot)
Move towards: Debezium (CDC streaming) → Kafka (store) → Nifi (transform) → Kafka (store) → Postgres (reporting snapshot)
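As a rough sketch of what the Debezium step could look like, the snippet below builds the JSON payload that would be POSTed to Kafka Connect's `/connectors` REST endpoint to register a Debezium Postgres connector. The host names, database name, credentials, slot name and the choice of `wal2json` as the decoding plugin are all placeholders assumed for illustration. Note also that `database.server.name` is the logical-name property in Debezium releases before 2.0; newer releases use `topic.prefix` instead.

```python
import json
from typing import Any, Dict


def debezium_postgres_config(
    host: str,
    dbname: str,
    user: str,
    password: str,
    server_name: str = "elmis_v2",             # hypothetical logical name
    slot_name: str = "debezium_migration",     # hypothetical replication slot
) -> Dict[str, Any]:
    """Build a Kafka Connect registration payload for a Debezium Postgres connector."""
    return {
        "name": f"{server_name}-connector",
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "database.hostname": host,
            "database.port": "5432",
            "database.user": user,
            "database.password": password,
            "database.dbname": dbname,
            # Pre-2.0 property name; Debezium 2.x replaces it with "topic.prefix".
            "database.server.name": server_name,
            "slot.name": slot_name,
            # Assumed plugin; it must match the module loaded in Postgres.
            "plugin.name": "wal2json",
        },
    }


if __name__ == "__main__":
    # opt 1 and opt 2 differ only in which host Debezium connects to (placeholder hosts).
    opt1 = debezium_postgres_config("elmis-prod.example", "open_lmis", "debezium", "change-me")
    opt2 = debezium_postgres_config("elmis-replica.example", "open_lmis", "debezium", "change-me")
    print(json.dumps(opt2, indent=2))
```

Under these assumptions the two options share one configuration and differ only in the target host, so switching between them later should be a configuration change rather than a code change.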
Risks
Reporting stack learning curve
Access to v2 production data, and enough access to discern the semantics behind its structure
Affecting the availability of the eLMIS source system would be very, very bad
Re-introducing Kafka (as in, learning curve?)
Introducing Debezium
Solve for aggregate root problem
Reporting stack doesn't achieve the level of robustness we'd need
Next steps:
Main project document: https://docs.google.com/document/d/19L-zvGrxMmZhXIUjIdMRqfK4cVT1qwguuDqv8vqkg10/edit#heading=h.iucjibmhqt53