2019-02-14 Ghost Migration Notes


Attendees


Goal:  Kick off the technical vision, ask questions.


Agenda:

  • Overview project, goals
  • Overview thoughts on technical vision
  • Identify where the highest risks are
  • Questions
  • AOB


Project and goals (incl technical)


Give TZ an upgrade path to v3


  1. Get v3 reports working in country, where the source system is v2.
  2. Deliver, in-code (repeatable and testable), an ETL pipeline that can pull data from v2 and get it into the v3 reporting stack.
  3. Make zero modifications to the v2 source system.
  4. Have no discernible impact on the v2 source system.
  5. Deliver a working system that has 0 bespoke modifications to v3, so that the v3 components can still be on the continual upgrade path
  6. Deliver a streaming system, so that the pipeline's transformation stages may be repeated at-will.

https://docs.microsoft.com/en-us/azure/architecture/patterns/strangler


Technical vision

  • Leverage the "reporting stack" to form the basis of this data migration pipeline
    • Dockerized Nifi, Postgres, Kafka, Zookeeper, Debezium
    • To reduce hosting costs, we hope to reuse the same containers for migration as for reporting; however, depending on network topology and IT security, we may need to run some containers in separate instances
  • Move the reporting stack back toward a streaming (kappa) architecture (w/ Kafka again)
  • Introduce Debezium for Change Data Capture, both to move back toward a streaming architecture and to (nearly) eliminate load on the source v2 system.
    • Enable replication in eLMIS Postgres (load the logical decoding module and adjust security settings)
    • Debezium needs privileged network access to Postgres replication
      • opt 1:  Debezium straight to source production eLMIS postgres
      • opt 2:  Debezium to replicated eLMIS postgres
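As a rough illustration of the two options above, the sketch below builds the JSON payload that would be posted to Kafka Connect to register a Debezium Postgres connector. It is not the project's actual configuration: the host names, database name, user, and password are hypothetical placeholders, the choice of wal2json as the decoding plugin is an assumption, and property names differ between Debezium versions (for example, later releases rename database.server.name), so check the documentation for the version in use. Under these assumptions, the only difference between opt 1 and opt 2 is the host Debezium connects to.

```python
import json


def debezium_postgres_connector(name, host, dbname, user, password, port=5432):
    """Build a Kafka Connect registration payload for a Debezium Postgres connector.

    All argument values used below are hypothetical placeholders.
    """
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "database.hostname": host,
            "database.port": str(port),
            "database.user": user,
            "database.password": password,
            "database.dbname": dbname,
            # Logical server name, used as the prefix of the Kafka topics.
            "database.server.name": name,
            # Assumed plugin; must match the decoding module loaded into Postgres.
            "plugin.name": "wal2json",
        },
    }


# opt 1: Debezium connects straight to the production eLMIS database (hypothetical host).
direct = debezium_postgres_connector(
    "elmis", "elmis-db.example", "open_lmis", "debezium", "change-me"
)

# opt 2: Debezium connects to a replicated copy instead (hypothetical host).
replica = debezium_postgres_connector(
    "elmis", "elmis-replica.example", "open_lmis", "debezium", "change-me"
)

print(json.dumps(direct, indent=2))
```

Either payload would then be sent to the Kafka Connect REST endpoint for connectors. The database user it names needs replication privileges, which is the "privileged network access" noted above.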


Reporting original:  Nifi (extract) → Kafka (store) → Nifi (transform) → Kafka (store) → Druid / Postgres (reporting snapshot)

Reporting got:  Nifi (extract) → Postgres (reporting snapshot)

Move towards:  Debezium (CDC streaming) → Kafka (store) → Nifi (transform) → Kafka (store) → Postgres (reporting snapshot)
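To make the "repeated at-will" property of the target pipeline concrete, here is a minimal sketch in which a plain Python list stands in for the Kafka topic of change events. The events use a simplified form of Debezium's envelope (op, before, after); real events carry more fields. The table columns and the two transforms are hypothetical. Because the stored log is never modified, the snapshot can be rebuilt with a different transform without going back to the v2 source system.

```python
def rebuild_snapshot(log, transform):
    """Replay every change event from the start and return a fresh snapshot.

    `log` stands in for a Kafka topic; `transform` stands in for the
    Nifi transform stage. Rows are keyed by their (hypothetical) `id` column.
    """
    snapshot = {}
    for event in log:
        if event["op"] == "d":
            # Delete: remove the row identified by the "before" image.
            snapshot.pop(event["before"]["id"], None)
        else:
            # Create ("c"), snapshot read ("r"), or update ("u"): upsert the "after" image.
            row = event["after"]
            snapshot[row["id"]] = transform(row)
    return snapshot


# Hypothetical change events for a facilities table.
log = [
    {"op": "c", "before": None, "after": {"id": 1, "code": "f001", "name": "Clinic A"}},
    {"op": "c", "before": None, "after": {"id": 2, "code": "f002", "name": "Clinic B"}},
    {
        "op": "u",
        "before": {"id": 1, "code": "f001", "name": "Clinic A"},
        "after": {"id": 1, "code": "f001", "name": "Clinic A (renamed)"},
    },
    {"op": "d", "before": {"id": 2, "code": "f002", "name": "Clinic B"}, "after": None},
]

# First version of the transform: pass rows through unchanged.
snapshot_v1 = rebuild_snapshot(log, lambda row: dict(row))

# Revised transform: upper-case the code. The same log is replayed from the start.
snapshot_v2 = rebuild_snapshot(log, lambda row: {**row, "code": row["code"].upper()})

print(snapshot_v1)
print(snapshot_v2)
```

In the real pipeline the replay would mean re-reading the Kafka topic from its earliest offset rather than iterating a list, but the principle is the same: the log is the source of truth and the reporting snapshot is derived from it.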


Risks

Next steps:

  • Josh Zamor to create a wiki project page
  • Josh Zamor to re-word streaming pros for laymen - technical goal section's priorities
  • Josh Zamor to add links on Kappa and Debezium aggregate roots, and do a once-over of the doc for links
  • Wesley Brown to add links to Google Docs (Josh Zamor to find what he can in the meantime)

Main project document: https://docs.google.com/document/d/19L-zvGrxMmZhXIUjIdMRqfK4cVT1qwguuDqv8vqkg10/edit#heading=h.iucjibmhqt53

  • Wesley Brown to follow up with project goals, schedule, roles and responsibilities, travel?, etc.  Logistics of project!
  • Wesley Brown to identify who the decision makers are (essentially roles and responsibilities), e.g. specifically on streaming architecture, on data access, on processes
  • Chongsun Ahn to start on a learning plan for the reporting stack, Kafka, Kappa architecture, Debezium, and the challenges we need to solve
