Attendees

Goal: Kickoff technical vision, ask questions.

Agenda:

Project and goals (incl technical)

Give TZ an upgrade path to v3

Get v3 reports working in country, where the source system is v2.
Deliver, in-code (repeatable and testable), an ETL pipeline that can pull data from v2 and get it into the v3 reporting stack.
Achieve zero modifications to the v2 source system
The pipeline has no discernible impact on the source system
Deliver a working system with zero bespoke modifications to v3, so that the v3 components can stay on the continual upgrade path
Deliver a streaming system, so that the pipeline's transformation stages may be repeated at will.

Technical vision

Leverage the "reporting stack" to form the basis of this data migration pipeline
- Dockerized Nifi, Postgres, Kafka, Zookeeper, Debezium
- To reduce hosting costs, we hope to use the same containers for migration as for reporting. Depending on network topology and IT security, however, we may need to run some containers in separate instances.
Move the reporting stack back toward a streaming (kappa) architecture (w/ Kafka again)
Introduce Debezium for Change Data Capture, both to move back toward a streaming architecture and to (nearly) eliminate load on the source v2 system.
- Enable replication in eLMIS postgres (load module and change security)
- Debezium needs privileged network access to Postgres replication
  - opt 1: Debezium straight to source production eLMIS postgres
  - opt 2: Debezium to replicated eLMIS postgres
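
As a rough illustration of what pointing Debezium at the eLMIS Postgres might involve, the sketch below builds a connector configuration of the kind Kafka Connect accepts through its REST API. The host, credentials, slot name, and table names are hypothetical placeholders. The property names follow the Debezium PostgreSQL connector, but some differ between Debezium versions, so they should be checked against whichever version is deployed. The same payload would serve opt 1 or opt 2; only the hostname changes.

```python
import json


def build_debezium_config(hostname, dbname, user, password, tables,
                          slot_name="elmis_migration", plugin="pgoutput"):
    """Return a Kafka Connect payload for a Debezium Postgres connector.

    Every argument value here is an illustrative placeholder. The source
    database must already have logical replication enabled
    (wal_level = logical) and must allow a replication connection for `user`.
    """
    return {
        "name": f"{dbname}-cdc",
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "database.hostname": hostname,
            "database.port": "5432",
            "database.user": user,
            "database.password": password,
            "database.dbname": dbname,
            # Logical decoding plugin. pgoutput ships with Postgres 10+;
            # older servers need a separately installed plugin such as decoderbufs.
            "plugin.name": plugin,
            "slot.name": slot_name,
            # Debezium 2.x property; 1.x used database.server.name instead.
            "topic.prefix": dbname,
            "table.include.list": ",".join(tables),
        },
    }


payload = build_debezium_config(
    hostname="elmis-db.example.org",  # hypothetical host
    dbname="elmis",
    user="debezium",
    password="change-me",
    tables=["public.requisitions", "public.requisition_line_items"],  # hypothetical tables
)
print(json.dumps(payload, indent=2))
```

The resulting JSON would be POSTed to the Kafka Connect `/connectors` endpoint. Keeping it in code rather than hand-edited JSON fits the goal of a repeatable, testable pipeline.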

Reporting original: Nifi (extract) → Kafka (store) → Nifi (transform) → Kafka (store) → Druid / Postgres (reporting snapshot)

Reporting got: Nifi (extract) → Postgres (reporting snapshot)

Move towards: Debezium (CDC streaming) → Kafka (store) → Nifi (transform) → Kafka (store) → Postgres (reporting snapshot)
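
To make the "repeat at will" property of the target pipeline concrete, here is a minimal in-memory sketch of folding a stored stream of change events into a reporting snapshot. The events mimic the Debezium envelope (`op`, `before`, `after`); the `id` key and `status` field are assumptions for illustration, and a real pipeline would read from Kafka and write to Postgres rather than a dict.

```python
def apply_change(snapshot, event, key="id"):
    """Apply one Debezium-style change event to an in-memory snapshot."""
    op = event["op"]
    if op in ("c", "r", "u"):  # create, initial snapshot read, update
        row = event["after"]
        snapshot[row[key]] = row
    elif op == "d":  # delete: only the key from "before" is needed
        snapshot.pop(event["before"][key], None)
    return snapshot


def replay(events, key="id"):
    """Rebuild the snapshot from scratch by replaying the stored event log."""
    snapshot = {}
    for event in events:
        apply_change(snapshot, event, key)
    return snapshot


events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "INITIATED"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "INITIATED"}},
    {"op": "u", "before": {"id": 1, "status": "INITIATED"},
     "after": {"id": 1, "status": "APPROVED"}},
    {"op": "d", "before": {"id": 2, "status": "INITIATED"}, "after": None},
]

first = replay(events)
second = replay(events)  # replaying the same log gives the same snapshot
print(first)  # {1: {'id': 1, 'status': 'APPROVED'}}
```

Because the events are kept in Kafka, a changed transformation can be re-run over the same log without going back to the v2 source system.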

Risks

Reporting stack learning curve
Access to v2 production data, and enough access to discern the semantics behind its structure
Affecting the availability of the eLMIS source system would be very, very bad
Re-introducing Kafka (same risk as the learning curve?)
Introducing Debezium
- Solve for aggregate root problem
  - Stream join: https://debezium.io/blog/2018/01/17/streaming-to-elasticsearch/
  - Aggregate root table: https://debezium.io/blog/2018/09/20/materializing-aggregate-views-with-hibernate-and-debezium/
Reporting stack may not achieve the robustness level we'd need
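
The aggregate root problem listed under "Introducing Debezium" arises because CDC emits one event per table row, while reports usually want a whole aggregate (a root record plus its child rows). The sketch below shows the stream-join idea in plain Python: keep the latest state from both streams and emit a joined document whenever either side changes. The class, the `root_id` foreign key, and the field names are all hypothetical. The linked Debezium post does this with Kafka Streams rather than in-process state, and handling child deletes this way assumes the delete event's `before` image carries the foreign key.

```python
from collections import defaultdict


class AggregateJoiner:
    """Join root and child change events into one aggregate document."""

    def __init__(self):
        self.roots = {}                    # root_id -> latest root row
        self.children = defaultdict(dict)  # root_id -> {child_id: latest child row}

    def on_root(self, event):
        if event["op"] == "d":
            root_id = event["before"]["id"]
            self.roots.pop(root_id, None)
            self.children.pop(root_id, None)
            return None
        row = event["after"]
        self.roots[row["id"]] = row
        return self.aggregate(row["id"])

    def on_child(self, event):
        if event["op"] == "d":
            row = event["before"]
            self.children[row["root_id"]].pop(row["id"], None)
        else:
            row = event["after"]
            self.children[row["root_id"]][row["id"]] = row
        return self.aggregate(row["root_id"])

    def aggregate(self, root_id):
        root = self.roots.get(root_id)
        if root is None:
            return None  # child arrived before its root; nothing to emit yet
        items = sorted(self.children[root_id].values(), key=lambda r: r["id"])
        return {**root, "line_items": items}


joiner = AggregateJoiner()
early = joiner.on_child({"op": "c", "after": {"id": 10, "root_id": 1, "qty": 5}})
doc = joiner.on_root({"op": "c", "after": {"id": 1, "status": "INITIATED"}})
print(early)  # None
print(doc)
```

Events from different tables can arrive in any order, which is why the child event above emits nothing until its root shows up.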

Next steps:

Josh Zamor to create a wiki project page
Josh Zamor to re-word the streaming pros for laymen (the technical goal section's priorities)
Josh Zamor to add links on Kappa and Debezium aggregate roots, and do a once-over on the doc for links
Wesley Brown to add links to Google Docs (Josh Zamor to find what he can in the meantime)

Wesley Brown to follow up with project goals, schedule, roles and responsibilities, travel(?), etc.: the logistics of the project.
Wesley Brown to identify who the decision makers are (essentially roles and responsibilities), e.g. specifically on streaming architecture, on data access, and on processes
Chongsun Ahn to start on a learning plan for the reporting stack, Kafka, Kappa architecture, Debezium, and the challenges we need to solve for