...
- Pipelines allow us to break the problem down into structured, independent data flows.
- Everything is repeatable
- Infrastructure as Code
- (Re-)Running Pipelines
- Experimentation
- Nothing is manual, nothing is a one-off - Automate
- From environment setup to deployment, we put everything we can in code.
- We never "just do a one-time tweak to the database/API/app/etc"
- Backups
- Resilient
- Not a "big bang" migration, it will need to function continuously for weeks, months, perhaps even a year or longer.
- During lifetime eLMIS and OpenLMIS v3 will have releases, data models will change without notice.
- Source and destination software releases should slow new data from flowing, however both systems should stay available.
- Data is usable in near-real time, a user should expect their data once entered to be available in seconds to minutes, not tens of minutes, hours or days.
- Source system should be unaware and unaffected
- We must demonstrate that eLMIS availability and performance can't be noticeably affected.
- We must convince eLMIS developers that they do not need to coordinate releases with us, merely notify us of what's changing and by when.
Approach
Tech Stack
- Deployed via Terraform, Ansible and Docker.
- Terraform & Ansible to provision environments.
- Ansible to manage deployed application.
- Docker as deployment unit and runtime.
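To make "Docker as deployment unit" concrete, here is a rough sketch of what a provisioning step boils down to, written against the Docker SDK for Python; the image tag, port, and environment values are illustrative assumptions, not our actual configuration:

```python
# Sketch only: what an Ansible task effectively does on a provisioned host.
# Image tag, port, and environment values are illustrative assumptions.
import docker

client = docker.from_env()  # connect to the local Docker daemon

# Run Kafka Connect (which will host the Debezium connector) as a container,
# the same deployment unit we use for every component.
client.containers.run(
    "debezium/connect:1.9",            # assumed image/tag
    name="connect",
    detach=True,
    ports={"8083/tcp": 8083},          # Kafka Connect's REST API
    environment={
        "BOOTSTRAP_SERVERS": "kafka:9092",
        "GROUP_ID": "migration-connect",
        "CONFIG_STORAGE_TOPIC": "connect_configs",
        "OFFSET_STORAGE_TOPIC": "connect_offsets",
    },
)
```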
- Streaming architecture centered around Kafka
- Decouple source and destination systems.
- Native support for pipelines.
- Kafka is a leading technology in this space, with built-in support for high-performance, repeatable pipelines and a resilient log-based architecture (see the sketch below).
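To illustrate the decoupling: a producer writes change events to a topic with no knowledge of its consumers, and a consumer can replay the log from the beginning to re-run a pipeline. A minimal sketch using the confluent-kafka Python client, where the broker address and topic name are assumptions:

```python
# Minimal sketch of Kafka's decoupling; broker address and topic name
# are assumptions.
from confluent_kafka import Producer, Consumer

# Producer side: emit a change event with no knowledge of consumers.
producer = Producer({"bootstrap.servers": "kafka:9092"})
producer.produce("elmis.facility.changes", value=b'{"id": 42, "op": "u"}')
producer.flush()

# Consumer side: read independently; "earliest" lets a fresh pipeline
# replay the whole log, which is what makes re-runs cheap.
consumer = Consumer({
    "bootstrap.servers": "kafka:9092",
    "group.id": "v3-loader",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["elmis.facility.changes"])
msg = consumer.poll(timeout=10.0)
if msg is not None and msg.error() is None:
    print(msg.value())
consumer.close()
```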
- Change Data Capture using Debezium
- Debezium works off the same internal PostgreSQL mechanism as replication; the eLMIS database should be unaware of, and unaffected by, its presence.
- Change Data Capture allows us to capture changes made directly to the database as they're made.
- We have direct access to the source data & its schema, including changes to either.
- Resilient against network outages: if the network is down, changes simply queue until it's restored.
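For a sense of the wiring: Debezium runs as a Kafka Connect connector, registered through Connect's REST API. A hedged sketch follows; hostnames, credentials, and database names are placeholders, and exact config keys vary between Debezium versions:

```python
# Sketch: registering a Debezium PostgreSQL connector via the Kafka
# Connect REST API. Hostnames, credentials, and database names are
# placeholders; exact config keys vary between Debezium versions.
import requests

connector = {
    "name": "elmis-cdc",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "elmis-db",
        "database.port": "5432",
        "database.user": "replicator",    # replication privileges only
        "database.password": "secret",
        "database.dbname": "elmis",
        "database.server.name": "elmis",  # prefix for change-event topics
    },
}

resp = requests.post("http://connect:8083/connectors", json=connector)
resp.raise_for_status()
```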
- Monitoring using Prometheus and Grafana
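Each custom component can expose its own metrics for Prometheus to scrape, with Grafana dashboards on top. A minimal sketch using the prometheus_client library; the metric names and port are assumptions:

```python
# Sketch: exposing pipeline metrics for Prometheus to scrape.
# Metric names and the port number are illustrative assumptions.
from prometheus_client import Counter, Histogram, start_http_server

RECORDS = Counter("pipeline_records_total",
                  "Records processed, by pipeline stage", ["stage"])
LATENCY = Histogram("pipeline_transform_seconds",
                    "Time spent transforming one record")

start_http_server(9108)  # Prometheus scrapes http://host:9108/metrics

@LATENCY.time()
def transform(record):
    RECORDS.labels(stage="transform").inc()
    return record  # the real transformation would go here
```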
- Control using ???
- Simple (e.g. REST) interface for starting/stopping pipelines.
- Clear a pipeline's destination and re-run it (e.g. clear a table in v3, then re-run the pipeline with a new transformation phase). A sketch of such an interface follows.
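Whatever fills in the "???", the interface we want is small enough to sketch. A hypothetical Flask service; the routes and the PipelineManager are invented for illustration, since the actual tool is undecided:

```python
# Hypothetical sketch of the control interface; the routes and the
# PipelineManager are invented for illustration, as the tool is undecided.
from flask import Flask, jsonify

class PipelineManager:
    """Placeholder: in reality this would drive Kafka consumers/connectors."""
    def start(self, name, from_beginning=False): ...
    def stop(self, name): ...
    def clear_destination(self, name): ...

app = Flask(__name__)
manager = PipelineManager()

@app.route("/pipelines/<name>/start", methods=["POST"])
def start(name):
    manager.start(name)
    return jsonify({"pipeline": name, "state": "running"})

@app.route("/pipelines/<name>/stop", methods=["POST"])
def stop(name):
    manager.stop(name)
    return jsonify({"pipeline": name, "state": "stopped"})

@app.route("/pipelines/<name>/rerun", methods=["POST"])
def rerun(name):
    manager.clear_destination(name)           # e.g. clear the v3 table
    manager.start(name, from_beginning=True)  # replay the Kafka log
    return jsonify({"pipeline": name, "state": "rerunning"})
```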
- Simple Transformations using NiFi (or perhaps Java / Python)
- Simple transformations may be chained together as needed, and are more testable than one monolithic transformation (see the sketch below).
- NiFi is provisional:
- NiFi is a tool we've been learning
- NiFi Processors are a known commodity for building transformation stages
- NiFi has been one of the more difficult tools to use, and it's better suited to building entire pipelines that run as one-offs than to simple transformations.
- We don't have a good method for testing individual NiFi processors.
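Whichever tool we land on, the shape we're after is the same: each stage is a small, pure function from one record to the next, so stages chain and unit-test trivially. A sketch in Python; the field names are made-up examples:

```python
# Sketch: transformation stages as small pure functions. Chaining and
# unit testing both fall out for free. Field names are made-up examples.
def rename_fields(record):
    return {"facility_code": record["code"], "name": record["fac_name"]}

def normalize_code(record):
    return {**record, "facility_code": record["facility_code"].strip().upper()}

def chain(*stages):
    def run(record):
        for stage in stages:
            record = stage(record)
        return record
    return run

transform = chain(rename_fields, normalize_code)

# Each stage, and the whole chain, is testable in isolation:
assert transform({"code": " hf01 ", "fac_name": "Dodoma"}) == {
    "facility_code": "HF01",
    "name": "Dodoma",
}
```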
- Schema Registry (optional)
- Reduces the size of database-change messages in Kafka, improving performance: each message carries a small schema ID rather than the full schema (see the sketch below).
- Optional, as it's mostly a performance enhancement.
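As a sketch of how that works, assuming the confluent-kafka Python client and a registry at an assumed URL: the serializer registers the schema once, then prefixes each message with a small header instead of repeating the schema.

```python
# Sketch: Avro serialization backed by the Schema Registry, using the
# confluent-kafka Python client. Registry URL, topic, and record shape
# are assumptions; assumes a registry is running and reachable.
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import SerializationContext, MessageField

schema_str = """
{"type": "record", "name": "FacilityChange",
 "fields": [{"name": "id", "type": "long"},
            {"name": "name", "type": "string"}]}
"""

registry = SchemaRegistryClient({"url": "http://schema-registry:8081"})
serialize = AvroSerializer(registry, schema_str)

# Wire format is 1 magic byte + a 4-byte schema ID + the compact Avro
# body; the schema itself is never repeated per message.
payload = serialize(
    {"id": 42, "name": "Dodoma"},
    SerializationContext("elmis.facility.changes", MessageField.VALUE),
)
print(len(payload))  # tens of bytes, vs. hundreds for JSON-with-schema
```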