WIP - the goal is to think through the transformations we'll likely need most given our values, and not get distracted by transformations commonly found in our toolset, which are usually geared toward analysis rather than migration.
Most Common
Mapper / Crosswalk (likely the most common activity)
- Map field name A → B
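A minimal sketch of a field-name crosswalk, assuming records arrive as plain dictionaries. The field names in FIELD_MAP are hypothetical and only there for illustration; unmapped fields pass through unchanged, which may or may not be what we want for a given source.

```python
# Hypothetical crosswalk: source field name -> target field name.
FIELD_MAP = {"cust_nm": "customer_name", "cust_no": "customer_id"}


def map_fields(record, field_map):
    """Rename keys per field_map; keys not in the map pass through unchanged."""
    return {field_map.get(key, key): value for key, value in record.items()}


mapped = map_fields({"cust_nm": "Ada", "cust_no": 7, "city": "Paris"}, FIELD_MAP)
```

Keeping the crosswalk as data rather than code means it could be reviewed and versioned separately from whatever tool ends up applying it.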
Filter
- Discard field, row, etc
- Validation
- Normalization
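The filter activities above can be sketched as one pass over the records: drop unwanted fields, normalize values, then discard rows that fail validation. The field names (legacy_flag, email) and the validation rule are assumptions made up for this example, not rules from any of our source systems.

```python
DROP_FIELDS = {"legacy_flag"}  # hypothetical field we do not want to migrate


def normalize(record):
    """Drop discarded fields and normalize the (hypothetical) email field."""
    out = {k: v for k, v in record.items() if k not in DROP_FIELDS}
    if isinstance(out.get("email"), str):
        out["email"] = out["email"].strip().lower()
    return out


def is_valid(record):
    """Deliberately simple rule: a row needs an email containing '@'."""
    email = record.get("email")
    return isinstance(email, str) and "@" in email


def filter_records(records):
    """Yield normalized records, discarding rows that fail validation."""
    for record in records:
        cleaned = normalize(record)
        if is_valid(cleaned):
            yield cleaned


rows = [
    {"email": "  Ada@Example.COM ", "legacy_flag": 1},
    {"email": None, "legacy_flag": 0},
]
kept = list(filter_records(rows))
```

In a real migration the rejected rows would probably be routed somewhere for review rather than silently dropped.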
ID Replacement
- Replace one ID with another, usually because of a structural difference between the schemas.
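A sketch of ID replacement, assuming a lookup table from old to new IDs has already been built earlier in the migration. ID_LOOKUP and the field names are hypothetical. Failing loudly on an unknown ID is a design choice here; logging and skipping would be an equally plausible one.

```python
# Hypothetical lookup built earlier in the migration: old source ID -> new target ID.
ID_LOOKUP = {101: "a1", 102: "b2"}


def replace_id(record, field, lookup):
    """Return a copy of record with record[field] swapped for its target ID."""
    old_id = record[field]
    if old_id not in lookup:
        # Fail loudly so an unmapped reference cannot slip into the target system.
        raise KeyError(f"no target ID for {field}={old_id!r}")
    return {**record, field: lookup[old_id]}


order = replace_id({"order_no": 1, "customer_id": 101}, "customer_id", ID_LOOKUP)
```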
Aggregator
- Add known values from source B not present in source A
- Join data into aggregate roots
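Both aggregator activities can be sketched in memory, assuming the data fits there and that sources share a join key. All record shapes and field names below are hypothetical. In enrich, source A wins on conflicting fields and source B only fills gaps, which is an assumption about precedence rather than a settled rule.

```python
from collections import defaultdict


def enrich(records_a, records_b, key):
    """Add fields from source B that are missing in the matching source A record."""
    b_by_key = {rec[key]: rec for rec in records_b}
    for rec in records_a:
        extra = b_by_key.get(rec[key], {})
        # Source A wins on conflicts; source B only fills gaps.
        yield {**extra, **rec}


def build_aggregates(parents, children, parent_key, child_fk, child_field):
    """Nest each child under its parent to form aggregate roots."""
    grouped = defaultdict(list)
    for child in children:
        grouped[child[child_fk]].append(child)
    return [{**p, child_field: grouped.get(p[parent_key], [])} for p in parents]


customers_a = [{"id": 1, "name": "Ada"}]
customers_b = [{"id": 1, "name": "A. Lovelace", "phone": "555-0100"}]
addresses = [{"customer_id": 1, "city": "London"}]

enriched = list(enrich(customers_a, customers_b, "id"))
aggregates = build_aggregates(enriched, addresses, "id", "customer_id", "addresses")
```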
Less likely to be needed
De-duplication
- Identify and remove duplicates
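A minimal de-duplication sketch, assuming duplicates can be identified by exact match on a chosen set of key fields and that those values are hashable. It keeps the first occurrence and drops later ones. Fuzzy matching (e.g. near-identical names) is a harder problem and is not covered here. The field names are hypothetical.

```python
def deduplicate(records, key_fields):
    """Yield the first record seen for each combination of key_fields values."""
    seen = set()
    for record in records:
        key = tuple(record.get(field) for field in key_fields)
        if key not in seen:
            seen.add(key)
            yield record


people = [
    {"email": "ada@example.com", "name": "Ada"},
    {"email": "ada@example.com", "name": "Ada L."},
    {"email": "bob@example.com", "name": "Bob"},
]
unique = list(deduplicate(people, ["email"]))
```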
Pivot
- e.g. a pivot table. Often used to de-normalize. When migrating from one transactional system to another, we're not likely to need this often.
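For reference, a pivot from long (one row per attribute) to wide (one row per entity) can be sketched as follows. The column names are hypothetical, and if the same attribute appears twice for one entity the later value silently wins, which a real implementation would need to decide on.

```python
def pivot(rows, index, column, value):
    """Turn long rows into one wide row per distinct index value."""
    wide = {}
    for row in rows:
        entry = wide.setdefault(row[index], {index: row[index]})
        entry[row[column]] = row[value]
    return list(wide.values())


long_rows = [
    {"customer_id": 1, "attr": "email", "val": "a@x"},
    {"customer_id": 1, "attr": "phone", "val": "555"},
    {"customer_id": 2, "attr": "email", "val": "b@x"},
]
wide_rows = pivot(long_rows, "customer_id", "attr", "val")
```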
NiFi and Transformations?
Key question: we know that NiFi /can/ do these things, but is it the best tool for this job? The key concern with NiFi has been managing it.
Batch transformation with Streams
- On joining streams: https://debezium.io/blog/2018/03/08/creating-ddd-aggregates-with-debezium-and-kafka-streams/
- An alternative that publishes an aggregate (domain root) before it hits the stream: https://debezium.io/blog/2018/09/20/materializing-aggregate-views-with-hibernate-and-debezium/
- Note: this would require modifying the source system, which for this project is less attractive.