Proposed Reporting Platform Repository Structure
Background
The new reporting platform comprises several components that each run independently. The team needs to settle on a code repository structure that follows architecturally sound practices. This document provides a central place for that discussion and records the decisions made to arrive at the structure. It was informed by a thread on the OpenLMIS-dev forum.
The following components will be addressed:
- Apache NiFi
- Kafka
- Druid
- (Optional) Relational Datastore - This is dependent on the implementation needs. We need to decide if OpenLMIS wants to support a relational database for smaller implementations.
- Apache Superset
- Apache Zookeeper
Each of these components needs to be able to be stood up with a docker-compose command, support CI/CD, and eventually allow for version-controlled templating. This document includes a proposed GitHub repository structure with links to Docker Hub containers that can be used.
Proposed Structure
We will need to add a reporting docker-compose file, and templates for each service, to the openlmis-ref-distro repository so that it stands up the reporting stack. We would like to implement the reporting stack without creating new Docker images wherever possible. We therefore propose creating a reporting folder in the openlmis-ref-distro repo that will contain the docker-compose.yml file and associated template files. Tooling that we build to load templates into the reporting stack will live elsewhere, in separate tool repo(s).
The docker-compose file will need to reference Docker images for the following services, and the repository will need to contain templates for them:
- NiFi
- Description: Contains all code related to running a production-quality, versioned NiFi environment.
- Official Website: https://nifi.apache.org/
- Docker Container(s):
- Nifi: https://hub.docker.com/r/apache/nifi/
- Note: This Docker container is unofficial. It supports running NiFi in standalone mode, either unsecured or with user authentication provided through two-way SSL with client certificates or LDAP.
- Nifi-Registry: https://hub.docker.com/r/apache/nifi-registry/
- (Possibly a data storage docker container)
- Kafka
- Description: This repository will contain all code related to running a production-quality, versioned Kafka cluster deployed as a single node. Note that Docker configuration is well documented on the Confluent site linked below.
- Official Website: https://kafka.apache.org/ | Heavily Supported by https://confluent.io
- Docker Container(s):
- Druid
- Description: This repository will contain all code related to running a production-quality, versioned Druid cluster used as the reporting storage engine. The linked Docker container provides an example single-node cluster.
- Official Website: http://druid.io/
- Docker Container(s):
- Druid Cluster: https://hub.docker.com/r/druidio/example-cluster/
- PostgreSQL
- Description: This repository will contain all code related to running a production-quality, versioned PostgreSQL database that is used for storage in the reporting platform.
- Official Website: https://www.postgresql.org/
- Docker Container(s):
- Superset
- Description: This repository will contain all code related to running a production-quality, versioned Superset environment.
- Official Website: https://superset.apache.org/
- No official Docker container was found, but the following image, maintained by an individual, is well supported: https://github.com/amancevice/superset / https://hub.docker.com/r/amancevice/superset/
- Zookeeper
- Description: Apache ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. This repository will contain all code related to running a production-quality, versioned ZooKeeper deployment.
- Official Website: https://zookeeper.apache.org/
- Docker Container(s):
- Zookeeper (Confluent): https://hub.docker.com/r/confluentinc/cp-zookeeper/
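As a rough illustration, the proposed reporting/docker-compose.yml could wire the images listed above together as sketched below. This is a minimal, hypothetical composition: the port mappings, environment variables, service names, and the choice of Confluent's cp-kafka image (no Kafka container has been selected above) are all assumptions, not decisions.

```shell
#!/bin/sh
# Write an illustrative reporting/docker-compose.yml.
# All ports, env vars, and the cp-kafka image choice are assumptions.
mkdir -p reporting
cat > reporting/docker-compose.yml <<'EOF'
version: "3"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181

  kafka:                          # assumed image; no container chosen yet
    image: confluentinc/cp-kafka
    depends_on: [zookeeper]
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092

  nifi:
    image: apache/nifi
    ports: ["8080:8080"]          # NiFi web UI (default HTTP port)

  nifi-registry:
    image: apache/nifi-registry

  druid:
    image: druidio/example-cluster
    depends_on: [zookeeper]

  db:
    image: postgres               # assumed: the official PostgreSQL image
    environment:
      POSTGRES_PASSWORD: changeme

  superset:
    image: amancevice/superset
    ports: ["8088:8088"]
    depends_on: [db, druid]
EOF
echo "wrote reporting/docker-compose.yml"
```

With a file along these lines in place, the whole stack would come up with `docker-compose -f reporting/docker-compose.yml up -d`, which satisfies the single-command requirement stated above.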
Configuration
- We may need to load NiFi flow templates and execute other API calls to set values in these templates.
- We may need to create and configure Kafka topics using command-line calls; alternatively, we may do this via NiFi.
- We may need to load in dashboard templates through the SuperSet API.
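The three configuration steps above could be scripted roughly as follows. Everything here is illustrative: the hostnames, ports, topic name, file paths, ROOT_GROUP_ID placeholder, and the assumption that the NiFi template-upload REST endpoint and the Superset import_dashboards CLI command are the loading mechanisms we end up using. The sketch only assembles and prints the commands it would run.

```shell
#!/bin/sh
# Sketch of post-deployment configuration for the reporting stack.
# Hostnames, ports, topic names, and file paths are assumptions.

NIFI_API="http://localhost:8080/nifi-api"   # assumed NiFi REST API base URL

# 1. Create a Kafka topic using the CLI tools inside the broker container.
KAFKA_CMD="docker-compose exec kafka kafka-topics --create \
  --zookeeper zookeeper:2181 --replication-factor 1 --partitions 1 \
  --topic reporting-events"

# 2. Upload a NiFi flow template through the REST API. ROOT_GROUP_ID is a
#    placeholder; a real script would fetch the root process-group id first.
NIFI_CMD="curl -s -X POST -F template=@templates/nifi/reporting-flow.xml \
  $NIFI_API/process-groups/ROOT_GROUP_ID/templates/upload"

# 3. Import a Superset dashboard export via the superset CLI in its container.
SUPERSET_CMD="docker-compose exec superset superset import_dashboards \
  -p /templates/superset/dashboards.json"

# Print the planned steps; a real script would execute them in order,
# waiting on each service's health check first.
printf '%s\n\n' "$KAFKA_CMD" "$NIFI_CMD" "$SUPERSET_CMD"
```

If this proves workable, it suggests the template-loading tooling can be plain shell plus curl, without a Gradle dependency (one of the discussion topics below).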
OpenLMIS Reporting Repository Template
The OpenLMIS team maintains an OpenLMIS Service Template GitHub repository to support the standardization of all repositories across the GitHub organization. We need to determine whether these templates are appropriate for each of these reporting components. This analysis was performed on 7 May 2018 and may have changed since.
Current repository template structure:
- config
  - checkstyle
    - checkstyle.xml
  - pmd
    - ruleset.xml
- consul
  - config.json
  - package.json
  - registration.js
- src
  - integration-test/...
  - main
  - test/...
- .gitignore
- Dockerfile
- LICENSE
- README.md
- build.gradle
- build.sh
- docker-compose.builder.yml
- docker-compose.override.yml
- docker-compose.yml
- documentation.gradle - We need to determine whether this is appropriate to keep, because we may need to document the component's APIs.
- gradle.properties
- package.json
- settings.gradle
Notes on new reporting template structure:
- To start, we will not have the following:
- config directory - the config directory will likely change. It currently contains Checkstyle and PMD rulesets for build-time style checks, which are Java-specific; we likely don't need those for these components.
- consul directory - we are not registering with consul
- src directory - these repositories likely won't contain Java source code, only templates
- FLYWAY.md - we likely won't have flyway migrations
- We will likely need to add the following:
- templates - this directory will include templates for auto-loading information into the service. This applies to Kafka topics, NiFi templates, and Superset dashboards.
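The trimmed template described in the notes above could be scaffolded like this. The directory and file names under templates/ are assumptions chosen to mirror the three template types named above, not an agreed layout.

```shell
#!/bin/sh
# Hypothetical scaffold for the proposed reporting template: the Java-specific
# config, consul, and src pieces are dropped, and a templates/ directory
# (subdirectory names assumed) is added alongside the retained root files.
mkdir -p reporting-template/templates/nifi \
         reporting-template/templates/kafka \
         reporting-template/templates/superset
touch reporting-template/docker-compose.yml \
      reporting-template/.gitignore \
      reporting-template/LICENSE \
      reporting-template/README.md

# Show the resulting tree as a sorted path list.
find reporting-template -print | sort
```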
Discussion Topics
- What monitoring systems are used by OpenLMIS?
- How should each service interact with each monitoring system?
- Each of these Docker containers focuses on deploying single-node clusters. This is fine for development environments, but it is not appropriate for production, where we would expect the ability to horizontally scale each cluster. How can we account for this as OpenLMIS?
- How should we handle translations? Do they only need to be accounted for in SuperSet?
- Do we need to depend on Gradle for any build processes? How can we load templates through each service's API? For example, this could be done with shell scripts.
OpenLMIS: the global initiative for powerful LMIS software