Developer Onboarding Guide

Who this is for / How to use this

This guide is for implementation developers who will deploy, maintain, and/or improve Casper (e.g. TZ eLMIS developers). This document has six parts:

  1. Overview

  2. How to provision and deploy

  3. How to operate and monitor

  4. How the Casper Pipeline works and NiFi intro

  5. Transformations in NiFi

  6. Strategies for Building NiFi Transformations

Everyone should read the overview section. Then use the remaining sections as needed, based on your goals.

Overview of the Casper Demo

The current Casper demo has code in the following git repositories:

Deployment is done from the master branch of each repository. Three of the repositories are hosted on GitLab, instead of GitHub, primarily so that we can use GitLab instead of Jenkins to run the deployment jobs.

In addition, eLMIS is deployed from the elmis-tzm/open-lmis repository, and OpenLMIS v3 is deployed via the openlmis/openlmis-ref-distro repository.

 

The five pieces of the Casper architecture are deployed on three AWS EC2 instances, as shown in the following diagram:

casper-elmis.a.openlmis.org    casper.a.openlmis.org    casper-superset.a.openlmis.org
            ↓                            ↓                            ↓
  ┌─────────────────┐         ┌─────────────────┐         ┌─────────────────┐
  |    eLMIS v2     |         |                 |--------→|  v3 Reporting   |
  |-----------------|         |   OpenLMIS v3   |         |-----------------|
  | Casper pipeline |--------→|                 |         |  NiFi Registry  |
  └─────────────────┘         └─────────────────┘         └─────────────────┘
            ↑                                                       ↑
  casper-elmis.a.openlmis.org:8080/nifi       casper-nifi-registry.a.openlmis.org

The eLMIS instance is deployed on the same machine as the Casper pipeline, which collects data from it. The NiFi registry is deployed on the same machine as the OpenLMIS v3 reporting stack, since the NiFi registry is only used by the reporting stack (the Casper pipeline is built on NiFi, but it doesn't use the registry).

Key Technologies Used

  • Docker and docker-compose - These are perhaps the most important tools for deploying Casper. I highly recommend reading both of the linked guides. Each of the five components used by the Casper demo is defined by a single docker-compose.yml file. Each docker-compose file in turn lists between 1 and 17 Docker containers used by that component.

  • NiFi - This is the primary piece of the Casper pipeline, and it is also used by the v3 reporting stack. The NiFi transforms in the Casper pipeline are what turn data in v2's format into data in v3's format. NiFi (and the rest of the pipeline) does "stream processing": the NiFi process is always waiting and listening for new input data (from v2), which is transformed (into v3 data) and passed forward as soon as it is received, without waiting for more data. I recommend starting to read the linked NiFi guide, though it may be more useful to look things up as you need them. More details are below in the "How the Casper Pipeline Works" section.

  • Terraform (and Ansible) - Terraform is used for creating resources on AWS. Those resources include servers (EC2 instances), domain names, databases, firewalls, and more. These resources are all defined by our Terraform code, an approach known as "infrastructure as code". Though Terraform is used to set up servers for the Casper demo, it would not work with NDC servers.
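To make the docker-compose structure described above concrete, a component's docker-compose.yml might look roughly like the following sketch. The service names, images, and ports here are hypothetical illustrations, not taken from the actual Casper repositories:

```yaml
# Hypothetical sketch of one component's docker-compose.yml; the real files
# live in the Casper git repositories and list up to 17 containers each.
version: '2'
services:
  nifi:
    image: apache/nifi            # image name is an assumption
    ports:
      - "8080:8080"               # NiFi UI, cf. casper-elmis...:8080/nifi
  kafka:
    image: example/kafka          # placeholder image
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
  zookeeper:
    image: zookeeper
```

Bringing a component up or down is then a matter of running `docker-compose up -d` or `docker-compose down` in the directory containing that file.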

Additional Technologies Used

  • OpenLMIS v3

    • TODO: what should we link here ?

  • OpenLMIS v3 reporting stack

  • Kafka - Along with NiFi, Kafka is used to drive the Casper pipeline. Each change to the v2 database becomes a message on a Kafka topic, and NiFi outputs transformed data as messages to other Kafka topics.

  • Debezium - Streams data from eLMIS into Kafka

  • Superset - The data visualization web app that is the user interface for the OpenLMIS v3 reporting stack

  • AWS - The Casper demo is entirely hosted on Amazon's cloud services, managed by Terraform

  • Grafana - We run Grafana to monitor the Casper pipeline and view statistics about Kafka and NiFi
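To make the Debezium-to-Kafka flow above concrete: each row-level change Debezium captures from the eLMIS database arrives on a Kafka topic as a JSON message containing before/after images of the changed row. A sketch of such a change event follows; the envelope fields (before, after, source, op) are standard Debezium, but the table and column names here are hypothetical:

```json
{
  "payload": {
    "before": null,
    "after": { "id": 42, "facility_code": "F001", "quantity": 30 },
    "source": { "db": "open_lmis", "table": "facility_stock" },
    "op": "c",
    "ts_ms": 1554890123000
  }
}
```

Here "op": "c" marks a row creation (insert); updates carry both a before and an after image, which is what lets NiFi transform v2 changes into v3 data incrementally.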

How to provision and deploy

The diagram in the "Overview of the Casper Demo" section shows how the five components of the demo are deployed to three servers. The goal of this section is to provide instructions for setting up all of the parts in that diagram.

Provisioning

This part of the guide assumes that you are using AWS to host everything and have an AWS account with IAM credentials (an access key ID and a secret access key). Provisioning AWS resources for the Casper demo is done using our Terraform configuration.

Installation should be done on a machine that will control the targets. This is most likely your development computer.

Setup

  1. Install the Terraform command-line tool (v0.11) on your development computer

  2. Install ansible on your development computer

    1. Note that installing on OSX has been reported to be tricky; you should use a virtualenv, otherwise errors seem to be likely. This guide is useful for OSX users. Use Python 2.x, not 3.x. When using a virtualenv, do not use sudo pip install; drop the sudo, which allows pip to install Ansible in the virtualenv.

    2. Run mkvirtualenv olmis-deployment if you need a new virtual environment.

  3. Install pip, a package manager for Python

  4. Install the requirements for our Ansible scripts:

    $ pip install -r ../ansible/requirements/ansible.pip

  5. Clone the openlmis-deployment git repository

    $ git clone https://github.com/OpenLMIS/openlmis-deployment.git

Running Terraform

You will have to repeat these steps for each of the three machines used for the deployment. (It would be possible to put all the resources in a single terraform environment, but this way they can be managed separately.)
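Each of the three directories is a self-contained Terraform configuration. As a rough illustration of what such a configuration defines, a heavily simplified sketch follows; the resource names, AMI, region, and instance type are hypothetical, and the syntax shown is Terraform v0.11-style interpolation as used by this deployment:

```hcl
# Hypothetical sketch; the real configuration lives under
# openlmis-deployment/provision/terraform/casper/.
variable "aws_access_key_id" {}
variable "aws_secret_access_key" {}

provider "aws" {
  access_key = "${var.aws_access_key_id}"
  secret_key = "${var.aws_secret_access_key}"
  region     = "us-east-1"           # region is an assumption
}

resource "aws_instance" "casper_v2" {
  ami           = "ami-12345678"     # placeholder AMI
  instance_type = "t2.large"         # instance size is an assumption
}
```

The TF_VAR_-prefixed environment variables set in step 1 below are how Terraform populates the `variable` blocks without credentials ever being written to disk.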

  1. Set up your AWS access keys

    $ export TF_VAR_aws_access_key_id=$AWS_ACCESS_KEY_ID
    $ export TF_VAR_aws_secret_access_key=$AWS_SECRET_ACCESS_KEY

  2. Go to one of the three subdirectories of the openlmis-deployment/provision/terraform/casper/ directory (e.g. v2/ for the server with eLMIS and the pipeline).

    $ cd openlmis-deployment/provision/terraform/casper/v2/

  3. Prepare terraform (this creates the .terraform directory):

    $ terraform init

  4. Start up the resources defined in the current folder. Or, if they are already running (e.g. if you are using the VillageReach AWS account), apply changes from any edited files. This command will ask for confirmation before actually making changes:

    $ terraform apply

  5. You should be able to check that the newly created resources are working by pinging them, even though no applications are deployed yet, e.g.:

    $ ping casper-elmis.a.openlmis.org
