Service restart may fail, rendering Ref Distro unusable

Description

Problem

When running the Ref Distro, restarting a single container/service can leave the Ref Distro in a state where it has to be stopped and restarted as a whole.

Reproduce

  1. In the Ref Distro directory:

  2. docker-compose up -d

  3. docker-compose restart referencedata

  4. Wait 1-2 minutes.

  5. Go to the browser and try to log in; there is a good chance the login will fail with a backend error.

Cause

It appears that whether the restart works depends on whether Docker happens to re-assign the same internal IP to the restarted service (presumably because a reused IP means the stale registration still points at a live container). Either way, the restarted service does not clean up its old registration in Consul, which leaves a health check for that instance of the service still trying to contact the old IP. Consul-Template then uses that "zombie" instance even though it is critical, so there is a good chance Nginx will round-robin a request to the defunct service instance.
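A minimal Python sketch of the failure mode and of Fix 1 (Python is an arbitrary choice; the ticket contains no code). It renders an Nginx upstream block from entries shaped like the response of Consul's /v1/health/service/<name> endpoint, roughly what a Consul-Template template does; it is not Consul-Template's implementation. The helper names, instance IDs, addresses, and port are hypothetical.

```python
from typing import Any, Dict, List

# Rank Consul check statuses so the worst one decides an instance's health.
SEVERITY = {"passing": 0, "warning": 1, "critical": 2}


def instance_status(entry: Dict[str, Any]) -> str:
    """Worst status across all checks of one /v1/health/service entry."""
    statuses = [check["Status"] for check in entry.get("Checks", [])]
    return max(statuses, key=SEVERITY.__getitem__, default="passing")


def target(entry: Dict[str, Any]) -> str:
    """address:port of an instance; an empty service Address means use the node's."""
    service = entry["Service"]
    address = service.get("Address") or entry["Node"]["Address"]
    return f"{address}:{service['Port']}"


def render_upstream(name: str, entries: List[Dict[str, Any]], passing_only: bool) -> str:
    """Render an Nginx upstream block, optionally skipping non-passing instances."""
    lines = [f"upstream {name} {{"]
    for entry in entries:
        if passing_only and instance_status(entry) != "passing":
            continue  # Fix 1: never hand a critical instance to Nginx
        lines.append(f"    server {target(entry)};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical data: a live instance plus the stale "zombie" left by the restart.
SAMPLE = [
    {"Node": {"Address": "172.18.0.2"},
     "Service": {"ID": "referencedata-new", "Address": "172.18.0.9", "Port": 8080},
     "Checks": [{"Status": "passing"}, {"Status": "passing"}]},
    {"Node": {"Address": "172.18.0.2"},
     "Service": {"ID": "referencedata-old", "Address": "172.18.0.7", "Port": 8080},
     "Checks": [{"Status": "passing"}, {"Status": "critical"}]},
]

if __name__ == "__main__":
    print(render_upstream("referencedata", SAMPLE, passing_only=False))  # zombie included
    print(render_upstream("referencedata", SAMPLE, passing_only=True))   # zombie dropped
```

With passing_only=False the stale instance sits next to the live one, so Nginx's default round-robin can send a share of requests to it, matching the symptom in the reproduce steps. Note that Nginx rejects an upstream with no servers, so a real template would need a fallback for the case where every instance is critical.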

Fixes

  1. Consul-Template should not use a critical service instance as an Nginx upstream.

  2. A service's health check should have a cleanup timeout in Consul so that "zombie" service instances are removed. Use deregister_critical_service_after: https://www.consul.io/docs/agent/checks.html.

  3. When a service exits cleanly, it should attempt to deregister itself from Consul.
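Fixes 2 and 3 can be sketched against Consul's agent HTTP API: PUT /v1/agent/service/register with a DeregisterCriticalServiceAfter value on the check, and PUT /v1/agent/service/deregister/<id> from a SIGTERM handler. This is a hypothetical Python illustration of the calls involved, not how the Ref Distro services actually register; the agent address, service ID, health URL, and the 10s/1m timings are assumptions.

```python
import json
import signal
import sys
import urllib.request

# Assumption: address of the Consul agent (hypothetical).
CONSUL = "http://consul:8500"


def register_request(service_id: str, name: str, address: str, port: int,
                     health_url: str, deregister_after: str = "1m") -> urllib.request.Request:
    """Build a PUT /v1/agent/service/register call with a self-cleaning HTTP check."""
    payload = {
        "ID": service_id,
        "Name": name,
        "Address": address,
        "Port": port,
        "Check": {
            "HTTP": health_url,
            "Interval": "10s",
            # Fix 2: Consul removes the instance once this check stays critical this long.
            "DeregisterCriticalServiceAfter": deregister_after,
        },
    }
    return urllib.request.Request(
        f"{CONSUL}/v1/agent/service/register",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def deregister_request(service_id: str) -> urllib.request.Request:
    """Build a PUT /v1/agent/service/deregister/<id> call."""
    return urllib.request.Request(
        f"{CONSUL}/v1/agent/service/deregister/{service_id}", method="PUT")


def install_clean_exit_hook(service_id: str) -> None:
    """Fix 3: on SIGTERM (sent by `docker-compose stop`), deregister, then exit."""
    def on_sigterm(signum, frame):
        try:
            urllib.request.urlopen(deregister_request(service_id), timeout=5).close()
        except OSError:
            pass  # best effort; DeregisterCriticalServiceAfter is the fallback
        sys.exit(0)

    signal.signal(signal.SIGTERM, on_sigterm)


if __name__ == "__main__":
    svc_id = "referencedata-172.18.0.9"  # hypothetical per-container ID
    urllib.request.urlopen(register_request(
        svc_id, "referencedata", "172.18.0.9", 8080,
        "http://172.18.0.9:8080/health")).close()  # hypothetical health URL
    install_clean_exit_hook(svc_id)
    signal.pause()  # stand-in for the service's real work (Unix only)
```

docker-compose stop sends SIGTERM (and SIGKILL only after a grace period), so the hook covers acceptance criterion 1. docker-compose kill sends SIGKILL by default, which cannot be caught, so criterion 2 relies on DeregisterCriticalServiceAfter alone. Consul's check documentation notes a one-minute minimum for this timeout and that critical services are reaped periodically, so removal can lag the configured value somewhat.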

Acceptance Criteria

  1. docker-compose stop referencedata results in no referencedata instance remaining registered in Consul.

  2. docker-compose kill referencedata results in the service being removed from Nginx's configuration, as a side effect of Consul-Template dropping the critical service instance, AND once the deregister_critical_service_after timeout has elapsed, the service is removed from Consul (i.e., Consul has cleaned it up).
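The Consul side of both acceptance criteria can be checked by polling Consul's catalog, which lists every registered instance regardless of health. A hypothetical Python helper, assuming Consul's HTTP API is reachable at localhost:8500 from where the check runs; fetch and sleep are injectable so the polling logic can be exercised without a running Consul.

```python
import json
import time
import urllib.request
from typing import Any, Callable, List

# Assumption: Consul's HTTP API as exposed to the host (hypothetical address).
CONSUL = "http://localhost:8500"


def fetch_catalog(service: str) -> List[Any]:
    """GET /v1/catalog/service/<service>: every registered instance, healthy or not."""
    url = f"{CONSUL}/v1/catalog/service/{service}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def wait_until_deregistered(service: str,
                            fetch: Callable[[str], List[Any]] = fetch_catalog,
                            timeout: float = 120.0,
                            interval: float = 5.0,
                            sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until Consul lists no instance of `service`; False once the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        if not fetch(service):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Run right after `docker-compose stop referencedata` (criterion 1).
    print("deregistered:", wait_until_deregistered("referencedata", timeout=30))
```

For criterion 2 (docker-compose kill), the timeout needs to exceed the configured deregister_critical_service_after plus Consul's reaping delay; the Nginx half of that criterion would still need a separate check of the rendered configuration.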

Environment

None

Assignee

Unassigned

Reporter

Josh Zamor

Components

Affects versions

Priority

Minor