Nginx occasionally fails when reloading the config file

Description

*An Nginx error occurs occasionally*. This is a solution from the ThoughtWorks Siglus team.
1. When a new API is added and deployed to the test environment, the new API cannot be accessed from a web browser.
2. When developing on localhost, we register or deregister the local service with the Consul service using consul/registration.js and then debug. Sometimes the consul-template process in the Nginx container exits for unknown reasons. When this happens, restarting the Nginx container fails.

*The root cause*:
1. consul-template in the Nginx container receives new data from the Consul service and uses it to generate the Nginx config file. It then calls "nginx -s reload".
If Nginx cannot reload the config file, the command exits with an error, and consul-template exits because its subprocess failed. Later updates from Consul are then no longer received.
`consul-template -log-level info -consul consul:8500 -template /etc/consul-template/openlmis.conf:/etc/nginx/conf.d/default.conf:nginx -s reload`

2. consul/registration.js is used to register a service with the Consul service, both in a container environment and in a local development environment.
Registering with consul/registration.js works: consul-template generates a correct Nginx config file.

The key code:

upstream requisition {
    least_conn;
    keepalive 128;
    server your_local_ip:8080;
}

location ~ /requisition/docs/?$ {
    proxy_pass http://requisition;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}

However, when consul/registration.js is used to deregister, the generated Nginx config file may be wrong: the upstream section is missing while the location section still refers to it.

location ~ /requisition/docs/?$ {
    proxy_pass http://requisition;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}

When Nginx reloads the file, it must resolve the host name used in each proxy_pass directive, such as `proxy_pass http://requisition`. With no matching upstream section, the name cannot be resolved and the "nginx -s reload" process exits with an error.
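The failure condition above can be checked mechanically. The following is a minimal sketch, not part of OpenLMIS: `findMissingUpstreams` is a hypothetical helper that scans a config string for proxy_pass targets with no matching upstream section. It assumes, as in the generated file shown here, that every proxy_pass target is meant to be an upstream name rather than a real DNS host.

```javascript
// Hypothetical helper (not part of OpenLMIS): report proxy_pass targets
// that have no matching "upstream <name> { ... }" block in the config text.
function findMissingUpstreams(configText) {
  // Collect the names of all upstream blocks that are defined.
  const defined = new Set();
  const upstreamRe = /^\s*upstream\s+([^\s{]+)\s*\{/gm;
  let match;
  while ((match = upstreamRe.exec(configText)) !== null) {
    defined.add(match[1]);
  }

  // Collect proxy_pass host names that are not defined as upstreams.
  const missing = new Set();
  const proxyPassRe = /^\s*proxy_pass\s+https?:\/\/([^\s\/:;]+)/gm;
  while ((match = proxyPassRe.exec(configText)) !== null) {
    if (!defined.has(match[1])) {
      missing.add(match[1]);
    }
  }
  return Array.from(missing);
}

// The config produced after a deregister: location without upstream.
const afterDeregister = [
  'location ~ /requisition/docs/?$ {',
  '    proxy_pass http://requisition;',
  '    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;',
  '}',
].join('\n');

console.log(findMissingUpstreams(afterDeregister)); // prints [ 'requisition' ]
```

A non-empty result corresponds to the case in which "nginx -s reload" is expected to fail.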

3. How consul/registration.js deregisters a service.
The key steps in the function, where the mode is 'deregister':

function registrationBase(args, mode) {
    registerService(args.service, mode);

    if (args.raml) {
        registerRaml(args.service, args.raml, mode);
    }

    if (args.path) {
        registerPath(args.service, args.path, mode);
    }
}

When registerService(args.service, 'deregister') is called, it internally calls the delete-service API (/v1/agent/service/deregister/service.id). However, only the service itself (the upstream section in the Nginx config) is deleted from Consul; the API info (the location section) is not. The API info is deleted only later, when registerRaml and registerPath are called. In between, consul-template can render a config file that has the location section but no upstream section.
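One way to avoid that intermediate state is to remove the API info before the service when deregistering. The sketch below illustrates the ordering only: the three register* functions are stand-ins that record their calls, not the real implementations from consul/registration.js, and the `calls` array is a hypothetical addition for demonstration.

```javascript
// Stand-ins for the helpers in consul/registration.js; they only record calls.
const calls = [];
function registerService(service, mode) { calls.push(`service:${mode}`); }
function registerRaml(service, raml, mode) { calls.push(`raml:${mode}`); }
function registerPath(service, path, mode) { calls.push(`path:${mode}`); }

function registrationBase(args, mode) {
  const isDeregister = mode === 'deregister';

  // When registering, the service (upstream) must exist before its locations.
  if (!isDeregister) {
    registerService(args.service, mode);
  }

  if (args.raml) {
    registerRaml(args.service, args.raml, mode);
  }

  if (args.path) {
    registerPath(args.service, args.path, mode);
  }

  // When deregistering, remove the service last so that no location
  // is ever rendered without its upstream.
  if (isDeregister) {
    registerService(args.service, mode);
  }
}

registrationBase(
  { service: { id: 'requisition' }, raml: 'api.raml', path: ['/requisition'] },
  'deregister'
);
console.log(calls);
```

If the real helpers are asynchronous, each call would also need to complete before the next one starts for the ordering to hold.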

4. The Nginx container fails to restart.
If the Nginx config file generated by consul-template is wrong, then when the Nginx container is restarted, Nginx loads the wrong config file and cannot start.

*Solution*:
1. Keep the consul-template process running even when the "nginx -s reload" subprocess exits with an error.
2. Change the order of steps when deregistering a service with consul/registration.js: call registerRaml and registerPath first, then call registerService.
3. When restarting Nginx, delete the old Nginx config file first and then start the Nginx process.
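For item 1, one possible approach is to have consul-template invoke a wrapper that logs a failed reload instead of passing the non-zero exit code on. The sketch below is an assumption, not the change that was actually made: `reloadNginx` is a hypothetical function, it presumes Node.js is available in the Nginx container, and its `run` parameter exists only so the logic can be exercised without nginx installed.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical wrapper: run "nginx -s reload" and report a failure
// instead of propagating it to the caller (consul-template).
function reloadNginx(run = spawnSync) {
  const result = run('nginx', ['-s', 'reload'], { encoding: 'utf8' });

  // result.error is set when the process could not be started at all;
  // a non-zero status means nginx rejected the reload.
  if (result.error || result.status !== 0) {
    const reason = result.error ? result.error.message : result.stderr;
    console.error(`nginx reload failed: ${reason}`);
    return false;
  }
  return true;
}
```

A wrapper script would call reloadNginx() and then exit with status 0 regardless of the result, so that consul-template keeps receiving later updates. Where the reload command is run through a shell, appending `|| true` to it would have a similar effect.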

Environment

None

Assignee

Unassigned

Reporter

Yunlei Cai

Labels

Time tracking

40h

Affects versions

Priority

Major