This page holds information about Jenkins administration. Whenever a significant problem is encountered, the lessons learned should be listed here to avoid the same problem in the future.
- Do not attempt to create EC2 instances manually in the AWS Console - the EC2 Jenkins plugin creates them on its own
- Avoid using the "provision via AWS" button to spawn new EC2 instances - in the past this caused issues with starting them up automatically (the preferred way is to just set the instance capacity, and instances will spawn when needed)
- Whenever updating an init script that stands up an EC2 instance, always update it for all existing instances and also for the template (Manage Jenkins → Configure system → Amazon EC2 → AMIs)
Jenkins disk space
- Jenkins running out of space? Try clearing the workspace directory if there are no builds running on master. Sometimes the jobs cannot clear it after builds and this takes up space.
- Check what occupies most of the space with "ncdu /" (might take several minutes)
- If logs are taking too much space, check if log rotation is performed correctly
- logs should be compressed
- rotation frequency or file size should be adjusted to expected log volume - consult logrotate documentation for configuration adjustments - https://linux.die.net/man/8/logrotate
- on CentOS instances, the global config can be found in /etc/logrotate.conf; individual configs are here
- logrotate is started by cron once a day. If you need immediate results, run it manually
- Check space taken by docker with "docker system df"; depending on the output, try to clean images, stopped containers or local volumes
- When cleaning docker make sure no job is currently running in Jenkins
- The Jenkins jobs directory might grow large. It should not be cleared, as it contains the build history
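The disk space checks above can be sketched as a couple of shell helpers. This is a minimal sketch, not existing tooling: the function names largest_entries and docker_usage are made up for illustration, and it assumes GNU du (as found on CentOS). It is a non-interactive alternative to ncdu for a first look.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, not part of any existing tooling.

# Print the COUNT largest entries directly under DIR (sizes in KiB, largest first).
# The first line is the total for DIR itself. -x keeps du on one filesystem.
largest_entries() {
  local dir="${1:-/}" count="${2:-10}"
  du -x -k --max-depth=1 "$dir" 2>/dev/null | sort -rn | head -n "$count"
}

# Show Docker disk usage if the docker CLI is installed.
docker_usage() {
  if command -v docker >/dev/null 2>&1; then
    docker system df
  else
    echo "docker not found" >&2
  fi
}

# Typical follow-ups, to be run by hand once no Jenkins job is running:
#   docker container prune             # remove stopped containers
#   docker image prune                 # remove dangling images
#   docker volume prune                # remove unused local volumes
#   logrotate -d /etc/logrotate.conf   # dry run: show what would be rotated
#   logrotate -f /etc/logrotate.conf   # force an immediate rotation
```

For example, "largest_entries /var/lib/jenkins 15" lists the 15 largest entries under the Jenkins directory. The prune and logrotate commands are left as comments on purpose, so nothing is deleted or rotated without a deliberate decision.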
Jenkins upgrade
- Take a backup of the Jenkins directory (/var/lib/jenkins) or take a snapshot of the whole volume (named Jenkins-snapshot) via the AWS console (Elastic Block Store → Snapshots → Create snapshot)
- Stop Jenkins (via web console or via terminal - service jenkins stop) - make sure no builds are running first!
- Get the latest Jenkins war (https://jenkins.io/download/) (LTS recommended) and replace it in /usr/lib/jenkins/jenkins.war
- Sometimes some additional steps may be required - the upgrade guide will list them: https://jenkins.io/doc/upgrade-guide/
- Start Jenkins again - service jenkins start
- Make sure that after a few minutes Jenkins is running at http://build.openlmis.org/ and that everything looks stable
- In case of any problems you can consult the logs (Scalyr or /var/log/jenkins/jenkins.log - as superuser)
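The steps above can be sketched as a small shell script. Treat it as a starting point under stated assumptions rather than a tested procedure: the function names are hypothetical, the new war is assumed to have been downloaded by hand from the download page, and keeping the previous war as a .bak file for rollback is an addition not described in the steps. The paths and the URL are the ones listed above.

```shell
#!/usr/bin/env bash
# Sketch only: function names are illustrative; paths are taken from the steps above.
JENKINS_HOME="${JENKINS_HOME:-/var/lib/jenkins}"
JENKINS_WAR="${JENKINS_WAR:-/usr/lib/jenkins/jenkins.war}"

# Archive the Jenkins directory into DEST_DIR and print the archive path.
backup_jenkins_home() {
  local dest_dir="$1"
  local archive
  archive="$dest_dir/jenkins-home-$(date +%Y%m%d-%H%M%S).tar.gz"
  tar -czf "$archive" -C "$(dirname "$JENKINS_HOME")" "$(basename "$JENKINS_HOME")" || return 1
  echo "$archive"
}

# Replace the installed war with NEW_WAR.
# Assumption: the old war is kept next to it as .bak so a rollback stays possible.
replace_war() {
  local new_war="$1"
  cp -p "$JENKINS_WAR" "$JENKINS_WAR.bak" && cp "$new_war" "$JENKINS_WAR"
}

# Print the HTTP status code returned by the Jenkins front page.
check_jenkins() {
  curl -s -o /dev/null -w '%{http_code}\n' "${1:-http://build.openlmis.org/}"
}

# Full sequence; each step runs only if the previous one succeeded.
# Run as superuser and only when no builds are running.
upgrade_jenkins() {
  local new_war="$1" backup_dir="$2"
  backup_jenkins_home "$backup_dir" &&
    service jenkins stop &&
    replace_war "$new_war" &&
    service jenkins start
}
```

A run might look like "upgrade_jenkins /tmp/jenkins.war /mnt/backups" (both paths are placeholders), followed by check_jenkins a few minutes later and a look at /var/log/jenkins/jenkins.log if the status code is unexpected.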