Do we need more space / will we be storing things we don't need to?
Not really, though this might need more detailed analysis. Overall, we know that the things we need to store are growing, already take up significant space, and will continue to grow.
What about the extra data (e.g. in the workspace), jobs that don't clean up correctly, etc.?
This is certainly true, however we've been fixing this for the last two years:
Jobs that don't clean up after themselves correctly
Docker bugs where space isn't released properly
A key aspect of this ticket is that it addresses the space problem by making it easier for us to scale Docker's storage. As the storage space needed for wanted artifacts increases, we'll be able to increase the available block storage more quickly than we can when that block storage device is the root device.
Database migrations
Problem and options outlined above.
We discussed a number of issues; a small sample:
The change from serial version numbers to timestamps was an intentional feature. In previous OpenLMIS versions (< 3), we used serial versions and had significant issues with naming conflicts. Conflicts were a daily occurrence for developers on the same branch, where they were more of a nuisance; for longer-running branches and contributions, however, they were a significant issue. Our move away from branching and away from serial versions is intentional, to avoid these problems.
The problem can only occur when the production flag is set, i.e. on systems where we intend to deploy releases, not on developer/testing instances.
The problem is an edge case. We know of two instances where it has occurred on our own test systems (not yet in production). Both happened while we were deploying pre-releases and releases, which is the most likely time for this condition to be encountered.
We are concerned that this could occur to a production instance. It's a small chance, but it's a risk worth mitigating.
Several options were discussed, all of which need more in-depth discussion on the forum:
A note on migrations to remind reviewers to check for out-of-order migrations during a release cycle (likely for pre-releases, i.e. those where we'd expect deployments of code with the production flag set)
A script that helps notify reviewers when out-of-order migrations are created
Using a script, or perhaps a Flyway setting, to announce more loudly during a deployment that an out-of-order migration was found. Today it is ignored quietly. This wouldn't solve the problem, but it would highlight that the edge case occurred so that issues could be mitigated sooner.
We should look at Flyway's validate and repair commands
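As a starting point for the forum discussion, the reviewer-notification script could look something like the sketch below. It is only an illustration under stated assumptions, not an agreed design: it assumes migration files are named with a numeric timestamp version, a double underscore, a description and a .sql extension (with an optional leading V), and that "out of order" means a new migration whose timestamp is older than the newest migration already released. The function names and the sample file names are hypothetical.

```python
import re
from typing import Iterable, List

# Assumed file name shape: optional "V", numeric timestamp version,
# "__", a description, ".sql". Adjust to the real naming convention.
MIGRATION_NAME = re.compile(r"^V?(\d+)__.+\.sql$")


def parse_version(filename: str) -> int:
    """Extract the numeric timestamp version from a migration file name."""
    match = MIGRATION_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a migration file name: {filename}")
    return int(match.group(1))


def find_out_of_order(released: Iterable[str], incoming: Iterable[str]) -> List[str]:
    """Return incoming migrations that are older than the newest released one.

    `released` holds the migration file names already shipped in a
    (pre-)release; `incoming` holds the file names added by a change
    under review. Both inputs are hypothetical and would come from
    wherever the project tracks them (e.g. a tag and a branch).
    """
    released_versions = [parse_version(name) for name in released]
    if not released_versions:
        return []  # nothing released yet, so nothing can be out of order
    newest_released = max(released_versions)
    return sorted(
        name for name in incoming if parse_version(name) < newest_released
    )


if __name__ == "__main__":
    released = [
        "20170801120000000__add_orders.sql",
        "20170815093000000__add_index.sql",
    ]
    incoming = [
        "20170810080000000__late_branch.sql",
        "20170820100000000__new_table.sql",
    ]
    for name in find_out_of_order(released, incoming):
        print(f"WARNING: out-of-order migration: {name}")
```

A check like this could run in CI or as a review aid; it only flags candidates for a reviewer to look at and does not change how migrations are applied.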
Next steps:
Add a comment/note so that reviewers need to review migration order
Start a Discourse topic to explore the options in more depth
Action Items
Sebastian Brudziński: add a comment/note so that reviewers review order of migrations during release cycles
Sebastian Brudziński: Start a forum post on db migration options to get us into more depth/clarity
Josh Zamor: look at AWS permissions and why Jenkins was affected