2019-10-16 TC Meeting notes
Note: Oct 1st meeting was cancelled.
Date
2019-10-16, 7am PST / 4pm CEST
Meeting Link
Attendees
Discussion items
Time | Item | Who | Notes |
---|---|---|---|
5m | Agenda and action item review | Josh Zamor | |
45m+ | How to address performance regressions going forward in releasing 3.7 | Wesley Brown | |
(next time) | Check-in on release of 3.7 | Josh Zamor | |
(next time) | Technical Committee moving forward? | Josh Zamor | |
10m (next time, or separate - checking with Ben Leibert) | Bumping the version of Orderables | Daniel Serkowski | |
Notes
How to address performance regressions
- In updating Orderables, performance has regressed.
- Biggest area: facility approved products - 10-15x slower than in 3.6.
- We've improved it to 3-4x slower than in 3.6
- The slowdown numbers come from automated testing with 9,000 products
- A cause: change in queries
- We tried native queries, which was perhaps a step in the wrong direction, as some information was missing.
- This introduced select N+1 problems (5-6 queries per orderable)
- In Orderables we have had a Select N+1 issue
- In 3.7 we've made slight improvements: lazy loading with a batch size of 1000 (sketch below)
- ~30% improvement (measured manually with the Malawi dataset)
- Select N+1
- Tried native queries, HQL, etc.
- Tried various guides recommending HQL, fetch joins, and so on - no single thing we've found fixes it.
- Are we monitoring how many queries are issued? Yes, starting to - there is a guide in the current ticket (OLMIS-6566)
- We've been working to address the above
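Below is a minimal sketch of the two mitigations discussed above - Hibernate's @BatchSize on a lazy collection and a JPQL fetch join. The entity, field, and repository names are simplified stand-ins, not the actual OpenLMIS classes.

```java
import java.util.List;
import java.util.Set;
import java.util.UUID;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.Id;
import javax.persistence.ManyToOne;
import javax.persistence.OneToMany;
import org.hibernate.annotations.BatchSize;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;

@Entity
class Orderable {
  @Id
  private UUID id;

  // A plain lazy collection still costs one extra query per parent row when the
  // collection is touched (select N+1). @BatchSize lets Hibernate load the
  // collections for up to 1000 parents with a single IN (...) query, which is
  // roughly the batch-size mitigation described above.
  @OneToMany(mappedBy = "orderable", fetch = FetchType.LAZY)
  @BatchSize(size = 1000)
  private Set<ProgramOrderable> programOrderables;
}

@Entity
class ProgramOrderable {
  @Id
  private UUID id;

  @ManyToOne(fetch = FetchType.LAZY)
  private Orderable orderable;
}

interface OrderableRepository extends JpaRepository<Orderable, UUID> {

  // A fetch join is the other option tried: load the association in the same
  // statement so this read path issues one query instead of N+1.
  @Query("select distinct o from Orderable o left join fetch o.programOrderables")
  List<Orderable> findAllWithProgramOrderables();
}
```

One way to watch the query count while verifying a fix is to enable Hibernate's hibernate.generate_statistics property (or plain SQL logging) during the performance runs.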
- What are some strategies we can use to prevent this going forward?
- Make performance tests less flaky - the tests had caught this regression, but it wasn't noticed until we dug into the flaky performance tests.
- Look into the previous strategy of quarantining performance tests (could be useful in other testing) (Josh Zamor: find prev TC notes and link here) - see the tagging sketch at the end of this list.
- People should feel free to quarantine a test.
- We should be careful about team communication around that judgement call - and about how long it takes to fix the test and get new information.
- Running the quarantined test continuously in a tight loop is a possible approach for this.
- Build Status Review - a daily meeting where the team reviews the status of each testing build (Jenkins dashboard). This helps with team communication and the judgement call, and might be supported by more automation.
- Consider moving away from hard performance criteria (as in p(90) < 500ms) and instead graphing the results to spot outliers as opposed to trends (a percentile sketch follows below).
- There is enthusiasm behind doing this
- Philip started down this path - need to find this again
- There is non-trivial work to make a good graph per test case
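As a reference point, here is a minimal sketch of how a p(90) < 500ms criterion is evaluated from raw samples (hypothetical numbers, not the project's real harness); the same per-test-case p90 values are what a graph-over-builds approach would plot.

```java
import java.util.Arrays;

/** Illustrative only: evaluating a p(90) < 500ms criterion from raw samples. */
final class PercentileCheck {

  // Nearest-rank percentile: sort the samples and take the value at ceil(p * n).
  static long percentile(long[] samplesMs, double p) {
    long[] sorted = samplesMs.clone();
    Arrays.sort(sorted);
    int rank = (int) Math.ceil(p * sorted.length);
    return sorted[Math.max(rank, 1) - 1];
  }

  public static void main(String[] args) {
    // Hypothetical response times in milliseconds for one endpoint.
    long[] responseTimesMs = {120, 180, 210, 230, 260, 310, 340, 420, 480, 950};
    long p90 = percentile(responseTimesMs, 0.90);

    // A hard criterion gives a binary pass/fail; plotting p90 per test case per
    // build instead makes outliers and gradual regressions visible.
    System.out.printf("p90 = %dms, criterion met: %b%n", p90, p90 < 500);
  }
}
```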
- Stabilizing test runs
- Find a way to make these numbers more stable
- Increase samples
- Focus on stabilizing Reference Data first, as everything relies on it
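A minimal sketch of the quarantine idea, assuming a JUnit 5 suite; the test class and measurement below are hypothetical. The tag lets the main build exclude a flaky test while a separate job keeps running it in a tight loop.

```java
import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;

// Hypothetical test class and measurement; the point is the tag, which a CI job
// can include (tight-loop quarantine run) or exclude (main build) as needed.
class FacilityApprovedProductsPerformanceTest {

  @Test
  @Tag("quarantine")
  void approvedProductsP90StaysUnderThreshold() {
    long p90Ms = measureApprovedProductsP90Ms();
    assertTrue(p90Ms < 500, "p(90) was " + p90Ms + "ms");
  }

  // Stand-in for the real measurement harness.
  private long measureApprovedProductsP90Ms() {
    return 480;
  }
}
```

With Gradle, for example, the main run can exclude these via useJUnitPlatform { excludeTags 'quarantine' }, and a dedicated job can include only that tag.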
- How should this be prioritized for 3.8? Fewer features in favor of fixing this?
- We know which areas are frequently used, so focus on those
- Background:
- Limited budget
- Already cut into timeline for 3.8
- Fewer new things; improve what we already have (e.g. paying down tech debt)
- New things are features / user-facing things
- The Product Committee is only proposing one new thing: configuration. PC feedback was that it isn't a killer feature for 3.8, so this might be the time to focus solely on tech debt → performance and testing
- SD team is ready to commit 3 development sprints to testing and back-end improvements (to release 3.8 this year).
Notes for next time:
Check-in on release of 3.7
Would we revise the priorities we set last time?
- There is interest (via Slack) in:
- Upgrade Spring Boot
- Upgrade RAML
- Release Faster epic
Action Items