2019-10-16 TC Meeting notes
Note: Oct 1st meeting was cancelled.
Date
2019-10-16, 7am PST / 4pm CEST
Meeting Link
Attendees
Discussion items
Time | Item | Who | Notes |
---|---|---|---|
5m | Agenda and action item review | Josh Zamor | |
45m+ | How to address performance regressions going forward in releasing 3.7 | Wesley Brown | |
(next time) | Check-in on release of 3.7 | Josh Zamor | |
(next time) | Technical Committee moving forward? | Josh Zamor | |
10m (next time, or separate - checking with Ben Leibert) | Bumping the version of Orderables | Daniel Serkowski | |
Notes
How to address performance regressions
- In updating Orderables, performance has regressed.
- Biggest area: facility approved products - 10-15x slower than in 3.6.
- We've improved it to 3-4x slower than in 3.6
- The slowdown numbers come from automated testing with 9,000 products
- A cause: change in queries
- We tried native queries, which was perhaps a step in the wrong direction, as some information was missing.
- This introduced select N+1 problems (5-6 queries per orderable)
- In Orderables we have had a Select N+1 issue
- In 3.7 we've made slight improvements: lazy loading with a batch size of 1000 (sketch below)
- ~30% improvement (measured manually with the Malawi dataset)
- Select N+1
- Tried native queries, HQL, etc.
- Tried various guides recommending HQL, fetch joins, and so on - no single thing we've found fixes it.
- Are we monitoring how many queries are issued? Yes, starting to - there is a guide in the current ticket (OLMIS-6566)
- We've been working to address the above
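Below is a minimal sketch of the two mitigations discussed above - Hibernate's @BatchSize on a lazy collection and a JPQL fetch join. The entity, field, and repository names are simplified stand-ins, not the actual OpenLMIS classes.

```java
import java.util.List;
import java.util.Set;
import java.util.UUID;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.Id;
import javax.persistence.ManyToOne;
import javax.persistence.OneToMany;
import org.hibernate.annotations.BatchSize;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;

@Entity
class Orderable {
  @Id
  private UUID id;

  // A plain lazy collection still costs one extra query per parent row when the
  // collection is touched (select N+1). @BatchSize lets Hibernate load the
  // collections for up to 1000 parents with a single IN (...) query, which is
  // roughly the batch-size mitigation described above.
  @OneToMany(mappedBy = "orderable", fetch = FetchType.LAZY)
  @BatchSize(size = 1000)
  private Set<ProgramOrderable> programOrderables;
}

@Entity
class ProgramOrderable {
  @Id
  private UUID id;

  @ManyToOne(fetch = FetchType.LAZY)
  private Orderable orderable;
}

interface OrderableRepository extends JpaRepository<Orderable, UUID> {

  // A fetch join is the other option tried: load the association in the same
  // statement so this read path issues one query instead of N+1.
  @Query("select distinct o from Orderable o left join fetch o.programOrderables")
  List<Orderable> findAllWithProgramOrderables();
}
```

One way to watch the query count while verifying a fix is to enable Hibernate's hibernate.generate_statistics property (or plain SQL logging) during the performance runs.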
- What are some strategies we can use to prevent this going forward?
- Make performance tests less flaky - the tests had caught this regression, but it wasn't noticed until we dug into the flaky performance tests.
- Look into the previous strategy of quarantining performance tests (could be useful in other testing) (Josh Zamor: find prev TC notes and link here) - see the tagging sketch at the end of this list.
- People should feel free to quarantine a test.
- We should be careful about team communication around that judgement call - and about how long it takes to fix the test and get new information.
- Running the quarantined test continuously in a tight loop is a possible approach for this.
- Build Status Review - a daily meeting where the team reviews the status of each testing build (Jenkins dashboard). This helps with team communication and the judgement call, and might be supported by more automation.
- Consider moving away from hard performance criteria (as in p(90) < 500ms) and instead graphing the results to spot outliers as opposed to trends (a percentile sketch follows below).
- There is enthusiasm behind doing this
- Philip started down this path - need to find this again
- There is non-trivial work to make a good graph per test case
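As a reference point, here is a minimal sketch of how a p(90) < 500ms criterion is evaluated from raw samples (hypothetical numbers, not the project's real harness); the same per-test-case p90 values are what a graph-over-builds approach would plot.

```java
import java.util.Arrays;

/** Illustrative only: evaluating a p(90) < 500ms criterion from raw samples. */
final class PercentileCheck {

  // Nearest-rank percentile: sort the samples and take the value at ceil(p * n).
  static long percentile(long[] samplesMs, double p) {
    long[] sorted = samplesMs.clone();
    Arrays.sort(sorted);
    int rank = (int) Math.ceil(p * sorted.length);
    return sorted[Math.max(rank, 1) - 1];
  }

  public static void main(String[] args) {
    // Hypothetical response times in milliseconds for one endpoint.
    long[] responseTimesMs = {120, 180, 210, 230, 260, 310, 340, 420, 480, 950};
    long p90 = percentile(responseTimesMs, 0.90);

    // A hard criterion gives a binary pass/fail; plotting p90 per test case per
    // build instead makes outliers and gradual regressions visible.
    System.out.printf("p90 = %dms, criterion met: %b%n", p90, p90 < 500);
  }
}
```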
- Stabilizing test runs
- Find a way to make these numbers more stable
- Increase samples
- Focus on stabilizing Reference Data first, as everything relies on it
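A minimal sketch of the quarantine idea, assuming a JUnit 5 suite; the test class and measurement below are hypothetical. The tag lets the main build exclude a flaky test while a separate job keeps running it in a tight loop.

```java
import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;

// Hypothetical test class and measurement; the point is the tag, which a CI job
// can include (tight-loop quarantine run) or exclude (main build) as needed.
class FacilityApprovedProductsPerformanceTest {

  @Test
  @Tag("quarantine")
  void approvedProductsP90StaysUnderThreshold() {
    long p90Ms = measureApprovedProductsP90Ms();
    assertTrue(p90Ms < 500, "p(90) was " + p90Ms + "ms");
  }

  // Stand-in for the real measurement harness.
  private long measureApprovedProductsP90Ms() {
    return 480;
  }
}
```

With Gradle, for example, the main run can exclude these via useJUnitPlatform { excludeTags 'quarantine' }, and a dedicated job can include only that tag.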
- How should this be prioritized for 3.8? Fewer features in favor of fixing this?
- We know which areas are frequently used, so focus on those
- Background:
- Limited budget
- Already cut into timeline for 3.8
- Fewer new things; improve what we already have (e.g. paying down tech debt)
- New things are features / user-facing things
- The Product Committee is only proposing one new thing: configuration. PC feedback was that it isn't a killer feature for 3.8, so this might be the time to focus solely on tech debt → performance and testing
- SD team is ready to commit 3 development sprints to testing and back-end improvements (to release 3.8 this year).
Notes for next time:
Check-in on release of 3.7
Would we revise the priorities we set last time?
- There is interest (via Slack) in:
- Upgrade Spring Boot
- Upgrade RAML
- Release Faster epic
Action Items