2019-08-07 TC Meeting notes
Date
2019-08-07, 7am PST / 4pm CEST
Meeting Link
Attendees
Discussion items
| Time | Item | Who | Notes |
| --- | --- | --- | --- |
| 5m | Agenda and action item review | Josh Zamor | |
| 10m | Performance tests | Josh Zamor | |
| 30m | Reporting stack: improvements to handle larger amounts of data | Daniel Serkowski & Mateusz Wedeł | |
Notes
Reporting Stack Improvements for Data
- Generated test data: ~5k requisitions, 180 requisition line items
- Performance problem:
  - Estimate in production: 100k requisitions per year
  - Availability has been an issue (m4.large instance with 50 GB EBS)
  - System resources maxed out, e.g. disk space had to be increased to 100 GB
  - Memory increased to 16 GB RAM
  - Even with this configuration:
    - Some reports take > 1 min (timeout exceeded)
    - Primary resource hog: Postgres (storage and CPU)
- The data:
  - The username in a certain view is used to filter rows
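Since rows are filtered by username, an index on that column is a likely first step. A minimal sketch using SQLite in place of Postgres; the table, column, and index names here are hypothetical stand-ins, not the actual OpenLMIS reporting schema:

```python
import sqlite3

# Hypothetical stand-in for the reporting view's backing table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE requisition_report"
    " (username TEXT, facility TEXT, product TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO requisition_report VALUES (?, ?, ?, ?)",
    [(f"user{i % 100}", f"fac{i % 10}", f"prod{i % 50}", i) for i in range(10_000)],
)

# Without an index, filtering by username scans every row.
# With one, the database jumps straight to the matching rows.
conn.execute("CREATE INDEX idx_report_username ON requisition_report (username)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM requisition_report WHERE username = 'user7'"
).fetchall()
print(plan)  # the plan should show a search using idx_report_username
```

The same idea applies in Postgres via `CREATE INDEX` and `EXPLAIN`; whether it helps depends on how selective the username filter actually is.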
A few observations:
- The data source (the table) is 70 columns wide, but the query needs only ~4 of them. Query time can be improved by creating a separate view with only the needed columns.
- The GROUP BYs are the biggest problem; in other words, the duplication of rows is the biggest cost for this query. A data store without these duplicated rows would be a lot faster, as would a table structure where the rows can be filtered out quickly using an index (perhaps on username).
- Answering the "why" behind the query is important. Why group by 4 columns without an aggregation? What does the final dashboard/report need to show the person viewing it?
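The first two observations can be sketched together: a narrow view over the wide table, plus the fact that a GROUP BY on all selected columns with no aggregate is just row deduplication (equivalent to SELECT DISTINCT). SQLite stands in for Postgres here, and all table/column names are illustrative, not the actual OpenLMIS schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Illustrative stand-in for the ~70-column reporting table (only a few columns shown).
conn.execute(
    "CREATE TABLE wide_report (username TEXT, facility TEXT, product TEXT,"
    " quantity INTEGER, extra1 TEXT, extra2 TEXT)"
)
conn.executemany(
    "INSERT INTO wide_report VALUES (?, ?, ?, ?, '', '')",
    [
        ("alice", "fac1", "prodA", 5),
        ("alice", "fac1", "prodA", 5),  # duplicated row
        ("bob", "fac2", "prodB", 3),
    ],
)

# A view exposing only the ~4 columns the report needs keeps queries narrow.
conn.execute(
    "CREATE VIEW report_slim AS"
    " SELECT username, facility, product, quantity FROM wide_report"
)

# GROUP BY on every selected column, with no aggregate, only deduplicates rows:
grouped = conn.execute(
    "SELECT username, facility, product, quantity FROM report_slim"
    " GROUP BY username, facility, product, quantity"
).fetchall()
distinct = conn.execute(
    "SELECT DISTINCT username, facility, product, quantity FROM report_slim"
).fetchall()
print(sorted(grouped) == sorted(distinct))  # True
```

If the report truly needs only deduplicated rows, storing the data without the duplicates (or indexing the filter column) avoids paying for the GROUP BY on every query.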
Action Items
- Daniel Serkowski: focus on the query (as well as the table structure). If you can share the use case (the "why") in the reporting channel, more help can be provided. Please reach out to Josh Zamor if we want a design review before the next TC.