2017-06-16 Meeting notes - Performance (mid to long-term)

Date

2017-06-16, 7am PDT

Webex

Attendees

Goals

  • Define what we mean when we say caching and snapshotting.
  • Get on the same page on caching strategies (HTTP, UI, DB)
  • Discuss Reference Data immutability and snapshotting in terms of performance
  • Identify with priority what our next 2-3 performance strategies are

Required Pre-reading

Discussion items

Time | Item                                                        | Who  | Notes
5m   | Agenda review                                               | Josh |
5m   | Define caching and snapshotting                             | Josh |
20m  | Caching (HTTP and REST, DB, UI)                             | Josh |
10m  | Reference Data immutability and snapshotting in performance |      |
15m  | Identify priority                                           |      |

Notes


Caching & Snapshotting definitions


Simple definitions

  • In caching, we temporarily store information we need somewhere more local (closer proximity), with the goal of being able to retrieve it more quickly.
    • temporary storage: we need to clear the cache and re-populate it when the "source of truth" updates
  • In snapshotting, we take a picture of something as it was, and keep that picture, so that regardless of what happens to the subject of the photo, we can always look at our photo to see what that subject looked like at that one point in time.
    • long-term storage: if the "source of truth" updates, we don't want our snapshot to update
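The two definitions can be contrasted in a few lines of Java. This is a minimal sketch (not OpenLMIS code; all names are hypothetical): the cache entry is invalidated whenever the source of truth changes, while the snapshot deliberately keeps the old value.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch contrasting a cache with a snapshot.
public class CacheVsSnapshot {
    static Map<String, String> sourceOfTruth = new HashMap<>();
    static Map<String, String> cache = new HashMap<>();      // cleared on update
    static Map<String, String> snapshots = new HashMap<>();  // never updated

    static void update(String key, String value) {
        sourceOfTruth.put(key, value);
        cache.remove(key); // cache entry is now stale: invalidate it
        // snapshots are intentionally left untouched
    }

    static String read(String key) {
        // cache miss: go back to the source of truth and re-populate
        return cache.computeIfAbsent(key, sourceOfTruth::get);
    }

    static void snapshot(String key) {
        snapshots.put(key, sourceOfTruth.get(key));
    }

    public static void main(String[] args) {
        update("orderable-1", "netContent=10");
        snapshot("orderable-1");               // picture of the value as it is now
        update("orderable-1", "netContent=12");
        System.out.println(read("orderable-1"));           // netContent=12 (cache follows the source)
        System.out.println(snapshots.get("orderable-1"));  // netContent=10 (snapshot does not)
    }
}
```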


Example

  • Browsers cache to reduce network load.  An HTTP 304 (Not Modified) response is an indication from the server that what the browser has cached is not out of date - the browser's cache is still good.
  • The Requisition Service currently "snapshots" the list of Orderable IDs that a Requisition has: that list is a snapshot of the FTAP at the time the Requisition is created (a by-product is that subsequent retrievals of the FTAP save a network call to the Reference Data service to get the IDs).  However, the snapshot doesn't extend to the FTAP or Orderable definitions themselves (max months of stock, net content, etc.)
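The server-side decision behind a 304 can be sketched in a few lines. This is a hypothetical illustration (not OpenLMIS code): if the validator the client sends back in If-None-Match still matches the resource's current ETag, the server can answer 304 and skip sending the body.

```java
// Sketch of the conditional-GET decision a server makes for browser caching.
public class ConditionalGet {
    static int respond(String ifNoneMatch, String currentEtag) {
        if (currentEtag.equals(ifNoneMatch)) {
            return 304; // Not Modified: the client's cached copy is still good
        }
        return 200;     // OK: send a fresh representation (and the new ETag)
    }

    public static void main(String[] args) {
        System.out.println(respond("\"v7\"", "\"v7\"")); // 304
        System.out.println(respond("\"v6\"", "\"v7\"")); // 200
    }
}
```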


Caching

HTTP caching opportunities:

  • UI - browser is already in use
  • Between services (support a HEAD operation for each endpoint on the server, so a consuming service can identify whether its cached item is out of date; the consumer service would also have to support it - hopefully as part of the network library?)
  • Saves network payload between services (e.g. Requisition to Reference Data).  A HEAD operation is lightweight.


DB caching:

  • JPA / Spring Boot / Hibernate all offer caching annotations
  • implementation of cache is usually an in-memory provider
  • DataNucleus used in Motech
    • With JDO
    • Recollection: using the cache often turned into looping over the items in the cache; a lot of time was spent figuring out how to optimize it; the problem was linked to deeply nested objects; and entities pulled out of the cache are detached (no database connection).
    • Josh speculated that since most caches are key-value stores, the deeply nested structure needed for Motech was a hard mismatch: the looping over collections was likely a result of first pulling the "key"ed object and then having a large object hierarchy - including sub-collections - as the value.
  • Deeply nested objects / navigation are problematic - hard to predict performance
  • key is important - it's THE search criteria for a cache
  • Redis, memcached, many others are key value stores.
  • cache invalidation gets tricky - esp with deeply nested objects
  • positive:  troubles in cache can incentivize looking at slow queries!
  • Josh, Pawel and Brandon all shared stories of cache difficulties. 
  • Summary:  tuning a RDBMS and the ORM is usually preferred for these sorts of performance optimizations before introducing caching.
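The "key is THE search criteria" point above can be made concrete with a small sketch (hypothetical names, not Motech or OpenLMIS code): a key-value cache answers lookups by key in one step, but any other criterion degenerates into looping over every entry - which matches the recollection of what the DataNucleus cache turned into.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of why key-value caches fit lookups by key but nothing else.
public class KeyValueLookup {
    static class Orderable {
        final String id;
        final String program;
        Orderable(String id, String program) { this.id = id; this.program = program; }
    }

    static Map<String, Orderable> cache = new HashMap<>();

    static Orderable byId(String id) {
        return cache.get(id); // one hash lookup: the key is THE search criteria
    }

    static long countByProgram(String program) {
        // No index on "program": the whole cache must be scanned entry by entry
        return cache.values().stream()
                .filter(o -> program.equals(o.program))
                .count();
    }

    public static void main(String[] args) {
        cache.put("a", new Orderable("a", "EPI"));
        cache.put("b", new Orderable("b", "Family Planning"));
        System.out.println(byId("a").program);     // EPI
        System.out.println(countByProgram("EPI")); // 1
    }
}
```

Deeply nested values make this worse: the scan touches (and may lazily load) every sub-collection hanging off each cached value.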


Summary

  • don't need DB-level caching yet.  We had a slow query and optimized it, which made a large difference.  We still have opportunity to tune the database without the extra burden of introducing DB caching.
  • biggest performance issue is between services - from Malawi investigation, looking at non-full product fetching (solution is to fetch in bulk), rights checking (idea proposed in dev forum that uses DB and Permission Strings), etc.
  • optimizing RESTful endpoints (ref data esp?) to speed them up helped tremendously - similar to DB optimization.
  • we're not feeling DB pain yet, but maybe feeling service-to-service pain.  Endpoint optimization should come first.
  • snapshot functionality in Requisition isn't complete, ref data immutability (orderables) could be priority.  Need 
  • Payloads were big for the UI - need a call dedicated to UI caching / payload / sending big stuff for users with slow internet


Priority

  • short term, continue to optimize:
    • DB calls (and eager/lazy loading)
    • RESTful endpoint representations
  • mid-term:
    • need to fix Requisition snapshotting.  It could be fixed by introducing a fuller long-term storage of Orderables (and related entities) in Requisition service.  OR it could be fixed by making Reference Data's Orderables (and associated entities) immutable.
    • We need a call on UI "caching" of big reference data lists:  Orderables, maybe Facilities, Programs, etc.  The benefit is it could reduce the size of each working "document" (Requisition, Order, Stock Card, etc).
      • Josh thinks if we even want to entertain the idea, we MUST have Reference Data immutability.  Otherwise you'd need a cache of a snapshot of an Orderable at the time of a Requisition.
  • long term / do only when short and mid-term are done, and only if needed:
    1. HTTP caching between services (focus on Reference Data)
    2. database caching (after we've squeezed the DB to the max)


Action items

OpenLMIS: the global initiative for powerful LMIS software