Requisition search endpoint performance
Overview
This document has been created to show low performance of requisition search endpoint. The document contains:
- an example execution of the endpoint on UI,
- how long each method in the requisition service was executed,
- where the endpoint takes too much time,
- proposal solutions.
Test environment
The test was executed on the following machine:
Both OpenLMIS instance and PostgreSQL database are on the same machine. In the database there is one requisition in the post authorized status that contain more than 1,000 non full supply products. A user has only rights to create and authorize requisitions. In this test the user will try to find appropriate period for home facility. The following java options has been set for the requisition service to connect with the service by JMX:
-Dlogging.config=/config/log/logback.xml -Dflyway.locations=classpath:db/migration -Dcom.sun.management.jmxremote.port=9010 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.rmi.port=9010 -Djava.rmi.server.hostname=IP_ADDRESS'
Endpoint
The endpoint is executed each time when a user with correct rights tries to initialize/process/authorize requisitions for the given facility and program. The user select Create/Authorize option in the navigation bar (in the Requisition section). On this view he or she has ability to select if want to create/process/authorize requisition for home facility or supervised facility. For both cases the user has ability to select a program. After that the user clicks the search button. Several endpoints are executed after that but for this document the most important is the search endpoint.
The above image presents execution time of the search endpoint. It took "only" 19.6 seconds because all components are on the same machine. In the database there was one post authorized requisition for this pair of facility and program. The full JSON object (with formatting) has more than 40,000 lines. In the real world it could take more than one minute.
The request response is only used to select appropriate period for next/future requisition on UI. On the image we see that there is no period for Jan2012 because there is post authorized requisition for this period and it has been hidden/removed. We could imagine a situation that in near future the user will create requisitions for next periods. Each time when the user will try to find appropriate period the endpoint will return larger and larger response and the execution time will be greater.
Methods
The following image presents a list of methods that has been used while the search endpoint has been executed. The visualvm tool has been used to capture those methods. The tool was configured to capture only methods from org.openlmis package and sub-packages in the requisition service.
In the above example the endpoint was executed in 19.6s. We can see that 19.3s takes to find single resource by UUID (the findOne method). With high probability the problem is in DTO creation. When a requisition DTO object is created all ID fields (like facilityI, processingPeriodId, etc) are converted into real objects from the reference-data service. In the example the requisition has more than 1,000 non full supply products so there is more than 1,000 calls to reference-data service only to take details that would probably not used on UI. Probably because it depends if a requisition is in the post authorize status or not. In the post authorized status it will not be available for the user on UI so those requisitions have too much data. Other think is that there is possibility that the list will not be used on product grid view.
Code
The search endpoint can be split into three steps.
Page<Requisition> requisitionsPage = requisitionService.searchRequisitions(facility, program, initiatedDateFrom, initiatedDateTo, processingPeriod, supervisoryNode, requisitionStatuses, emergency, pageable); List<Requisition> resultList = requisitionsPage.getContent(); List<Requisition> filteredList = resultList.stream().filter(req -> { try { permissionService.canViewRequisition(req.getId()); } catch (PermissionMessageException ex) { return false; } return true; }).collect(Collectors.toList()); List<RequisitionDto> dtoList = requisitionDtoBuilder.build(filteredList); return Pagination.getPage(dtoList, pageable, dtoList.size());
The first step (lines 360-362) is to take data from the database. This step looks quite simple because we create appropriate SQL query and execute it but the service do the pagination part in memory instead of on the database layer and when a single requisition is retrieved from database all related data (like requisition line items, template, etc) are also retrieved. Also all those data has to be mapped to correct classes in the Java code.
The second step (lines 363-372) is to check if a user has right to see the given requisition. This step is executed on service layer mainly because the requisition service has to ask reference-data service to check if a user has ability to view the given requisition.
The last step (lines 374-375) is to create appropriate DTO objects based on database objects and send response to the client. Like was written above if a requisition contains more external data, the longer it takes to convert business object into DTO object.
Scalyr
Additional I checked the system usage when the requisition search endpoint is executed. This time I used our training server where there are more than 90,000 requisitions in RELEASED status. After 15 minutes a user gets 504 error message from backend. It seems like there is no problem with CPU, disk or memory usage but more with the service-to-service communication because of huge network usage (Network bandwidth graph).
Solutions
- Stop using search endpoint on the Create/Authorize view Currently two endpoints are used to create appropriate period list on UI. It would be good to move this on the server side and create a single endpoint (or modify existing one) that would return a list of available periods for the given facility and program,
- Split Requisition DTO object into two objects where one contains only base information and second that contain all necessary data for UI,
Modify domain object that they will not retrieve related data automatically for example a requisition should not retrieve related line items,
Move pagination to database layer.
OpenLMIS: the global initiative for powerful LMIS software