Inspectorio - Optimizing API usage to save costs

Background

DocuFlow is a document management platform built by Inspectorio, tailored for supply-chain document management. To ship it quickly, we built it on top of a third-party service that charges per API call, so every call we make to their servers adds to our cost.

Problems with the current implementation

Because that external service charges per API call, the setup does not scale well. When I joined the team, the platform was gaining more and more users, which meant Inspectorio had to pay a lot of money for the service. A single client could easily consume 25-50% of our contracted API allowance.

Therefore, I initiated a project to find ways to reduce the API calls we make while keeping the same user experience. In the long term, this would reduce both the subscription fee and our dependency on that service.

Ideation and Solution

After discussions with the team, we identified a few ways to reduce API consumption. I created a new Epic to track this work, with multiple spikes, stories, and tasks to investigate each idea in depth and implement it.

Solution 1: Rebuild the Favorites section

First, we ran a small spike to find the places consuming the highest numbers of API calls. We looked at multiple features and identified the Favorites section as a strong candidate.

On the platform, users can mark documents as favorites. Whenever they open the homepage, the system makes multiple API calls to the external service to fetch that data. Because it sits on the homepage, the section is loaded by almost every user, and this small section alone consumed ~7% of all API calls, making it our number one candidate.

So we decided to rebuild this section to store and serve favorites from our own database instead of calling the external service each time.
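Below is a minimal sketch of that rebuild, assuming a small FastAPI service; the route paths and models are hypothetical, and an in-memory dict stands in for our actual database table:

```python
# Hypothetical sketch: favorites served from our own storage instead of the
# external document service. Routes, models, and storage are assumptions.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Stand-in for a database table: user_id -> set of favorited document ids.
_favorites: dict[str, set[str]] = {}

class FavoriteIn(BaseModel):
    document_id: str

@app.get("/users/{user_id}/favorites")
def list_favorites(user_id: str) -> list[str]:
    # Served entirely from our own storage: zero calls to the external service.
    return sorted(_favorites.get(user_id, set()))

@app.post("/users/{user_id}/favorites")
def add_favorite(user_id: str, favorite: FavoriteIn) -> dict:
    # Writes also stay local, so the homepage section no longer triggers
    # any billable external API calls.
    _favorites.setdefault(user_id, set()).add(favorite.document_id)
    return {"status": "ok"}
```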

With this implementation, we also needed to migrate the existing favorites data into our database, update some endpoints, and update the request payloads.

Solution 2: Flexible cron job frequency

When I checked the inactive organizations, I noticed that they were also consuming API calls, which seemed odd. After checking with the team, I learned that those calls came from a cron job pulling updates from the stream every minute, even when there were no updates at all!

This cron job alone accounted for ~30% of all API calls. Because we have many inactive organizations, fixing this promised a big win, so I booked meetings with the team to ideate on how we could use the API more effectively in that specific case.

It turned out not to be a complicated task. We agreed to make the cron job's frequency flexible: poll often for active organizations and back off, or stop entirely, for inactive ones.
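A minimal sketch of the scheduling rule, assuming we track each organization's last user activity; the thresholds and intervals below are illustrative, not the production values:

```python
# Hypothetical sketch of an adaptive polling schedule for the update-stream
# cron job. Thresholds and intervals are assumptions for illustration.
from datetime import datetime, timedelta, timezone
from typing import Optional

def polling_interval(last_activity: datetime,
                     now: Optional[datetime] = None) -> Optional[timedelta]:
    """Return how long to wait before polling the update stream again.

    Returns None for inactive organizations, meaning: do not poll at all
    until a user shows up again, so no API calls are wasted on idle orgs.
    """
    now = now or datetime.now(timezone.utc)
    idle = now - last_activity
    if idle < timedelta(hours=1):
        return timedelta(minutes=1)   # active org: keep the original 1-minute cadence
    if idle < timedelta(days=1):
        return timedelta(minutes=15)  # cooling down: poll occasionally
    if idle < timedelta(days=7):
        return timedelta(hours=6)     # mostly idle: poll rarely
    return None                       # inactive: stop polling entirely
```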

With this approach, we keep the existing experience for active users while ensuring that no API calls are wasted on organizations with no active users.

Solution 3: Remove the pre-loading of nested folders in the folder tree

Our platform lets users navigate a folder tree. Building that tree consumed multiple API calls, one per folder, just to check whether each folder had nested content (so we could show the arrow indicator).

Realizing that this check used many API calls without delivering much value to users, I decided to retire it and show the arrow for every folder, regardless of its contents.

We now only call the API when the user clicks the arrow to expand a folder. The loaded data is cached locally, so expanding the same folder again within the next few minutes triggers no further API calls.
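A minimal sketch of that behavior; the fetch_children() helper is a placeholder for the real external call, and the five-minute TTL is an assumption drawn from "the next few minutes":

```python
# Hypothetical sketch: lazy folder expansion with a short-lived local cache.
# fetch_children() and the 5-minute TTL are assumptions for illustration.
import time

CACHE_TTL_SECONDS = 5 * 60
_children_cache: dict[str, tuple[float, list[str]]] = {}  # folder_id -> (fetched_at, children)

def fetch_children(folder_id: str) -> list[str]:
    """Placeholder for the real (billable) call to the external service."""
    return []

def expand_folder(folder_id: str) -> list[str]:
    """Called only when the user clicks the arrow, never during initial render."""
    cached = _children_cache.get(folder_id)
    if cached and time.monotonic() - cached[0] < CACHE_TTL_SECONDS:
        return cached[1]  # cache hit: no API call at all
    children = fetch_children(folder_id)  # the only place an API call happens
    _children_cache[folder_id] = (time.monotonic(), children)
    return children
```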

Solution 4: Cache folder and file metadata

Besides the documents themselves, metadata is indispensable on our platform (metadata is data about a document, e.g. document version, applicable market, …). Whenever a user opens a document, the system had to make many API calls to the external service to retrieve all of those data points.

As a result of the spike, we created four tickets to reduce these API calls.

In general, the solution was to cache folder and file metadata on our side, with a cache policy that depends on the user's role:

Because of how these users behave, we applied the full mechanism only to users with Admin or User roles. For users with the Visitor role (view-only permission), we simply invalidate the cache automatically every 20 minutes, which reduces API consumption even further.
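A minimal sketch of that policy: invalidation-on-write for Admin/User roles is my reading of the mechanism (function names are hypothetical), while the 20-minute refresh for Visitors comes straight from the behavior described above:

```python
# Hypothetical sketch of the role-aware metadata cache. Invalidation-on-write
# for Admin/User roles is an assumed policy; the 20-minute Visitor TTL is not.
import time

VISITOR_TTL_SECONDS = 20 * 60
_meta_cache: dict[str, tuple[float, dict]] = {}  # document_id -> (cached_at, metadata)

def fetch_metadata(document_id: str) -> dict:
    """Placeholder for the real (billable) call to the external service."""
    return {"document_id": document_id, "version": "1.0"}

def get_metadata(document_id: str, role: str) -> dict:
    cached = _meta_cache.get(document_id)
    if cached is not None:
        cached_at, metadata = cached
        if role != "Visitor":
            # Admin/User: serve from cache until a write invalidates it.
            return metadata
        if time.monotonic() - cached_at < VISITOR_TTL_SECONDS:
            # Visitor (view-only): tolerate slightly stale data for 20 minutes.
            return metadata
    metadata = fetch_metadata(document_id)
    _meta_cache[document_id] = (time.monotonic(), metadata)
    return metadata

def on_metadata_updated(document_id: str) -> None:
    """Assumed hook: drop the entry whenever an Admin or User edits metadata."""
    _meta_cache.pop(document_id, None)
```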

Outcomes

So were we able to reduce API consumption? Definitely yes. Before each release, we measured the number of API calls needed for a list of specific tasks. After the release, we replayed the same tasks and saw a significant drop in API usage.

Overall, a month after releasing these enhancements, I can proudly share that we brought our API usage below the contractual limit (I cannot share more detail for security reasons).