Showing posts from June, 2023

Horrors of caching

Caching is generally a good way to improve a system's responsiveness. But sometimes it can go horribly wrong and cause cascading failures that bring the entire system down. Below are a few such incidents that I have encountered:

1. A backend service caches M2M auth tokens, which it gets by making API calls to an auth service. The auth service introduced a bug that made it return an empty token with a 200 status code. The backend service checked only the response code, cached the empty token for a few hours, and the rest is history :)

2. A service caches a DNS entry to make API calls for a resource. A disaster recovery (DR) event hits and the cached DNS entry no longer works. In the absence of dynamic service discovery, the fix needs an entire cluster restart.

3. Caching a GET call and not respecting the expiration. Well, this is just silly and should have been caught in code review.

Happy caching!
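P.S. For the token incident, here is a minimal sketch of what "validate before you cache" could look like. It is not the code from the actual service: `TokenCache`, the `fetch` callable, and its `(status, token, ttl_seconds)` return shape are all made up for illustration. The idea is simply that a 200 status is not enough, the token itself has to look usable, and the entry has to expire when it says it will.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical shape of an auth response: (HTTP status, token, lifetime in seconds).
AuthResponse = Tuple[int, Optional[str], float]


class TokenCache:
    """Caches an M2M token, but only after checking that it is usable."""

    def __init__(
        self,
        fetch: Callable[[], AuthResponse],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # assumed to call the auth service
        self._clock = clock  # injectable so expiry can be tested
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        # Serve from the cache only while the entry is still fresh.
        if self._token is not None and self._clock() < self._expires_at:
            return self._token

        status, token, ttl_seconds = self._fetch()

        # Checking the status code alone is what caused the incident:
        # also reject empty or blank tokens and non-positive lifetimes.
        if status != 200 or not token or not token.strip() or ttl_seconds <= 0:
            self._token = None  # never keep a bad or expired entry around
            raise ValueError("auth service returned an unusable token")

        self._token = token
        self._expires_at = self._clock() + ttl_seconds
        return token
```

Because a bad response raises instead of being cached, the next call goes back to the auth service rather than serving an empty token for hours. A real service would probably want retries with backoff on top of this.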