Showing posts from 2021

Instrumentation, Observability and Monitoring (IOM)

Terminology

- Observability: the property of a system that lets it answer questions about itself, whether trivial or complex. How easily you can find answers to your questions is a measure of how good the system’s observability is.
- Monitoring: observing the quality of a system’s performance over a period of time.
- Instrumentation: metrics exposed by the internals of the system using code (a type of white-box monitoring).

Why IOM?

- Analyzing long-term trends, such as:
  - User growth over time
  - User time in the system
  - System performance over time
- Comparing over time or across experiment groups:
  - How much faster are my queries after a new version of a library?
  - Cache hit/miss ratio after adding more nodes
  - Is my site slower than it was last week?
- Alerting: something is broken, and somebody needs to fix it right now!
- Building dashboards: answer basic questions about your service, such as latency, traffic, and errors.
- Conducting ad hoc retrospective analysis: our latency just shot up; what else happened ar