Rajesh Gajengi

Monitoring & Alerting System

Prometheus + Grafana + Alertmanager for production workloads.

Prometheus · Grafana · Alertmanager · Linux

Overview

Implemented an observability stack to monitor services, track SLOs, and send alerts. The work covered instrumentation, dashboards, alert routing, and runbooks.
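To make "tracking SLOs" concrete, here is a minimal sketch of the error-budget arithmetic that SLO alerts are commonly built on. The function names, the 99.9% target, and the request counts are illustrative assumptions, not values from this project.

```python
def error_budget(slo_target: float) -> float:
    """Fraction of requests allowed to fail, e.g. about 0.001 for a 99.9% SLO."""
    return 1.0 - slo_target


def burn_rate(errors: int, total: int, slo_target: float) -> float:
    """How fast the error budget is being consumed in one window.

    A value of 1.0 means the budget would be spent exactly over the
    full SLO period; higher values mean it runs out sooner.
    """
    if total == 0:
        return 0.0  # no traffic, nothing burned
    return (errors / total) / error_budget(slo_target)


# Hypothetical window: 50 failures out of 10,000 requests, 99.9% SLO.
rate = burn_rate(errors=50, total=10_000, slo_target=0.999)
print(f"burn rate: {rate:.1f}x")
```

In a Prometheus setup the same ratio would typically be computed in a recording or alerting rule rather than in application code; the Python version only shows the calculation.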

Features

  • Custom dashboards for latency/error rates
  • Alert routing with escalation policies
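The routing and escalation idea can be sketched in plain Python. This is not Alertmanager's configuration format or API; the `Route` class, the receiver names, and the 15-minute threshold are hypothetical and only model first-match routing on labels plus a time-based escalation step.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Route:
    receiver: str                                        # who is notified first
    match: Dict[str, str] = field(default_factory=dict)  # labels that must match
    escalate_after_min: Optional[int] = None             # minutes before escalating
    escalate_to: Optional[str] = None                    # who is notified next


def pick_route(labels: Dict[str, str], routes: List[Route], default: Route) -> Route:
    """Return the first route whose matchers all equal the alert's labels."""
    for route in routes:
        if all(labels.get(k) == v for k, v in route.match.items()):
            return route
    return default


def receivers_to_notify(route: Route, minutes_unacked: int) -> List[str]:
    """Primary receiver, plus the escalation target once the alert is old enough."""
    targets = [route.receiver]
    if (
        route.escalate_to is not None
        and route.escalate_after_min is not None
        and minutes_unacked >= route.escalate_after_min
    ):
        targets.append(route.escalate_to)
    return targets


# Hypothetical routing table: critical alerts page on-call, then a team lead.
routes = [
    Route("oncall-pager", {"severity": "critical"}, 15, "team-lead"),
    Route("team-chat", {"severity": "warning"}),
]
default_route = Route("email-digest")

alert = {"alertname": "HighErrorRate", "severity": "critical"}
chosen = pick_route(alert, routes, default_route)
print(receivers_to_notify(chosen, minutes_unacked=20))
```

Ordering the table from most to least specific keeps the first-match rule predictable, and the default route guarantees that an alert with unexpected labels still reaches someone.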

Challenges

  • Managing cardinality in metrics
  • Alert fatigue tuning
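One common way to manage metric cardinality is to normalize unbounded label values before they are recorded. The sketch below assumes a hypothetical `path` label containing numeric or UUID identifiers; the regular expression and sample requests are illustrative, not taken from this project.

```python
import re

# Matches a path segment that is all digits or a UUID (8 hex + 27 more chars).
_ID_SEGMENT = re.compile(r"/(\d+|[0-9a-fA-F]{8}-[0-9a-fA-F-]{27})(?=/|$)")


def normalize_path(path: str) -> str:
    """Replace ID-like segments with a fixed placeholder."""
    return _ID_SEGMENT.sub("/:id", path)


def series_count(samples) -> int:
    """Number of distinct label sets, i.e. time series a metric would create."""
    return len({tuple(sorted(labels.items())) for labels in samples})


# Hypothetical request labels as they might arrive at an instrumented handler.
requests = [
    {"method": "GET", "path": "/users/1"},
    {"method": "GET", "path": "/users/2"},
    {"method": "GET", "path": "/users/3/orders/9"},
    {"method": "GET", "path": "/users/4/orders/10"},
]

raw = series_count(requests)
normalized = series_count(
    [{**r, "path": normalize_path(r["path"])} for r in requests]
)
print(f"series before: {raw}, after: {normalized}")
```

Without normalization every new user or order ID creates a new series; with it, the series count is bounded by the number of route templates.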
