Skip to content

AlertmanagerClusterCrashlooping

Meaning

Half or more of the Alertmanager instances within the same cluster are crashlooping.

Impact

Alerts could be notified multiple time unless pods are crashing to fast and no alerts can be sent.

Diagnosis

kubectl get pod -l app=alertmanager

NAMESPACE   NAME                    READY   STATUS              RESTARTS    AGE
default     alertmanager-main-0     1/2     CrashLoopBackOff    37107 2d
default     alertmanager-main-1     2/2     Running             0 43d
default     alertmanager-main-2     2/2     Running             0 43d 

Find the root cause by looking to events for a given pod/deployement

kubectl get events --field-selector involvedObject.name=alertmanager-main-0

Mitigation

Make sure pods have enough resources (CPU, MEM) to work correctly.