#support

Fixing Alertmanager Stuck in Pending in a SigNoz K8s Cluster

TL;DR: Elias's alertmanager pod was stuck in Pending. Prashant suggested increasing the alertmanager PVC size and re-running helm upgrade. After deleting the PVC and the StatefulSet and upgrading again, it worked.

Solved
Apr 17, 2023
Elias
09:37 AM
Hey!
I'm currently trying SigNoz on my k8s cluster.

I have the problem that the alertmanager pod is stuck in Pending.

What do I need to do to fix this?
(Image 1 and Image 2: screenshots of the pending alertmanager attached)

Srikanth
09:49 AM
What does the describe pod show?
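For reference, the checks being asked about here look roughly like this, using the pod and PVC names that appear later in the thread:

kubectl describe pod signoz-monitoring-alertmanager-0 --namespace=platform
kubectl describe pvc storage-signoz-monitoring-alertmanager-0 --namespace=platform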
Elias
09:51 AM
oh... Now there's an error message.

Warning FailedScheduling 2m8s default-scheduler running PreBind plugin "VolumeBinding": binding volumes: timed out waiting for the condition
Elias
09:53 AM
-------
Normal   WaitForFirstConsumer  60m                  persistentvolume-controller  waiting for first consumer to be created before binding
Warning  ProvisioningFailed    35m (x15 over 60m)   cloud.ionos.com_csi-ionoscloud-547ff5c6cf-xf55x_ec448156-856a-40a1-b1d4-0d307d8bc24b  failed to provision volume with StorageClass "ionos-enterprise-hdd": rpc error: code = OutOfRange desc = requested size 104857600 must be between 1073741824 and 4398046511104 bytes
Normal   Provisioning          5m5s (x23 over 60m)  cloud.ionos.com_csi-ionoscloud-547ff5c6cf-xf55x_ec448156-856a-40a1-b1d4-0d307d8bc24b  External provisioner is provisioning volume for claim "platform/storage-signoz-monitoring-alertmanager-0"
Normal   ExternalProvisioning  30s (x242 over 60m)  persistentvolume-controller  waiting for a volume to be created, either by external provisioner "cloud.ionos.com" or manually created by system administrator
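In plain terms, the provisioner is rejecting the claim: 104857600 bytes is 100 MiB (presumably the chart's default alertmanager PVC size), while this storage class only accepts volumes between 1073741824 bytes (1 GiB) and 4398046511104 bytes (4 TiB). One way to double-check which provisioner the class uses, assuming the names above (the size limits themselves are enforced by the CSI driver and may not show up in the StorageClass object):

kubectl get storageclass ionos-enterprise-hdd -o yaml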
Elias
09:54 AM
Do I understand it right that the PVC is too small? 😄
Prashant
10:37 AM
Elias It looks like your storage class limits PVC sizes to between 1 GiB and 4 TiB.

Can you increase the PVC size of alertmanager and re-install using helm upgrade?

override-values.yaml
alertmanager:
  persistence:
    size: 1Gi
Prashant
10:45 AM
If you face any issues when using helm upgrade, remove the alertmanager statefulset and retry the command.
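A rough sketch of that fallback, using the release, namespace, and StatefulSet names that appear elsewhere in this thread:

kubectl delete statefulset signoz-monitoring-alertmanager --namespace=platform
helm upgrade signoz-monitoring signoz/signoz -f override-values.yaml --namespace=platform

Deleting the StatefulSet is needed because its volumeClaimTemplates field cannot be updated in place.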
Elias
12:23 PM
Thanks!
I deleted the statefulset and upgraded with
helm upgrade signoz-monitoring signoz/signoz -f override-values.yaml --namespace=platform

Alertmanager is still pending after 8 min. Same message as before. The pod description isn't showing an error:

Name:             signoz-monitoring-alertmanager-0
Namespace:        platform
Priority:         0
Service Account:  signoz-monitoring-alertmanager
Node:             <none>
Labels:           
                  
                  
                  controller-revision-hash=signoz-monitoring-alertmanager-6d448ccf6d
                  
Annotations:      checksum/config: 01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b
Status:           Pending
IP:
IPs:              <none>
Controlled By:    StatefulSet/signoz-monitoring-alertmanager
Init Containers:
  signoz-monitoring-alertmanager-init:
    Image:      
    Port:       <none>
    Host Port:  <none>
    Command:
      sh
      -c
      until wget --spider -q signoz-monitoring-query-service:8080/api/v1/version; do echo -e "waiting for query-service"; sleep 5; done; echo -e "query-service ready, starting alertmanager now";
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xfvkb (ro)
Containers:
  signoz-monitoring-alertmanager:
    Image:      
    Port:       9093/TCP
    Host Port:  0/TCP
    Args:
      --storage.path=/alertmanager
      --queryService.url=
    Requests:
      cpu:      100m
      memory:   100Mi
    Liveness:   http-get http://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_IP:   (v1:status.podIP)
    Mounts:
      /alertmanager from storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xfvkb (ro)
Volumes:
  storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  storage-signoz-monitoring-alertmanager-0
    ReadOnly:   false
  kube-api-access-xfvkb:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                  op=Exists for 300s
                              op=Exists for 300s
Events:                      <none>
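Nothing actionable at the pod level here: Node is <none>, so the pod has not been scheduled yet and the blocker is still the volume. The natural next step is to look at the claim itself, roughly:

kubectl get pvc --namespace=platform
kubectl describe pvc storage-signoz-monitoring-alertmanager-0 --namespace=platform

which is exactly what Elias does next.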
Elias
12:26 PM
Hmm seems like the pvc is still pending.

C:\Users\elias\signoz>kubectl describe pvc --namespace=platform storage-signoz-monitoring-alertmanager-0
Name:          storage-signoz-monitoring-alertmanager-0
Namespace:     platform
StorageClass:  ionos-enterprise-hdd
Status:        Pending
Volume:
Labels:        
               
               
Annotations:   : 
               : prod-performance-o3niqbe464
               : 
Finalizers:    []
Capacity:
Access Modes:
VolumeMode:    Filesystem
Used By:       signoz-monitoring-alertmanager-0
Events:
  Type    Reason                Age                      From                                                                                   Message
  ----    ------                ----                     ----                                                                                   -------
  Normal  Provisioning          7m52s (x63 over 3h33m)   cloud.ionos.com_csi-ionoscloud-547ff5c6cf-xf55x_ec448156-856a-40a1-b1d4-0d307d8bc24b  External provisioner is provisioning volume for claim "platform/storage-signoz-monitoring-alertmanager-0"
  Normal  ExternalProvisioning  3m16s (x842 over 3h33m)  persistentvolume-controller                                                            waiting for a volume to be created, either by external provisioner "" or manually created by system administrator
Elias
12:27 PM
The other pvcs are working:
C:\Users\elias\signoz>kubectl get pvc --namespace=platform
NAME                                                                       STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS           AGE
data-signoz-monitoring-zookeeper-0                                         Bound     pvc-b675e319-b64e-4039-bea8-748869aed061   8Gi        RWO            ionos-enterprise-hdd   3h34m
data-volumeclaim-template-chi-signoz-monitoring-clickhouse-cluster-0-0-0   Bound     pvc-4bf90e0e-2ff5-4dd4-888c-473d87822e0f   20Gi       RWO            ionos-enterprise-hdd   3h34m
signoz-db-signoz-monitoring-query-service-0                                Bound     pvc-ca1049e9-de4d-4dbd-a9a3-677f4338e02d   1Gi        RWO            ionos-enterprise-hdd   3h34m
storage-signoz-monitoring-alertmanager-0                                   Pending                                                         ionos-enterprise-hdd   3h34m
Elias
12:32 PM
Deleted the pvc and the statefulset and upgraded again. Now it's working.

Thanks for your quick help!
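For anyone hitting the same thing: helm upgrade alone is usually not enough here, because a StatefulSet's volumeClaimTemplates cannot be changed in place and an already-created PVC is never resized from the template, so the Pending claim kept requesting the old 100 MiB. The full recovery, roughly, using the names from this thread:

kubectl delete pvc storage-signoz-monitoring-alertmanager-0 --namespace=platform
kubectl delete statefulset signoz-monitoring-alertmanager --namespace=platform
helm upgrade signoz-monitoring signoz/signoz -f override-values.yaml --namespace=platform

After the upgrade, the StatefulSet re-creates the PVC at the new 1Gi size, which satisfies the storage class minimum.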
Prashant
01:07 PM
That's great to hear 👍
