Fixing an Alertmanager Pod Stuck in Pending in a SigNoz K8s Cluster
TLDR Elias's alertmanager pod was stuck in the Pending state because its PVC requested 100Mi, below the storage class's 1Gi minimum. Prashant suggested increasing the PVC size and re-installing with helm upgrade. After deleting the PVC and statefulset and upgrading again, it worked.
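For reference, a minimal sketch of the fix described below, assuming the release name signoz-monitoring and the platform namespace used in the thread:
# override-values.yaml -- raise the alertmanager PVC above the storage class's 1Gi minimum
alertmanager:
  persistence:
    size: 1Gi
# delete the stuck objects so they are recreated at the new size, then re-apply the chart
kubectl delete statefulset signoz-monitoring-alertmanager --namespace=platform
kubectl delete pvc storage-signoz-monitoring-alertmanager-0 --namespace=platform
helm upgrade signoz-monitoring signoz/signoz -f override-values.yaml --namespace=platform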
Apr 17, 2023 (8 months ago)
Elias
09:37 AM
I'm currently trying SigNoz for my k8s cluster.
I have the problem that the alertmanager pod is stuck in the Pending state.
What do I need to do to fix this?
Srikanth
09:49 AM
Elias
09:51 AM
Warning  FailedScheduling  2m8s  default-scheduler  running PreBind plugin "VolumeBinding": binding volumes: timed out waiting for the condition
Elias
09:53 AM
Normal  WaitForFirstConsumer  60m  persistentvolume-controller  waiting for first consumer to be created before binding
Warning ProvisioningFailed 35m (x15 over 60m) cloud.ionos.com_csi-ionoscloud-547ff5c6cf-xf55x_ec448156-856a-40a1-b1d4-0d307d8bc24b failed to provision volume with StorageClass "ionos-enterprise-hdd": rpc error: code = OutOfRange desc = requested size 104857600 must be between 1073741824 and 4398046511104 bytes
Normal Provisioning 5m5s (x23 over 60m) cloud.ionos.com_csi-ionoscloud-547ff5c6cf-xf55x_ec448156-856a-40a1-b1d4-0d307d8bc24b External provisioner is provisioning volume for claim "platform/storage-signoz-monitoring-alertmanager-0"
Normal ExternalProvisioning 30s (x242 over 60m) persistentvolume-controller waiting for a volume to be created, either by external provisioner "cloud.ionos.com" or manually created by system administrator
Elias
09:54 AM
Prashant
10:37 AM
Can you increase the PVC size of alertmanager and re-install using helm upgrade?
override-values.yaml:
alertmanager:
  persistence:
    size: 1Gi
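A sketch of the corresponding upgrade command, matching the release name and namespace Elias uses later in the thread:
helm upgrade signoz-monitoring signoz/signoz -f override-values.yaml --namespace=platform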
Prashant
10:45 AM
If it is still pending after helm upgrade, remove the alertmanager statefulset and retry the command.
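The delete step itself isn't shown in the thread; a sketch, using the StatefulSet name from the pod description below:
kubectl delete statefulset signoz-monitoring-alertmanager --namespace=platform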
Elias
12:23 PM
I deleted the statefulset and upgraded with
helm upgrade signoz-monitoring signoz/signoz -f override-values.yaml --namespace=platform
Alertmanager is still pending after 8 minutes. Same message in the pod logs. The pod description isn't showing an error:
Name: signoz-monitoring-alertmanager-0
Namespace: platform
Priority: 0
Service Account: signoz-monitoring-alertmanager
Node: <none>
Labels:
controller-revision-hash=signoz-monitoring-alertmanager-6d448ccf6d
Annotations: checksum/config: 01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b
Status: Pending
IP:
IPs: <none>
Controlled By: StatefulSet/signoz-monitoring-alertmanager
Init Containers:
signoz-monitoring-alertmanager-init:
Image:
Port: <none>
Host Port: <none>
Command:
sh
-c
until wget --spider -q signoz-monitoring-query-service:8080/api/v1/version; do echo -e "waiting for query-service"; sleep 5; done; echo -e "query-service ready, starting alertmanager now";
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xfvkb (ro)
Containers:
signoz-monitoring-alertmanager:
Image:
Port: 9093/TCP
Host Port: 0/TCP
Args:
--storage.path=/alertmanager
--queryService.url=
Requests:
cpu: 100m
memory: 100Mi
Liveness: http-get http://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
POD_IP: (v1:status.podIP)
Mounts:
/alertmanager from storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xfvkb (ro)
Volumes:
storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: storage-signoz-monitoring-alertmanager-0
ReadOnly: false
kube-api-access-xfvkb:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: op=Exists for 300s
op=Exists for 300s
Events: <none>
Elias
12:26 PM
C:\Users\elias\signoz>kubectl describe pvc --namespace=platform storage-signoz-monitoring-alertmanager-0
Name: storage-signoz-monitoring-alertmanager-0
Namespace: platform
StorageClass: ionos-enterprise-hdd
Status: Pending
Volume:
Labels:
Annotations:   prod-performance-o3niqbe464
Finalizers: []
Capacity:
Access Modes:
VolumeMode: Filesystem
Used By: signoz-monitoring-alertmanager-0
Events:
  Type    Reason                Age                       From                                                                                    Message
  ----    ------                ----                      ----                                                                                    -------
  Normal  Provisioning          7m52s (x63 over 3h33m)    cloud.ionos.com_csi-ionoscloud-547ff5c6cf-xf55x_ec448156-856a-40a1-b1d4-0d307d8bc24b    External provisioner is provisioning volume for claim "platform/storage-signoz-monitoring-alertmanager-0"
  Normal  ExternalProvisioning  3m16s (x842 over 3h33m)   persistentvolume-controller                                                             waiting for a volume to be created, either by external provisioner "" or manually created by system administrator
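A StatefulSet update does not modify an existing PVC, so this claim most likely still carries the original 100Mi request; a quick way to confirm (a sketch, assuming kubectl access to the platform namespace):
kubectl get pvc storage-signoz-monitoring-alertmanager-0 --namespace=platform -o jsonpath="{.spec.resources.requests.storage}"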
Elias
12:27 PM
C:\Users\elias\signoz>kubectl get pvc --namespace=platform
NAME                                                                        STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS           AGE
data-signoz-monitoring-zookeeper-0                                          Bound     pvc-b675e319-b64e-4039-bea8-748869aed061   8Gi        RWO            ionos-enterprise-hdd   3h34m
data-volumeclaim-template-chi-signoz-monitoring-clickhouse-cluster-0-0-0    Bound     pvc-4bf90e0e-2ff5-4dd4-888c-473d87822e0f   20Gi       RWO            ionos-enterprise-hdd   3h34m
signoz-db-signoz-monitoring-query-service-0                                 Bound     pvc-ca1049e9-de4d-4dbd-a9a3-677f4338e02d   1Gi        RWO            ionos-enterprise-hdd   3h34m
storage-signoz-monitoring-alertmanager-0                                    Pending                                                                        ionos-enterprise-hdd   3h34m
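Per the summary at the top, what eventually worked was deleting the pending PVC as well, so the provisioner could recreate it at the new 1Gi size; a sketch of those commands:
kubectl delete statefulset signoz-monitoring-alertmanager --namespace=platform
kubectl delete pvc storage-signoz-monitoring-alertmanager-0 --namespace=platform
helm upgrade signoz-monitoring signoz/signoz -f override-values.yaml --namespace=platform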
Elias
12:32 PM
Thanks for your quick help!
Prashant
01:07 PM
Similar Threads
Issues with SigNoz Install through Helm Chart
Romain experienced a delay in a SigNoz installation through the Helm chart, with pods stuck in the init state. Prashant identified the issue as insufficient resources in the K8s cluster and suggested specifying a storage class for PVCs, resolving the problem.
Troubleshooting Memory Space Issue in Kubernetes with Signoz
Abel had trouble running SigNoz on Kubernetes due to 'not enough space'. Pranay provided steps to increase the PV. Eventually, Abel confirmed the solution after changing the PV size to '50Gi'.
Issue with Helm Installation in GKE Autopilot Cluster
Kalman faced issues with a helm installation in a GKE autopilot cluster, with pods stuck in the init state and some crashing. Mayur provided suggestions to diagnose the issue, including checking IAM permissions and storage classes, and adjusting resource limits in the helm values. The thread is unresolved.
Issue with Pending States in AWS Cluster
Jatin reported that Kubernetes pods get stuck in the Pending state when nodes go down in their AWS cluster. Despite being given `kubectl describe` results and logs, Prashant couldn't pinpoint the cause, noting that deeper investigation into the cluster and k8s resources would be required.
Increasing Persistent Volume Size in Test Environment
surya tried to increase the disk size in their test environment and encountered an issue. Prashant provided guidance and the user successfully increased the Persistent Volume (PV) for ClickHouse to 50Gi.