Issues with SigNoz on a k3s Cluster Using Helm
TLDR: Nilanjan encountered issues with SigNoz on a k3s cluster using Helm, with some pods not running. Srikanth and Prashant suggested using kubectl describe
to diagnose the issue, but the problem remains unresolved.
Mar 12, 2023 (9 months ago)
Nilanjan
06:18 PM
I tried installing SigNoz on a k3s cluster using Helm. However, not all of the pods are coming up in a Running state. Below is the output of the kubectl -n platform get pods command:
NAME READY STATUS RESTARTS AGE
my-release-k8s-infra-otel-agent-q449t 1/1 Running 3 (8m39s ago) 23h
my-release-clickhouse-operator-5457b49dfc-2wpkp 2/2 Running 5 (8m39s ago) 23h
my-release-signoz-frontend-86699c44c5-64kdg 0/1 Init:0/1 3 23h
my-release-signoz-query-service-0 0/1 Init:0/1 3 23h
my-release-signoz-otel-collector-fd6b4899-zbcsv 0/1 Init:0/1 3 23h
my-release-signoz-otel-collector-metrics-7594f556c9-7vj9r 0/1 Init:0/1 3 23h
my-release-k8s-infra-otel-deployment-6669899f75-xdlfq 1/1 Running 4 (8m39s ago) 23h
my-release-zookeeper-0 1/1 Running 0 23h
my-release-signoz-alertmanager-0 0/1 Pending 0 23h
chi-my-release-clickhouse-cluster-0-0-0 0/1 Pending 0 23h
Can you share some clues on what I am missing here? Thanks!
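(For context, a typical install along these lines would be the standard SigNoz Helm chart setup; this is a sketch of the usual commands, as the exact values used in this thread are not shown:
helm repo add signoz https://charts.signoz.io
helm repo update
helm install my-release signoz/signoz --namespace platform --create-namespace
kubectl -n platform get pods)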
Mar 13, 2023 (9 months ago)
Srikanth
03:00 AM
Use kubectl describe to get the details. This is a generic "my pod is stuck" issue.
Prashant
09:17 AM
Try kubectl describe on the chi pods, and perhaps also on related resources like the ClickHouse PVCs. Complete command:
kubectl describe -n platform pod/chi-my-release-clickhouse-cluster-0-0-0
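(A matching check on the ClickHouse PVC would look like the sketch below; the claim name is taken from the describe output later in this thread:
kubectl -n platform get pvc
kubectl describe -n platform pvc/data-volumeclaim-template-chi-my-release-clickhouse-cluster-0-0-0)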
Nilanjan
06:05 PM
kubectl describe -n platform pod/chi-my-release-clickhouse-cluster-0-0-0
Name: chi-my-release-clickhouse-cluster-0-0-0
Namespace: platform
Priority: 0
Service Account: my-release-clickhouse
Node: nroy-virtual-machine/192.168.163.128
Start Time: Mon, 13 Mar 2023 22:49:47 +0530
Labels: app.kubernetes.io/component=clickhouse
app.kubernetes.io/instance=my-release
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=clickhouse
app.kubernetes.io/version=22.8.8
clickhouse.altinity.com/app=chop
clickhouse.altinity.com/chi=my-release-clickhouse
clickhouse.altinity.com/cluster=cluster
clickhouse.altinity.com/namespace=platform
clickhouse.altinity.com/ready=yes
clickhouse.altinity.com/replica=0
clickhouse.altinity.com/settings-version=a0b5649f5ea9121accf6ecc528db9f761f7f1768
clickhouse.altinity.com/shard=0
clickhouse.altinity.com/zookeeper-version=98145c0055cd1bf605e2ba1ed27b9c43f4de32c7
controller-revision-hash=chi-my-release-clickhouse-cluster-0-0-558ffc5f76
helm.sh/chart=clickhouse-23.8.8
statefulset.kubernetes.io/pod-name=chi-my-release-clickhouse-cluster-0-0-0
Annotations: meta.helm.sh/release-name: my-release
meta.helm.sh/release-namespace: platform
signoz.io/path: /metrics
signoz.io/port: 9363
signoz.io/scrape: true
Status: Pending
IP: 10.42.0.2
IPs:
IP: 10.42.0.2
Controlled By: StatefulSet/chi-my-release-clickhouse-cluster-0-0
Init Containers:
my-release-clickhouse-init:
Container ID: docker://d996ccf8646d640434122161142e69dab8081227344442eda2a6ec35b71fc691
Image: docker.io/busybox:1.35
Image ID: docker-pullable://busybox@sha256:f75aadb4c50f4fe0e790e5e081de3df4153a5adbe77a176205763d9808e3c12a
Port: <none>
Host Port: <none>
Command:
sh
-c
set -x
wget -O /tmp/histogramQuantile https://github.com/SigNoz/signoz/raw/develop/deploy/docker/clickhouse-setup/user_scripts/histogramQuantile
mv /tmp/histogramQuantile /var/lib/clickhouse/user_scripts/histogramQuantile
chmod +x /var/lib/clickhouse/user_scripts/histogramQuantile
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 13 Mar 2023 22:50:52 +0530
Finished: Mon, 13 Mar 2023 22:51:06 +0530
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/lib/clickhouse/user_scripts from shared-binary-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-g4s9x (ro)
Containers:
clickhouse:
Container ID:
Image: docker.io/clickhouse/clickhouse-server:22.8.8-alpine
Image ID:
Ports: 8123/TCP, 9000/TCP, 9009/TCP, 9000/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP
Command:
/bin/bash
-c
/usr/bin/clickhouse-server --config-file=/etc/clickhouse-server/config.xml
State: Waiting
Reason: ImagePullBackOff
Ready: False
Restart Count: 0
Requests:
cpu: 100m
memory: 200Mi
Liveness: http-get http://:http/ping delay=60s timeout=1s period=3s #success=1 #failure=10
Readiness: http-get http://:http/ping delay=10s timeout=1s period=3s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/clickhouse-server/conf.d/ from chi-my-release-clickhouse-deploy-confd-cluster-0-0 (rw)
/etc/clickhouse-server/config.d/ from chi-my-release-clickhouse-common-configd (rw)
/etc/clickhouse-server/functions from custom-functions-volume (rw)
/etc/clickhouse-server/users.d/ from chi-my-release-clickhouse-common-usersd (rw)
/var/lib/clickhouse from data-volumeclaim-template (rw)
/var/lib/clickhouse/user_scripts from shared-binary-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-g4s9x (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
data-volumeclaim-template:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-volumeclaim-template-chi-my-release-clickhouse-cluster-0-0-0
ReadOnly: false
shared-binary-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
custom-functions-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: my-release-clickhouse-custom-functions
Optional: false
chi-my-release-clickhouse-common-configd:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: chi-my-release-clickhouse-common-configd
Optional: false
chi-my-release-clickhouse-common-usersd:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: chi-my-release-clickhouse-common-usersd
Optional: false
chi-my-release-clickhouse-deploy-confd-cluster-0-0:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: chi-my-release-clickhouse-deploy-confd-cluster-0-0
Optional: false
kube-api-access-g4s9x:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 44m default-scheduler Successfully assigned platform/chi-my-release-clickhouse-cluster-0-0-0 to nroy-virtual-machine
Normal Pulled 43m kubelet Container image "docker.io/busybox:1.35" already present on machine
Normal Created 43m kubelet Created container my-release-clickhouse-init
Normal Started 43m kubelet Started container my-release-clickhouse-init
Normal Pulling 35m (x4 over 42m) kubelet Pulling image "docker.io/clickhouse/clickhouse-server:22.8.8-alpine"
Warning Failed 33m (x4 over 40m) kubelet Error: ErrImagePull
Warning Failed 32m (x6 over 40m) kubelet Error: ImagePullBackOff
Warning Failed 17m (x7 over 40m) kubelet Failed to pull image "docker.io/clickhouse/clickhouse-server:22.8.8-alpine": rpc error: code = Unknown desc = context deadline exceeded
Normal BackOff 3m35s (x84 over 40m) kubelet Back-off pulling image "docker.io/clickhouse/clickhouse-server:22.8.8-alpine"
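(The events above point at the image pull itself failing with "context deadline exceeded" while fetching clickhouse-server, rather than at the PVC. One way to confirm this from the k3s node, as a sketch assuming a containerd-based k3s with its bundled crictl and not something suggested in the thread, is to pull the image manually and see whether it succeeds:
sudo k3s crictl pull docker.io/clickhouse/clickhouse-server:22.8.8-alpine
sudo k3s crictl images | grep clickhouse)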
Similar Threads
Issues with SigNoz Install through Helm Chart
Romain experienced a delay in SigNoz installation through Helm Chart, with pods in init state. Prashant identified the issue as insufficient resources in the K8s cluster and suggested specifying a storage class for PVCs, resolving the problem.
Issue with Helm Installation in GKE Autopilot Cluster
Kalman faced issues with a Helm installation in a GKE Autopilot cluster, with pods stuck in init state and some crashing. Mayur provided suggestions to diagnose the issue, including checking IAM permissions and storage classes and adjusting resource limits in the Helm values. The thread is unresolved.
Issue with Pending States in AWS Cluster
Jatin reported that Kubernetes pods get stuck in pending state when nodes go down in their AWS cluster. Despite providing `kubectl describe` results and logs, Prashant couldn't specify the cause, citing deeper investigation into the cluster and k8s resources would be required.