SigNoz Installation Issue in Self-hosted K8 Cluster Resolved

TLDR Nond faced an issue with SigNoz installation on a self-hosted K8 cluster. The problem was caused by a gatekeeper system injecting istio and opa configs. Excluding SigNoz and reinstalling solved the issue.

Nond
Thu, 22 Jun 2023 22:02:40 UTC

Hey team, running into some issues with a new install of SigNoz (helm chart v0.16.2) on a self-hosted K8 cluster (running on EC2 in AWS and managed by Rancher). I'm able to get the following pods up in the `platform` namespace:

```
kubectl get pods
NAME                                              READY   STATUS     RESTARTS        AGE
chi-signoz-clickhouse-cluster-0-0-0               2/2     Running    0               4h17m
signoz-alertmanager-0                             0/2     Init:0/2   0               4h18m
signoz-clickhouse-operator-7b55d5666-2jh65        3/3     Running    0               4h18m
signoz-frontend-8644f95b4c-hbs2g                  0/2     Init:0/2   0               4h18m
signoz-k8s-infra-otel-agent-87h28                 2/2     Running    0               4h18m
signoz-k8s-infra-otel-agent-fdkpd                 2/2     Running    0               4h18m
signoz-k8s-infra-otel-agent-mpn8s                 2/2     Running    0               4h18m
signoz-k8s-infra-otel-agent-rfc4x                 2/2     Running    0               4h18m
signoz-k8s-infra-otel-deployment-5f4b455cd-qjp9m  2/2     Running    2 (4h17m ago)   4h18m
signoz-otel-collector-66fb65544c-5cl6j            0/1     Init:0/1   0               24m
signoz-otel-collector-bdc5b7c66-j84md             0/2     Init:0/2   0               4h18m
signoz-otel-collector-metrics-5657769c49-b7w5l    0/1     Init:0/1   0               14m
signoz-query-service-0                            0/2     Init:0/2   0               4h18m
signoz-zookeeper-0                                2/2     Running    0               4h18m
```

Peeking into the init containers of `signoz-otel-collector-bdc5b7c66-j84md`, I see the following error:

```
kubectl logs signoz-otel-collector-metrics-5657769c49-vbl65 -c signoz-otel-collector-metrics-init
wget: error getting response: Connection reset by peer
waiting for clickhouseDB
```

Same for `signoz-otel-collector-metrics-5657769c49-b7w5l`:

```
kubectl logs signoz-otel-collector-metrics-5657769c49-b7w5l -c signoz-otel-collector-metrics-init
wget: error getting response: Connection reset by peer
waiting for clickhouseDB
```

And for the frontend pod `signoz-frontend-8644f95b4c-hbs2g`:

```
kubectl logs signoz-frontend-8644f95b4c-hbs2g -c signoz-frontend-init
wget: can't connect to remote host (<IP_ADDRESS_HERE>): Connection refused
waiting for query-service
```

Query service pod `signoz-query-service-0` has the same error as the otel collector:

```
kubectl logs signoz-query-service-0 -c signoz-query-service-init
wget: error getting response: Connection reset by peer
waiting for clickhouseDB
```

Any insight is appreciated, thank you!

Pranay
Fri, 23 Jun 2023 02:09:32 UTC

Hey Nond, it seems to me there is an issue with the connection to ClickHouse. Are you able to connect to the ClickHouse DB independently? I am not familiar with Rancher, but does it do anything to restrict access to DBs?
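
One way to test the connection independently is to probe ClickHouse's HTTP `/ping` endpoint from a throwaway pod in the same namespace. The sketch below is illustrative only: the service name `signoz-clickhouse`, port `8123`, the pod name `ch-ping`, and the `busybox` image are assumptions that do not come from this thread, so adjust them to your install. By default it only prints the command (`DRY_RUN=1`).

```shell
#!/bin/sh
# Sketch: probe ClickHouse's HTTP /ping endpoint from a temporary pod.
# Assumptions (not from the thread): service name "signoz-clickhouse",
# HTTP port 8123, pod name "ch-ping", and a pullable busybox image.
NAMESPACE="${NAMESPACE:-platform}"
CH_SERVICE="${CH_SERVICE:-signoz-clickhouse}"
CH_PORT="${CH_PORT:-8123}"

# With DRY_RUN=1 (the default) the command is printed instead of executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Start a one-off pod, fetch /ping, and remove the pod when it exits.
check_clickhouse() {
  run kubectl -n "$NAMESPACE" run ch-ping --rm -i --restart=Never \
    --image=busybox -- wget -qO- "http://${CH_SERVICE}:${CH_PORT}/ping"
}

check_clickhouse
```

Run it with `DRY_RUN=0` to actually execute the probe. A healthy ClickHouse answers `/ping` with `Ok.`; a reset or refused connection from inside the namespace suggests something between the pods (a sidecar, a network policy) is interfering rather than ClickHouse itself being down.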

Srikanth
Mon, 26 Jun 2023 20:21:41 UTC

It seems like ClickHouse was not accepting connections. Can you share the ClickHouse logs?

Nond
Mon, 26 Jun 2023 22:16:11 UTC

Hey team, looks like we got this resolved and SigNoz is now installed. In case anyone else runs into this issue, we had a gatekeeper system that injects istio and opa configs into all namespaces. I had to add SigNoz to an exclusion list for both istio and opa injection and do a clean reinstall of SigNoz for it to work.
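
For anyone who wants a starting point, the fix described above might look roughly like the sketch below. It is not the exact procedure used here: the exclusion list in this thread belongs to an in-house gatekeeper system, so the standard Istio namespace label `istio-injection=disabled` stands in for it as an assumption, and the OPA exclusion is left out because it depends entirely on how OPA is deployed. The release name `signoz` and the chart reference `signoz/signoz` are also assumptions inferred from the pod names and chart version mentioned earlier. By default the commands are only printed (`DRY_RUN=1`).

```shell
#!/bin/sh
# Sketch: opt a namespace out of Istio sidecar injection, then reinstall SigNoz.
# Assumptions (not from the thread): the namespace label is honored by the
# injector, the Helm release is named "signoz", and the "signoz" chart repo
# has already been added to Helm.
NAMESPACE="${NAMESPACE:-platform}"
RELEASE="${RELEASE:-signoz}"
CHART_VERSION="${CHART_VERSION:-0.16.2}"

# With DRY_RUN=1 (the default) each command is printed instead of executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

exclude_and_reinstall() {
  # Disable automatic sidecar injection for every pod in the namespace.
  run kubectl label namespace "$NAMESPACE" istio-injection=disabled --overwrite &&
  # Remove the existing release so the pods are recreated without sidecars.
  run helm -n "$NAMESPACE" uninstall "$RELEASE" &&
  run helm -n "$NAMESPACE" install "$RELEASE" signoz/signoz --version "$CHART_VERSION"
}

exclude_and_reinstall
```

Note that the label applies to the whole namespace, so if `platform` hosts other workloads that rely on the mesh, a per-pod opt-out or a dedicated namespace may be a better fit.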

Srikanth
Mon, 26 Jun 2023 22:21:18 UTC

Glad you got it to work.