SigNoz Deployment Problem After Upgrade
TLDR: Al's SigNoz deployment started failing after an upgrade. Prashant suggested various solutions. Eventually, Al successfully upgraded to version signoz-0.19.1.
Jul 11, 2023 (5 months ago)
Al
09:11 PM
REVISION  UPDATED                   STATUS      CHART          APP VERSION  DESCRIPTION
11        Wed Jul  5 18:52:30 2023  superseded  signoz-0.17.0  0.21.0       Upgrade complete
12        Tue Jul 11 15:28:08 2023  deployed    signoz-0.18.1  0.22.0       Upgrade complete
Suddenly receiving
<Error> TCPHandler: Code: 170. DB::Exception: Requested cluster 'cluster' not found.
Looking at clickhouse:
:) select cluster from system.clusters
SELECT cluster
FROM system.clusters
Query id: d822fc0f-67b6-4157-97a4-d7dde022c6cd
┌─cluster─────────────────────────────────────────┐
│ test_cluster_one_shard_three_replicas_localhost │
│ test_cluster_one_shard_three_replicas_localhost │
│ test_cluster_one_shard_three_replicas_localhost │
│ test_cluster_two_shards                         │
│ test_cluster_two_shards                         │
│ test_cluster_two_shards_internal_replication    │
│ test_cluster_two_shards_internal_replication    │
│ test_cluster_two_shards_localhost               │
│ test_cluster_two_shards_localhost               │
│ test_shard_localhost                            │
│ test_shard_localhost_secure                     │
│ test_unavailable_shard                          │
│ test_unavailable_shard                          │
└─────────────────────────────────────────────────┘
The PVC is mounted:
Filesystem  Size    Used   Available  Use%  Mounted on
/dev/sdd    503.8G  96.2G  407.6G     19%   /var/lib/clickhouse
Attaching file with log snippets.
Wouldn't mind some help, unsure how to recover this. Thanks!!
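The manual check above (listing `system.clusters` and looking for the name from the error message) can be sketched in Python. This is only an illustrative helper, not part of SigNoz or ClickHouse: the function names are hypothetical, and it assumes the box-drawn table format that `clickhouse-client` printed in this thread.

```python
import re

# Assumption: the deployment expects a cluster literally named 'cluster',
# which is the name the error message in this thread complains about.
EXPECTED_CLUSTER = "cluster"


def cluster_names(table_text: str) -> set:
    """Collect distinct cluster names from clickhouse-client's box-drawn output."""
    names = set()
    for line in table_text.splitlines():
        # Data rows look like "│ name │"; border rows start with ┌ or └.
        match = re.match(r"^\s*│\s*(\S+)\s*│\s*$", line)
        if match:
            names.add(match.group(1))
    return names


def missing_clusters(table_text: str, expected=(EXPECTED_CLUSTER,)) -> list:
    """Return the expected cluster names that do not appear in the output."""
    found = cluster_names(table_text)
    return sorted(name for name in expected if name not in found)


# Shortened version of the output seen after the failed upgrade.
broken_output = """
┌─cluster────────────────┐
│ test_shard_localhost   │
│ test_unavailable_shard │
└────────────────────────┘
"""

print(missing_clusters(broken_output))  # ['cluster']
```

An empty list would mean the expected cluster definition is present; here it is missing, which matches the `Code: 170` error.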
Jul 12, 2023 (5 months ago)
Prashant
05:40 AM
Prashant
05:41 AM
cluster
Prashant
05:41 AM
Prashant
05:41 AM
Al
01:48 PM
It was working with version 0.21
┌─cluster─────────────────────────────────────────┐
│ all-replicated                                  │
│ all-sharded                                     │
│ cluster                                         │
│ test_cluster_one_shard_three_replicas_localhost │
│ test_cluster_one_shard_three_replicas_localhost │
│ test_cluster_one_shard_three_replicas_localhost │
│ test_cluster_two_shards                         │
│ test_cluster_two_shards                         │
│ test_cluster_two_shards_internal_replication    │
│ test_cluster_two_shards_internal_replication    │
│ test_cluster_two_shards_localhost               │
│ test_cluster_two_shards_localhost               │
│ test_shard_localhost                            │
│ test_shard_localhost_secure                     │
│ test_unavailable_shard                          │
│ test_unavailable_shard                          │
└─────────────────────────────────────────────────┘
but after upgrading to v0.22 only the test_* clusters are available.
Jul 13, 2023 (4 months ago)
Al
03:10 PM
helm rollback signoz to the previous revision (v0.21.0) restored ClickHouse to working condition. Not sure what happened during the upgrade. Perhaps I can attempt upgrading to v0.22.0 again and see if it was a transient issue.
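Picking the revision to roll back to can be sketched as below. This is a hypothetical helper, not something from the thread: it assumes the release history is available as JSON (for example from `helm history signoz -o json`), and the field names in the sample are an assumption about that output. The values come from the history Al posted.

```python
import json
from typing import Optional

# Sample shaped like `helm history signoz -o json`. The field names are an
# assumption; the revisions, charts and versions are the ones from the thread.
HISTORY_JSON = """
[
  {"revision": 11, "status": "superseded", "chart": "signoz-0.17.0", "app_version": "0.21.0"},
  {"revision": 12, "status": "deployed", "chart": "signoz-0.18.1", "app_version": "0.22.0"}
]
"""


def rollback_target(history: list) -> Optional[dict]:
    """Return the newest superseded revision older than the deployed one."""
    deployed = [entry for entry in history if entry["status"] == "deployed"]
    if not deployed:
        return None
    current = max(entry["revision"] for entry in deployed)
    older = [
        entry
        for entry in history
        if entry["revision"] < current and entry["status"] == "superseded"
    ]
    return max(older, key=lambda entry: entry["revision"], default=None)


target = rollback_target(json.loads(HISTORY_JSON))
if target is not None:
    # Only prints the command; running it is left to the operator.
    print(f"helm rollback signoz {target['revision']}  # back to {target['chart']}")
```

The sketch only prints the command, so the target revision can be reviewed before anything is changed in the cluster.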
Prashant
03:27 PM
SELECT cluster
FROM system.clusters
Query id: 70c515db-ac46-4738-8518-3bf84757e645
┌─cluster─────────────────────────────────────────┐
│ all-replicated                                  │
│ all-sharded                                     │
│ cluster                                         │
│ test_cluster_one_shard_three_replicas_localhost │
│ test_cluster_one_shard_three_replicas_localhost │
│ test_cluster_one_shard_three_replicas_localhost │
│ test_cluster_two_shards                         │
│ test_cluster_two_shards                         │
│ test_cluster_two_shards_internal_replication    │
│ test_cluster_two_shards_internal_replication    │
│ test_cluster_two_shards_localhost               │
│ test_cluster_two_shards_localhost               │
│ test_shard_localhost                            │
│ test_shard_localhost_secure                     │
│ test_unavailable_shard                          │
│ test_unavailable_shard                          │
└─────────────────────────────────────────────────┘
Prashant
03:27 PM
Prashant
03:28 PM
helm repo update
Al
03:50 PM
signoz/signoz  0.18.1  0.22.0  SigNoz Observability Platform Helm Chart
Resulting in:
1. You have just deployed SigNoz cluster:
- frontend version: '0.22.0'
- query-service version: '0.22.0'
- alertmanager version: '0.23.1'
- otel-collector version: '0.79.2'
- otel-collector-metrics version: '0.79.2'
Al
06:36 PM
Jul 14, 2023 (4 months ago)
Prashant
08:22 AM
Jul 18, 2023 (4 months ago)
Al
08:11 PM
REVISION  UPDATED                   STATUS      CHART          APP VERSION  DESCRIPTION
14        Tue Jul 18 19:10:43 2023  superseded  signoz-0.18.2  0.22.0       Upgrade complete
The following log entry seems relevant:
2023.07.18 19:19:59.669624 [ 240 ] {} <Error> DDLWorker: Cannot parse DDL task query-0000025438: Cannot parse query or obtain cluster info. Will try to send error status: 371
Code: 371. DB::Exception: DDL task query-0000025438 contains current host chi-signoz-clickhouse-cluster-0-0:9000 in cluster cluster, but there is no such cluster here. (INCONSISTENT_CLUSTER_DEFINITION) (version 22.8.8.3 (official build))
2023.07.18 19:19:59.681107 [ 244 ] {} <Information> DDLWorker: Task query-0000025438 is outdated, deleting it
2023.07.18 19:19:59.684970 [ 240 ] {} <Error> DDLWorker: Cannot parse DDL task query-0000025439: Cannot parse query or obtain cluster info. Will try to send error status: 371
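To see how many DDL tasks are affected, log lines like the ones above can be scanned with a short Python sketch. The function name and regular expressions are illustrative assumptions based only on the line format shown in this thread, not on any ClickHouse tooling.

```python
import re

# Patterns derived from the log lines quoted in this thread (an assumption
# about the general format, not an official ClickHouse log grammar).
TASK_RE = re.compile(r"<Error> DDLWorker: Cannot parse DDL task (query-\d+)")
CODE_RE = re.compile(r"Code: (\d+)\. DB::Exception")

SAMPLE_LOG = """\
2023.07.18 19:19:59.669624 [ 240 ] {} <Error> DDLWorker: Cannot parse DDL task query-0000025438: Cannot parse query or obtain cluster info. Will try to send error status: 371
Code: 371. DB::Exception: DDL task query-0000025438 contains current host chi-signoz-clickhouse-cluster-0-0:9000 in cluster cluster, but there is no such cluster here. (INCONSISTENT_CLUSTER_DEFINITION) (version 22.8.8.3 (official build))
2023.07.18 19:19:59.681107 [ 244 ] {} <Information> DDLWorker: Task query-0000025438 is outdated, deleting it
2023.07.18 19:19:59.684970 [ 240 ] {} <Error> DDLWorker: Cannot parse DDL task query-0000025439: Cannot parse query or obtain cluster info. Will try to send error status: 371
"""


def summarize_ddl_errors(log_text: str):
    """Return (sorted failed task ids, set of exception codes) found in the log."""
    tasks = set()
    codes = set()
    for line in log_text.splitlines():
        tasks.update(TASK_RE.findall(line))
        codes.update(int(code) for code in CODE_RE.findall(line))
    return sorted(tasks), codes


failed_tasks, exception_codes = summarize_ddl_errors(SAMPLE_LOG)
print(failed_tasks)      # ['query-0000025438', 'query-0000025439']
print(exception_codes)   # {371}
```

Every failed task in this sample points at the same exception code, which is consistent with a single missing cluster definition rather than several unrelated failures.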
I have rolled back to signoz-0.17.0 (app version 0.21.0) and the SigNoz deployment is functional again.
Prashant
09:12 PM
signoz-0.19.1. Also, you would be required to run the migration steps: https://signoz.io/docs/operate/migration/upgrade-0.23/
Jul 19, 2023 (4 months ago)
Al
06:25 PM
signoz-0.19.1 completed successfully!