Problems Encountered with ClickHouse After Fresh Install

TL;DR: Romil hit missing-table errors after a fresh install of the SigNoz chart on a new ClickHouse cluster. Prashant advised restarting the OtelCollector pods, but that wasn't fully successful. Srikanth acknowledged the issue and pointed to a related open problem on GitHub.

Romil
Fri, 28 Jul 2023 16:02:18 UTC

just did a fresh install with the latest chart on a new clickhouse ...

```
2023/07/28 16:01:02 application run finished with error: failed to build pipelines: failed to create "clickhousetraces" exporter for data type "traces": code: 60, message: Table signoz_traces.distributed_signoz_index_v2 doesn't exist
```

anyone get this problem?
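(Editor's note: a quick way to confirm what the exporter is complaining about, assuming you can reach the cluster with clickhouse-client; this is a sketch, not part of the original thread.)

```sql
-- Does the distributed table the traces exporter needs exist? (returns 0/1)
EXISTS TABLE signoz_traces.distributed_signoz_index_v2;

-- List everything the migrations should have created in the traces database
SHOW TABLES FROM signoz_traces;
```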

Prashant
Fri, 28 Jul 2023 16:54:56 UTC

This happens when the migration files are not executed properly. The OtelCollector and OtelCollector Metrics pods are responsible for running those migrations to create the tables.

Prashant
Fri, 28 Jul 2023 16:55:54 UTC

Init containers make sure that those pods run only after the ClickHouse cluster is healthy. But it seems like something is not right with the cluster health or the startup order.

Prashant
Fri, 28 Jul 2023 16:56:30 UTC

Could you try restarting SigNoz OtelCollector and OtelCollector Metrics pods?

Romil
Fri, 28 Jul 2023 16:56:59 UTC

i did ... it has ended up restarting over 100 times by now

Romil
Fri, 28 Jul 2023 16:57:05 UTC

i just fixed it 2s ago

Romil
Fri, 28 Jul 2023 16:57:25 UTC

i found the last migration in `schema_migrations`, deleted that row, then set the other ones to dirty = 0
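(Editor's note: roughly, the fix described above in SQL form. This is a sketch; it assumes the golang-migrate bookkeeping columns `version`, `dirty`, and `sequence`, a table engine that accepts mutations, and a hypothetical version number you'd first read from the table.)

```sql
-- Find the most recent migration entries; the newest one is the failed one
SELECT version, dirty, sequence
FROM signoz_traces.schema_migrations
ORDER BY sequence DESC
LIMIT 5;

-- Remove the failed entry (42 is a hypothetical version number) ...
ALTER TABLE signoz_traces.schema_migrations DELETE WHERE version = 42;

-- ... and clear the dirty flag on the remaining rows
ALTER TABLE signoz_traces.schema_migrations UPDATE dirty = 0 WHERE dirty = 1;
```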

Romil
Fri, 28 Jul 2023 16:58:44 UTC

there are 2 problems that i see:
• the chart doesn't respect `clickhouse.cluster` at all
• the pod does not query to see how many shards or replicas are present, so it marks 1 as good and the others as dirty
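(Editor's note: on the second point, ClickHouse itself knows the shard/replica layout, so a migration runner could consult it. A sketch, assuming the default cluster name `cluster` that comes up below.)

```sql
-- How many shards and replicas does the cluster actually have?
SELECT cluster, shard_num, replica_num, host_name
FROM system.clusters
WHERE cluster = 'cluster';
```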

Prashant
Fri, 28 Jul 2023 17:06:49 UTC

Yes, `clickhouse.cluster` is the static value `cluster` at the moment due to the limitations of go-migrate and the cluster name being static in the migration files
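(Editor's note: to illustrate why the value is static: go-migrate executes plain .sql files, so the `ON CLUSTER` name is literal text that the chart cannot template. The snippet below is hypothetical, not an actual SigNoz migration file.)

```sql
-- Illustrative only: the cluster name is baked into the migration DDL,
-- so changing clickhouse.cluster in the chart values has no effect here
CREATE TABLE IF NOT EXISTS signoz_traces.example_table ON CLUSTER cluster
(
    timestamp DateTime,
    value     Float64
)
ENGINE = MergeTree
ORDER BY timestamp;
```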

Prashant
Fri, 28 Jul 2023 17:08:03 UTC

> the pod does not query to see how many shards or replicas are present so it marks 1 as good and other as dirty

Could you please elaborate on this?

Romil
Fri, 28 Jul 2023 17:09:24 UTC

if i have 2 replicas (default behavior) then the other replica still has the migration marked as dirty
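(Editor's note: one way to see this across replicas in a single query. A sketch: `clusterAllReplicas` needs a reasonably recent ClickHouse, and the column names follow golang-migrate's bookkeeping table.)

```sql
-- Migration state on every replica of the cluster, side by side
SELECT
    hostName() AS replica,
    version,
    dirty
FROM clusterAllReplicas('cluster', signoz_traces, schema_migrations)
ORDER BY replica, version;
```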

Prashant
Sat, 29 Jul 2023 07:16:03 UTC

I see, the dirty migration issue. To my knowledge, this is usually resolved by later pod restarts. Srikanth, could you please look into this?

Romil
Tue, 01 Aug 2023 17:02:06 UTC

2 more issues Prashant

```
signoz-otel-collector-7b86585cd5-nrnnd signoz-otel-collector 2023-08-01T16:57:09.381Z info clickhousetracesexporter/clickhouse_factory.go:127 Clickhouse Migrate finished {"kind": "exporter", "data_type": "traces", "name": "clickhousetraces", "error": "migration failed in line 0: \n\n\n\nCREATE MATERIALIZED VIEW IF NOT EXISTS signoz_traces.dependency_graph_minutes_service_calls_mv ON CLUSTER cluster\nTO signoz_traces.dependency_graph_minutes AS\nSELECT\n A.serviceName as src,\n B.serviceName as dest,\n quantilesState(0.5, 0.75, 0.9, 0.95, 0.99)(toFloat64(B.durationNano)) as duration_quantiles_state,\n countIf(B.statusCode=2) as error_count,\n count(*) as total_count,\n toStartOfMinute(B.timestamp) as timestamp\nFROM signoz_traces.signoz_index_v2 AS A, signoz_traces.signoz_index_v2 AS B\nWHERE (A.serviceName != B.serviceName) AND (A.spanID = B.parentSpanID)\nGROUP BY timestamp, src, dest; (details: code: 47, message: Unknown identifier: B.timestamp; there are columns: timestamp, serviceName, B.serviceName, quantilesState(0.5, 0.75, 0.9, 0.95, 0.99)(toFloat64(B.durationNano)), countIf(equals(B.statusCode, 2)), count(): While processing serviceName AS src, B.serviceName AS dest, quantilesState(0.5, 0.75, 0.9, 0.95, 0.99)(toFloat64(B.durationNano)) AS duration_quantiles_state, countIf(B.statusCode = 2) AS error_count, count() AS total_count, toStartOfMinute(B.timestamp) AS timestamp)"}
```

and

```
signoz-otel-collector-7b86585cd5-nrnnd signoz-otel-collector 2023-08-01T17:00:01.906Z error clickhousetracesexporter/writer.go:128 Could not write a batch of spans {"kind": "exporter", "data_type": "traces", "name": "clickhousetraces", "error": "code: 60, message: Table signoz_traces.distributed_span_attributes doesn't exist"}
```
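(Editor's note: the first error is an alias collision: the materialized view aliases `toStartOfMinute(B.timestamp)` as `timestamp`, which shadows the source column, so `B.timestamp` no longer resolves. Below is a hypothetical workaround sketch, not the actual SigNoz fix: bucket under a non-colliding alias in a subquery, then rename it back for the target table.)

```sql
CREATE MATERIALIZED VIEW IF NOT EXISTS signoz_traces.dependency_graph_minutes_service_calls_mv
ON CLUSTER cluster
TO signoz_traces.dependency_graph_minutes AS
SELECT src, dest, duration_quantiles_state, error_count, total_count, bucket_ts AS timestamp
FROM
(
    SELECT
        A.serviceName AS src,
        B.serviceName AS dest,
        quantilesState(0.5, 0.75, 0.9, 0.95, 0.99)(toFloat64(B.durationNano)) AS duration_quantiles_state,
        countIf(B.statusCode = 2) AS error_count,
        count(*) AS total_count,
        -- non-colliding alias keeps B.timestamp resolvable in this scope
        toStartOfMinute(B.timestamp) AS bucket_ts
    FROM signoz_traces.signoz_index_v2 AS A, signoz_traces.signoz_index_v2 AS B
    WHERE (A.serviceName != B.serviceName) AND (A.spanID = B.parentSpanID)
    GROUP BY bucket_ts, src, dest
);
```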

Prashant
Tue, 01 Aug 2023 17:15:10 UTC

cc <@4K143e> <@4K15aa>

Romil
Tue, 01 Aug 2023 17:15:34 UTC

it seems like the problem is related to the dirty flag that i pointed out earlier

Romil
Tue, 01 Aug 2023 17:15:44 UTC

repeatedly changing it to 0 and restarting the pod resolves it

Romil
Tue, 01 Aug 2023 17:16:00 UTC

the issue is that if you don't know how many total migrations there are, you don't know when to stop
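(Editor's note: one way to take some guesswork out, sketched under the same golang-migrate `version`/`dirty` assumptions; the total migration count isn't recorded anywhere in the database.)

```sql
-- Re-run after each pod restart: once any_dirty is 0 and latest stops
-- advancing between restarts, the runner has applied everything it knows
SELECT
    max(version) AS latest,
    max(dirty)   AS any_dirty
FROM signoz_traces.schema_migrations;
```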

Romil
Tue, 01 Aug 2023 17:16:13 UTC

apart from the issue that it shouldn't be required :confused:

Srikanth
Wed, 02 Aug 2023 02:29:39 UTC

Please subscribe to this issue if you want to keep an eye on when it gets fixed. The root cause is a lack of atomicity.