TLDR Romil hit missing-table errors after a fresh SigNoz install on a new ClickHouse cluster. Prashant advised restarting the OtelCollector pods, but that did not fully resolve it. Srikanth acknowledged the issue and pointed to a related open issue on GitHub.
This happens when the migration files are not executed properly. The OtelCollector and OtelCollector Metrics pods are responsible for running them to create the tables.
Init containers make sure that those pods start only after the ClickHouse cluster is healthy, but it looks like something is off with the cluster health or the startup order here.
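A quick way to confirm whether the migrations actually ran is to look at what ended up in ClickHouse. This is just an illustrative check; `signoz_traces` and `signoz_metrics` are the databases SigNoz normally creates, so adjust if yours differ:
```sql
-- If the collector migrations ran, the SigNoz databases and their tables will be present.
SELECT database, name
FROM system.tables
WHERE database IN ('signoz_traces', 'signoz_metrics')
ORDER BY database, name;
```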
Could you try restarting SigNoz OtelCollector and OtelCollector Metrics pods?
i did ... it has ended up restarting over 100 times by now
i just fixed it 2s ago
i found the last migration in `schema_migrations`, deleted that row, and then set the other ones to dirty = 0
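For anyone hitting the same thing, here is a rough SQL sketch of that manual cleanup. It assumes golang-migrate's default `schema_migrations` layout (`version`, `dirty`, `sequence`) in the `signoz_traces` database and a table engine that supports mutations, so verify against your own setup first:
```sql
-- Inspect the recorded migrations and find the one stuck as dirty.
SELECT version, dirty, sequence
FROM signoz_traces.schema_migrations
ORDER BY sequence DESC
LIMIT 10;

-- Remove the record of the failed (last) migration and clear the dirty flag on the rest.
-- <failed_version> is a placeholder for the version returned by the query above.
-- ALTER ... DELETE/UPDATE are mutations and need a MergeTree-family engine;
-- if schema_migrations uses a log engine, recreate the table with corrected rows instead.
ALTER TABLE signoz_traces.schema_migrations DELETE WHERE version = <failed_version>;
ALTER TABLE signoz_traces.schema_migrations UPDATE dirty = 0 WHERE dirty = 1;
```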
there are 2 problems that i see:
• the chart doesn't respect `clickhouse.cluster` at all
• the pod does not query how many shards or replicas are present, so it marks one as good and the other as dirty
Yes, `clickhouse.cluster` is the static value `cluster` at the moment due to the limitations of go-migrate and the cluster name being static in the migration files
> the pod does not query how many shards or replicas are present, so it marks one as good and the other as dirty
Could you please elaborate on this?
if i have 2 replicas (the default behavior), then the other replica still has the migration marked as dirty
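A quick way to see this is to check the migration state on every replica at once. This is just a sketch; it assumes the cluster is named `cluster` and the migrations table is `signoz_traces.schema_migrations`:
```sql
-- Fan the query out to all replicas; any replica whose schema_migrations
-- still has dirty = 1 shows up immediately.
SELECT
    hostName() AS replica,
    version,
    dirty
FROM clusterAllReplicas('cluster', signoz_traces, schema_migrations)
ORDER BY replica, version;
```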
I see, the dirty migration issue. To my knowledge, this is usually resolved by later pod restarts. Srikanth, could you please look into this?
2 more issues Prashant:
```
signoz-otel-collector-7b86585cd5-nrnnd signoz-otel-collector 2023-08-01T16:57:09.381Z info clickhousetracesexporter/clickhouse_factory.go:127 Clickhouse Migrate finished {"kind": "exporter", "data_type": "traces", "name": "clickhousetraces", "error": "migration failed in line 0: \n\n\n\nCREATE MATERIALIZED VIEW IF NOT EXISTS signoz_traces.dependency_graph_minutes_service_calls_mv ON CLUSTER cluster\nTO signoz_traces.dependency_graph_minutes AS\nSELECT\n A.serviceName as src,\n B.serviceName as dest,\n quantilesState(0.5, 0.75, 0.9, 0.95, 0.99)(toFloat64(B.durationNano)) as duration_quantiles_state,\n countIf(B.statusCode=2) as error_count,\n count(*) as total_count,\n toStartOfMinute(B.timestamp) as timestamp\nFROM signoz_traces.signoz_index_v2 AS A, signoz_traces.signoz_index_v2 AS B\nWHERE (A.serviceName != B.serviceName) AND (A.spanID = B.parentSpanID)\nGROUP BY timestamp, src, dest; (details: code: 47, message: Unknown identifier: B.timestamp; there are columns: timestamp, serviceName, B.serviceName, quantilesState(0.5, 0.75, 0.9, 0.95, 0.99)(toFloat64(B.durationNano)), countIf(equals(B.statusCode, 2)), count(): While processing serviceName AS src, B.serviceName AS dest, quantilesState(0.5, 0.75, 0.9, 0.95, 0.99)(toFloat64(B.durationNano)) AS duration_quantiles_state, countIf(B.statusCode = 2) AS error_count, count() AS total_count, toStartOfMinute(B.timestamp) AS timestamp)"}
```
and
```
signoz-otel-collector-7b86585cd5-nrnnd signoz-otel-collector 2023-08-01T17:00:01.906Z error clickhousetracesexporter/writer.go:128 Could not write a batch of spans {"kind": "exporter", "data_type": "traces", "name": "clickhousetraces", "error": "code: 60, message: Table signoz_traces.distributed_span_attributes doesn't exist"}
```
cc <@4K143e> <@4K15aa>
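Since the second error is about a missing table, a quick existence check helps confirm which distributed tables the interrupted migrations never created. The names below are taken from the log above; the queries themselves are just a generic sketch:
```sql
-- List the distributed_* tables that actually exist in signoz_traces.
SELECT name
FROM system.tables
WHERE database = 'signoz_traces'
  AND startsWith(name, 'distributed_');

-- Or check a single table directly (returns 1 if present, 0 if not).
EXISTS TABLE signoz_traces.distributed_span_attributes;
```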
it seems like the problem is related to the dirty flag that i pointed out earlier
constantly changing it back to 0 and restarting the pod resolves it
the issue is that if you don't know how many total migrations there are, you don't know when to stop
apart from the issue that it shouldn't be required :confused:
Please subscribe to this issue if you want to keep an eye on when it gets fixed
Romil
Fri, 28 Jul 2023 16:02:18 UTC
just did a fresh install with latest chart on a new clickhouse ...
```
2023/07/28 16:01:02 application run finished with error: failed to build pipelines: failed to create "clickhousetraces" exporter for data type "traces": code: 60, message: Table signoz_traces.distributed_signoz_index_v2 doesn't exist
```
anyone get this problem?