SigNoz Deployment Issue in Docker Swarm with NFS Share

TLDR sati ran into problems setting up SigNoz in docker-swarm with an NFS share: otel-collector crashes in a loop and the GUI is inaccessible. Nick experienced a similar problem and shared a workaround.

Mar 10, 2023 (6 months ago)
02:49 PM

I have a problem setting up SigNoz in docker-swarm with more than 1 node. (Every node runs Ubuntu 22.04.1.)

My setup: 3 manager nodes and 1 worker (I started with 1 manager + 1 worker, but the effect was the same, so I tried with more managers) plus 1 NFS host that exports the same resource to every swarm node. Every swarm host has the NFS share mounted as the /data folder (so the signoz repo dir is /data/signoz/[...], visible on every node). Everything is in the same network, every host can ping/connect to the others, and the firewall is disabled on every host.
Steps I do:
1. installed docker, docker-compose, initiated docker swarm on manager1 (without any additional flags), joined other swarm nodes as manager2, manager3 and worker1
2. on manager1: git clone signoz repo into mounted /data dir
3. applied changes to docker-compose.yaml (disabled the hotrod app, added a syslog port to the otel-collector service) and otel-collector-config.yaml (disabled collection of docker container logs and enabled syslog); the hotrod, docker container log and syslog changes all follow the signoz documentation
4. docker stack deploy -c /data/signoz/deploy/docker-swarm/clickhouse-setup/docker-compose.yaml signoz

In this setup:
- otel-collector keeps crashing in a loop with error:
application run finished with error: cannot build pipelines: failed to create "clickhouselogsexporter" exporter, in pipeline "logs": cannot configure clickhouse logs exporter: clickhouse Migrate failed to run, error: migration failed in line 0: RENAME TABLE IF EXISTS signoz_logs.logs_atrribute_keys TO signoz_logs.logs_attribute_keys on CLUSTER cluster; (details: code: 57, message: There was an error on [clickhouse:9000]: Code: 57. DB::Exception: Table signoz_logs.logs_attribute_keys already exists. (TABLE_ALREADY_EXISTS) (version (official build)))
- if somehow otel-collector didn't crash and managed to start properly I cannot access the GUI - it shows a blank page with Loading icon and then after a timeout there's "404 not found"

But if I disable the NFS share on every node aside from manager1 (the node where I deploy SigNoz), everything seems to work correctly. Then, if I disable the network on manager1, the services are in Shutdown status until I restore the network connection to manager1, and this often ends in a state where otel-collector starts to crash in a loop with the same error as above.

I don't have any idea what I'm doing wrong here and I'm probably missing something very simple/easy in this setup :(
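The otel-collector error quoted above names both the failing statement and the table that blocks it. As a minimal sketch (the helper name `parse_migration_error` and the regular expressions are illustrative assumptions, not part of SigNoz), the relevant names can be pulled out of the log line like this:

```python
import re

# Relevant part of the error text, copied from the otel-collector log above.
ERROR = (
    "migration failed in line 0: RENAME TABLE IF EXISTS "
    "signoz_logs.logs_atrribute_keys TO signoz_logs.logs_attribute_keys "
    "on CLUSTER cluster; (details: code: 57, message: There was an error on "
    "[clickhouse:9000]: Code: 57. DB::Exception: Table "
    "signoz_logs.logs_attribute_keys already exists. (TABLE_ALREADY_EXISTS)"
)

# Hypothetical patterns written for this one message format.
RENAME_RE = re.compile(r"RENAME TABLE(?: IF EXISTS)? (\S+) TO (\S+)")
EXISTS_RE = re.compile(r"Table (\S+) already exists")


def parse_migration_error(text):
    """Return the rename source, rename target, and the table reported as existing."""
    rename = RENAME_RE.search(text)
    exists = EXISTS_RE.search(text)
    return {
        "source": rename.group(1) if rename else None,
        "target": rename.group(2) if rename else None,
        "already_exists": exists.group(1) if exists else None,
    }


info = parse_migration_error(ERROR)
print(info)
```

Here the rename target and the table reported as already existing are the same name, which suggests that both the misspelled table and the correctly spelled one are present at the time the migration runs.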
Mar 19, 2023 (6 months ago)
09:21 PM
I have a similar problem myself, although I just re-installed signoz from the helm chart.

chi-mon-clickhouse-cluster-1-0-0.chi-mon-clickhouse-cluster-1-0.platform.svc.cluster.local :) show tables from signoz_logs;

SHOW TABLES FROM signoz_logs

Query id: 40102f01-0443-4845-ac09-ec1f405b442a

┌─name────────────────────────────┐
│ atrribute_keys_float64_final_mv │
│ atrribute_keys_int64_final_mv   │
│ atrribute_keys_string_final_mv  │
│ attribute_keys_float64_final_mv │
│ attribute_keys_int64_final_mv   │
│ attribute_keys_string_final_mv  │
│ distributed_logs                │
│ distributed_logs_atrribute_keys │
│ distributed_logs_attribute_keys │
│ distributed_logs_resource_keys  │
│ distributed_usage               │
│ logs                            │
│ logs_atrribute_keys             │
│ logs_attribute_keys             │
│ logs_resource_keys              │
│ resource_keys_string_final_mv   │
│ schema_migrations               │
│ usage                           │
└─────────────────────────────────┘
Progress: 0.00 rows, 0.00 B (0.00 rows/s., 0.00 B/s.)
18 rows in set. Elapsed: 0.002 sec. 

chi-mon-clickhouse-cluster-1-0-0.chi-mon-clickhouse-cluster-1-0.platform.svc.cluster.local :) 

I ended up running DROP TABLE signoz.logs_atrribute_keys to progress past the error
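The table listing above contains several names in both the misspelled (`atrribute`) and the corrected (`attribute`) form. A rough sketch for spotting every such pair, assuming the table names have been copied into a Python list (the function `rename_conflicts` is a hypothetical helper, not a SigNoz or ClickHouse API):

```python
# Table names copied from the SHOW TABLES FROM signoz_logs output above.
TABLES = [
    "atrribute_keys_float64_final_mv",
    "atrribute_keys_int64_final_mv",
    "atrribute_keys_string_final_mv",
    "attribute_keys_float64_final_mv",
    "attribute_keys_int64_final_mv",
    "attribute_keys_string_final_mv",
    "distributed_logs",
    "distributed_logs_atrribute_keys",
    "distributed_logs_attribute_keys",
    "distributed_logs_resource_keys",
    "distributed_usage",
    "logs",
    "logs_atrribute_keys",
    "logs_attribute_keys",
    "logs_resource_keys",
    "resource_keys_string_final_mv",
    "schema_migrations",
    "usage",
]


def rename_conflicts(tables, old="atrribute", new="attribute"):
    """Return (misspelled, corrected) pairs where both names already exist."""
    present = set(tables)
    pairs = []
    for name in tables:
        corrected = name.replace(old, new)
        if old in name and corrected in present:
            pairs.append((name, corrected))
    return pairs


conflicts = rename_conflicts(TABLES)
for misspelled, corrected in conflicts:
    print(f"{misspelled} -> {corrected} (target already exists)")
```

Each pair printed is a rename whose target is already taken, so it would presumably fail the same way as the `logs_atrribute_keys` rename in the error message. Whether dropping the misspelled table is safe depends on which of the two holds the data, so it is worth checking both before removing anything.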