Alert Issues After Upgrading SigNoz
TLDR James experienced issues with alert rules after upgrading SigNoz. Srikanth suggested using the sqlite tool to remove the alerts.
Jun 21, 2023
James
07:32 AM Recently upgraded from v0.15 -> v0.20.2. Running via docker-compose. After I tried to remove alerts that were behaving weirdly post-update via the alert dashboard, SigNoz crashed, and on reboot the alertmanager service fails to get past startup with docker-compose up. On the rare chance it does pass, the frontend is mostly broken and displays "not found" on all pages after I try to mess around with alerts again. I believe this is a problem with how some alerts were updated by the v0.19 upgrade script (running this script again now gives a seg fault). I am trying to find a way to make the service usable again. I am OK with manually deleting the current alerts if that's the problem, but I am unsure where they are stored and how to do so. Thanks in advance :).
Current version info:
SigNoz version : v0.20.2
Commit SHA-1 : 84c4668
Commit timestamp : 2023-06-09T08:52:13Z
Branch : HEAD
Go version : go1.18.10
Launch logs from alertmanager service:
2023-06-21T05:34:22.453Z INFO rules/thresholdRule.go:754 rule:Runner Error alerts found: 0
2023-06-21T05:34:22.453Z INFO rules/thresholdRule.go:309 msg:sending alerts rule:Runner Error
2023-06-21T05:34:23.629Z DEBUG rules/ruleTask.go:297 msg:%!(EXTRA string=rule task eval started, string= name:, string=18-groupname, string= start time:, time.Time=2023-06-21 05:34:23.629831765 +0000 UTC)
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xfa9cc1]
goroutine 429 [running]:
, 0x5?, 0x0)
/go/src/github.com/signoz/signoz/pkg/query-service/app/queryBuilder/query_builder.go:82 +0x41
, 0xc000fbe630, {0x0, 0x0, 0x0?})
/go/src/github.com/signoz/signoz/pkg/query-service/app/queryBuilder/query_builder.go:178 +0x5d3
, {0x1916378?, 0xc000fbe2d0?, 0x0?})
/go/src/github.com/signoz/signoz/pkg/query-service/rules/thresholdRule.go:554 +0x45
, {0x1916378, 0xc000fbe5d0}, {0xc0008e75f0?, 0x4588e9?, 0x0?}, {0x191f050, 0xc000ffe2c0})
/go/src/github.com/signoz/signoz/pkg/query-service/rules/thresholdRule.go:615 +0x11f
, {0x1916378, 0xc000fbe5d0}, {0x1318460?, 0xc00079ec90?, 0x0?}, 0x0?)
/go/src/github.com/signoz/signoz/pkg/query-service/rules/thresholdRule.go:669 +0x97
?, 0xc000fbe5a0?}, {0x0?, 0xc00069e3d0?, 0x0?}, 0xc0000418c0, 0x4?, {0x1923620, 0xc0004402d0})
/go/src/github.com/signoz/signoz/pkg/query-service/rules/ruleTask.go:321 +0x22f
, {0x1916378, 0xc000fbe5a0}, {0xc000794da0?, 0x46a6b9?, 0x0?})
/go/src/github.com/signoz/signoz/pkg/query-service/rules/ruleTask.go:338 +0x24a
/go/src/github.com/signoz/signoz/pkg/query-service/rules/ruleTask.go:114 +0x87
(0xc0000418c0, {0x1916308, 0xc000048018})
/go/src/github.com/signoz/signoz/pkg/query-service/rules/ruleTask.go:127 +0x4a9
/go/src/github.com/signoz/signoz/pkg/query-service/rules/manager.go:482 +0x5a
created by
/go/src/github.com/signoz/signoz/pkg/query-service/rules/manager.go:477 +0x425
2023-06-21T05:34:24.438Z INFO version/version.go:43
Srikanth
08:41 AM deploy/docker/clickhouse-setup/data/signoz/signoz.db. You can use the sqlite tool to remove the alerts.
James
10:52 AM On a side note, there may be an unhandled edge case in the v0.19 upgrade scripts for some alert rules. Not sure if anyone else ran into this issue.
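For anyone following the suggestion above to remove alerts from signoz.db, here is a minimal sketch using Python's built-in sqlite3 module instead of the sqlite command-line tool. The thread does not name the table or its columns, so the `rules` table name and the `id` / `data` columns below are assumptions; run `list_tables` first to check what the file actually contains. The helper names are hypothetical. It is probably wise to stop the containers and copy the .db file somewhere safe before deleting anything.

```python
import sqlite3
from contextlib import closing

# Assumption: alert rules live in a table called "rules" with "id" and "data"
# columns. The thread does not confirm this; verify with list_tables() first.
RULES_TABLE = "rules"


def list_tables(db_path):
    """Return the names of all tables in the SQLite file, sorted."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def list_rules(db_path):
    """Return (id, data) for every row in the assumed rules table."""
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute(
            f"SELECT id, data FROM {RULES_TABLE} ORDER BY id"
        ).fetchall()


def delete_rule(db_path, rule_id):
    """Delete one rule by id and return the number of rows removed."""
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                f"DELETE FROM {RULES_TABLE} WHERE id = ?", (rule_id,)
            )
        return cur.rowcount
```

With the path from the thread, a call would look like `list_rules("deploy/docker/clickhouse-setup/data/signoz/signoz.db")`, followed by `delete_rule(path, some_id)` for each rule to drop. The task name `18-groupname` in the log may point at the rule with id 18, but that is a guess, not something the thread confirms.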
Srikanth
10:55 AM