#support

Debugging and Optimizing Slow Signoz Dashboard

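TLDR Div reported a slow SigNoz dashboard with intermittent errors. Srikanth suggested creating a materialized column for the GC query. Ankit suggested adding a serviceName filter, which vishal-signoz confirmed should speed up queries. An extra mapContains filter showed no significant improvement when tested.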

Solved
May 10, 2023 (7 months ago)
Div
03:19 AM
Hi team, recently SigNoz has become much slower for us when querying data for the last 6 hours, and sometimes the dashboard fails with this error:

[Image attachment: error screenshot]

• Is there any documentation on debugging slowness or optimising the SigNoz dashboard?
Div
03:20 AM
We are on the latest version, 0.18.3.
Srikanth
03:24 AM
What are the resources (CPU and Memory) given to ClickHouse?
Div
03:29 AM
[Image attachment]
Div
03:34 AM
I have noticed that when doing any search in SigNoz, the ClickHouse container's CPU usage goes beyond 200%
Div
03:34 AM
using docker stats | grep container_id
Srikanth
04:05 AM
How many CPU cores are available?
Div
04:09 AM
nproc shows 2, and no CPU core limit is set on the Docker container, so it has access to both CPU cores of the host
Ankit
04:09 AM
Srikanth, can you please get on a call with Div and check the query, the amount of data fetched, the processing speed of ClickHouse, etc.? We can discuss more based on the data. If you have a set of commands that can be run to collect this data, that would also be good.
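
A minimal sketch of the kind of commands Ankit asks for here, assuming they are run inside clickhouse-client on the SigNoz ClickHouse container. The thread does not say which commands were actually used on the call. The `system.processes` and `system.parts` tables are standard ClickHouse system tables, and the `signoz_traces` database name is taken from the query shared later in the thread.

```sql
-- Queries running right now, with how much data each has read so far.
SELECT query_id,
       elapsed,
       read_rows,
       formatReadableSize(read_bytes) AS read_size,
       formatReadableSize(memory_usage) AS memory,
       query
FROM system.processes
ORDER BY elapsed DESC;

-- Row count and on-disk size of each table in the traces database.
SELECT table,
       sum(rows) AS total_rows,
       formatReadableSize(sum(bytes_on_disk)) AS size_on_disk
FROM system.parts
WHERE database = 'signoz_traces' AND active
GROUP BY table
ORDER BY sum(bytes_on_disk) DESC;
```

Running the first query while a slow dashboard panel is loading shows which query is holding the CPU. The second gives a rough answer to "how much data are you handling?".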
Ankit
04:10 AM
2 CPUs is low... how much data are you handling?
Div
04:14 AM
This is the trace count for 1 day:
[Image attachment: trace count]
Srikanth
04:17 AM
2 CPU == 2 concurrent queries. What does the chart you mentioned above query?
Div
04:18 AM
If you mean the ClickHouse chart above, it shows the stats I got from docker stats | grep container_id while filtering traces for 1 day
Srikanth
04:19 AM
No, I meant the original chat service chart
Div
04:20 AM
Ah OK, it has 4 panels:
1 - network latency
2 - GC metrics
3 - error rates
4 - memory usage
Srikanth
04:24 AM
Your GC query reads the stringTagMap in traces, which gets really slow and leads to timeouts for the other panels because the GC query has not finished yet. Create a materialized column for it and check again.
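
A minimal sketch of what creating such a materialized column could look like in ClickHouse. It was not posted in the thread and is not an official SigNoz procedure. Assumptions: `signoz_index_v2` is the local table behind `distributed_signoz_index_v2`, and `gc_type` is a hypothetical column name chosen for this example. Srikanth notes later in the thread that this kind of change might affect the exporter, so it is worth trying on a test environment first.

```sql
-- Assumption: signoz_index_v2 is the local table behind the Distributed one.
-- gc_type is a hypothetical column name; it is computed from the map on insert.
ALTER TABLE signoz_traces.signoz_index_v2
    ADD COLUMN IF NOT EXISTS gc_type String
    MATERIALIZED stringTagMap['gc_type'];

-- Declare the same column on the Distributed table so queries through it
-- can reference gc_type (assumption: MATERIALIZED here keeps it out of inserts).
ALTER TABLE signoz_traces.distributed_signoz_index_v2
    ADD COLUMN IF NOT EXISTS gc_type String
    MATERIALIZED stringTagMap['gc_type'];

-- Optional, on ClickHouse versions that support it: write the column into
-- existing data parts. New inserts are populated automatically.
ALTER TABLE signoz_traces.signoz_index_v2 MATERIALIZE COLUMN gc_type;

-- The panel query can then filter on the plain column instead of the map.
SELECT toStartOfInterval(timestamp, INTERVAL 60 SECOND) AS ts,
       avg(durationNano) AS value
FROM signoz_traces.distributed_signoz_index_v2
WHERE timestamp >= (now() - toIntervalHour(6))
  AND gc_type = 'GC'
GROUP BY ts
ORDER BY ts;
```

The idea is that filtering on a dedicated String column only reads that column, whereas `stringTagMap['gc_type']` has to read the whole map for every row in the time range.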
Div
04:28 AM
As a quick solution, would increasing the CPU cores be good enough?
Srikanth
04:38 AM
That might solve your issue temporarily, but I would recommend creating a materialized column so you do not face the same issue again.
Div
04:42 AM
Is there any doc I can refer to on how to add this materialized column for our use case?

We are sending the GC events to SigNoz via the collector tracing service.
Srikanth
05:03 AM
I don't think we have docs for that yet. We will look into it since it might also affect the exporter. If you can exec into clickhouse -> clickhouse-client, run this query, and share the end of the output, that would be helpful.

SELECT toStartOfInterval(timestamp, INTERVAL 60 SECOND) AS ts,
       avg(durationNano) AS value
FROM signoz_traces.distributed_signoz_index_v2
WHERE timestamp >= (now() - toIntervalHour(6))
  AND stringTagMap['gc_type'] = 'GC'
GROUP BY ts
ORDER BY ts
Ankit
09:06 AM
vishal-signoz adding serviceName in the filters should speed up things, right?
Ankit
09:08 AM
vishal-signoz also, adding mapContains(stringTagMap, 'gc_type') will make the query faster?
vishal-signoz
09:09 AM
> adding serviceName in the filters should speed up things, right?
Yes, more filters mean fewer DB rows scanned, so it will be faster
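
A minimal sketch of the GC query from earlier in the thread with a serviceName filter added, as discussed here. The value `'my-service'` is a placeholder, not something from the thread; it stands for whichever service emits the GC spans.

```sql
-- 'my-service' is a placeholder for the service that emits the GC spans.
SELECT toStartOfInterval(timestamp, INTERVAL 60 SECOND) AS ts,
       avg(durationNano) AS value
FROM signoz_traces.distributed_signoz_index_v2
WHERE timestamp >= (now() - toIntervalHour(6))
  AND serviceName = 'my-service'
  AND stringTagMap['gc_type'] = 'GC'
GROUP BY ts
ORDER BY ts;
```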
vishal-signoz
09:10 AM
> also, adding mapContains(stringTagMap, 'gc_type') will make the query faster?
Yes if usecase is just to check if gc_type exists then this way it should be faster.
Ankit
09:36 AM
> Yes if usecase is just to check if gc_type exists then this way it should be faster.
I meant adding mapContains along with AND stringTagMap['gc_type']='GC'
vishal-signoz
10:05 AM
> I meant adding mapContains along with AND stringTagMap['gc_type']='GC'
Tested this on a test environment. It doesn't reduce the number of rows scanned, and there is no significant performance change with the mapContains filter.
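
One way to make this kind of comparison yourself is ClickHouse's query log, sketched below. This assumes `system.query_log` is enabled (the ClickHouse default). The thread does not say how vishal-signoz measured the rows scanned, so this is not necessarily the method used.

```sql
-- Rows read and duration of recently finished queries that touch stringTagMap.
-- Entries can take a few seconds to appear after a query completes.
SELECT event_time,
       query_duration_ms,
       read_rows,
       formatReadableSize(read_bytes) AS read_size,
       query
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query LIKE '%stringTagMap%'
  AND query NOT LIKE '%system.query_log%'
ORDER BY event_time DESC
LIMIT 10;
```

Running the query once with and once without the extra filter and comparing `read_rows` and `query_duration_ms` shows whether the filter changes anything.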
