TLDR oluchi reports that the services UI goes blank after a while. The conversation rules out S3 connection problems, finds a ClickHouse `NOT_ENOUGH_SPACE` error on the hot-storage disk, and suggests raising the `move_factor` so data moves to cold storage before the disk fills; no final resolution is confirmed.
how many replicas of query-service are defined? It should be 1
Do the services appear and disappear, or have they never been visible since adding S3?
oluchi
hello Ankit, thanks for your response. 1. We have just one replica of signoz query. 2. They disappear and reappear after we uninstall and reinstall signoz (the `S3 setup and annotation` are added in the values.yaml file).
okay... can you share the clickhouse logs? I am guessing that if the s3 connection fails, clickhouse doesn't show any data. Also, can you check if the size of the data in s3 is increasing?
one second, let me do the checks :point_down: Ankit
```
worker.go:445:dropReplicas():start:infra/signoz-clickhouse/e9e59dca-39f7-4444-91e9-5fb092c9daa1:drop replicas based on AP
I0205 00:16:26.531186 1 worker.go:462] worker.go:462:dropReplicas():end:infra/signoz-clickhouse/e9e59dca-39f7-4444-91e9-5fb092c9daa1:processed replicas: 0
I0205 00:16:26.531219 1 worker.go:419] includeStopped():infra/signoz-clickhouse/e9e59dca-39f7-4444-91e9-5fb092c9daa1:add CHI to monitoring
I0205 00:16:26.802933 1 worker.go:485] infra/signoz-clickhouse/9ca4c129-c258-425d-80b1-a956508a0752:IPs of the CHI [*****]
I0205 00:16:26.815881 1 worker.go:489] infra/signoz-clickhouse/342fa60b-416a-4027-ae25-6de4bca505b7:Update users IPS
I0205 00:16:27.042605 1 worker.go:505] markReconcileComplete():infra/signoz-clickhouse/e9e59dca-39f7-4444-91e9-5fb092c9daa1:reconcile completed
I0215 20:17:43.965089 1 controller.go:309] infra/signoz-clickhouse:endpointsInformer.UpdateFunc: IP ASSIGNED: []v1.EndpointSubset{
  v1.EndpointSubset{
    Addresses: []v1.EndpointAddress{
      v1.EndpointAddress{
        IP: "172.********",
        Hostname: "",
        NodeName: &"ip-*******l",
        TargetRef: nil,
      },
    },
    NotReadyAddresses: nil,
    Ports: []v1.EndpointPort{
      v1.EndpointPort{Name: "http", Port: 8123, Protocol: "TCP", AppProtocol: nil},
      v1.EndpointPort{Name: "tcp", Port: 9000, Protocol: "TCP", AppProtocol: nil},
    },
  },
}
I0215 20:17:44.020501 1 worker.go:299] infra/signoz-clickhouse/f48fbf51-ff72-45f1-abd8-96a17e4f8191:IPs of the CHI [*******]
I0215 20:17:44.026758 1 worker.go:303] infra/signoz-clickhouse/9afb9ed0-a38e-44a2-a57d-598971239d44:Update users IPS
I0215 20:17:44.035005 1 worker.go:1645] updateConfigMap():infra/signoz-clickhouse/9afb9ed0-a38e-44a2-a57d-598971239d44:Update ConfigMap infra/chi-signoz-clickhouse-common-usersd
```
this does not have much useful information
can you grep by `s3`?
also, can you check the size of the s3 bucket to see if it is receiving data?
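For the S3-size check, one option is the AWS CLI's `--summarize` flag; a sketch assuming a placeholder bucket name (replace it with the bucket from your cold-storage config):

```shell
# Placeholder bucket name (assumption); use the bucket from your cold-storage config.
BUCKET=my-signoz-cold-storage

# --summarize appends "Total Objects" and "Total Size" to the listing; if the
# total size grows between runs, ClickHouse is writing data to S3.
if command -v aws >/dev/null 2>&1; then
  aws s3 ls "s3://$BUCKET" --recursive --summarize --human-readable | tail -n 2 || true
fi
```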
Checking ... Ankit
No useful info came up for `s3`, except the following Ankit
```
{e899fee7-1eea-4e3f-b6dc-6e7bd6141071} <Error> TCPHandler: Code: 243. DB::Exception: Cannot reserve 1.00 MiB, not enough space. (NOT_ENOUGH_SPACE), Stack trace (when copying this message, always include the lines below):
```
how much space is left in the disk?
cc: Prashant what's the default config? Maybe we want to change the defaults of clickhouse for better operation at scale
oluchi any idea how much data you were trying to ingest?
One second, checking now Ankit
and this message is also temporary: it resolves once the heavy ingestion is over. Can you check the time of the error?
the time of the error is about an hour ago
about 10 GB still left Ankit
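For reference, the remaining space on the hot-storage volume can be checked with `df` directly inside the ClickHouse pod; a sketch where the pod name and namespace are assumptions (list yours with `kubectl get pods -n <namespace>`):

```shell
# Pod name/namespace below are assumptions; adjust to your release.
if command -v kubectl >/dev/null 2>&1; then
  kubectl exec -n platform chi-signoz-clickhouse-cluster-0-0-0 -- \
    df -h /var/lib/clickhouse || true
else
  # Same idea against the local filesystem, for illustration:
  df -h /
fi
```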
Alright Ankit, thank you for your time!
oluchi Can you share your S3 configuration? Retention is currently based on the span timestamp, and only then is data moved to cold storage. However, you need data to move based on disk availability. Did you configure the `move_factor`? What is your approximate ingestion rate?
> ```
> {e899fee7-1eea-4e3f-b6dc-6e7bd6141071} <Error> TCPHandler: Code: 243. DB::Exception: Cannot reserve 1.00 MiB, not enough space. (NOT_ENOUGH_SPACE), Stack trace (when copying this message, always include the lines below):
> ```
I have seen this error occur when there is not enough storage on the ClickHouse storage PVC, i.e. the `/var/lib/clickhouse` mount.
But yeah, do share your S3 configuration, so that we can have a look at it.
my default cold storage setup Prashant
```yaml
clickhouse:
  cloud: aws
  installCustomStorageClass: false
  persistence:
    size: 30Gi
  # Cold storage configuration
  coldStorage:
    enabled: true
    defaultKeepFreeSpaceBytes: "10485760"
```
s3 config
```json
{
  "Statement": [
    {
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:PutBucketVersioning",
        "s3:PutObject"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::<bucket name>",
        "arn:aws:s3:::<bucket_name>/*"
      ]
    }
  ],
  "Version": "2012-10-17"
}
```
`defaultKeepFreeSpaceBytes` reserves some free space on a disk, but it doesn't move any data. What was your `move_factor`?
`move_factor` is that a value on the values.yaml file?
I see this is unavailable in our charts, but I believe you could override it. I think that's the reason you are not seeing services: your disk is filling up, but the default retention (7 days) is based on the span timestamp, so data will not move for a week, and you haven't set a `move_factor` (the fraction of free disk space that should always remain; when free space drops below that threshold, ClickHouse moves data to cold storage).
Okay, thank you Srikanth, I will look up information on how to override the `move_factor`
Prashant how can oluchi add the `move_factor` for volumes in our charts
it would not be possible right now with override.yaml, except perhaps by using the `clickhouse.files` configuration.
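A hedged sketch of what a `clickhouse.files` override could look like, assuming the chart mounts these entries into ClickHouse's `config.d` and that the cold-storage disk is named `s3` (the file name, policy name, and disk names are all hypothetical; verify them against the deployed storage configuration before use):

```yaml
clickhouse:
  files:
    # Hypothetical file name; ClickHouse merges any XML placed in config.d.
    storage_override.xml: |
      <clickhouse>
        <storage_configuration>
          <policies>
            <tiered>
              <volumes>
                <default>
                  <disk>default</disk>
                </default>
                <s3>
                  <disk>s3</disk>
                </s3>
              </volumes>
              <!-- Move parts to the next volume once free space on the
                   current volume drops below 30% -->
              <move_factor>0.3</move_factor>
            </tiered>
          </policies>
        </storage_configuration>
      </clickhouse>
```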
Srikanth isn't the `move_factor` set to `0.1` by default?
shouldn't that be sufficient?
That’s why I was asking for the ingestion rate. If the rate is higher, data gets dropped before the background task can move it. I wanted them to try something higher and test it.
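As a back-of-envelope illustration of that threshold, using the 30Gi PVC from the configuration shared earlier (the numbers are assumptions taken from this thread):

```shell
# move_factor * disk size = the free-space threshold below which ClickHouse
# starts moving parts to the next (cold) volume.
DISK_BYTES=$((30 * 1024 * 1024 * 1024))   # 30Gi hot-storage PVC
THRESHOLD=$((DISK_BYTES * 10 / 100))      # default move_factor 0.1
echo "move_factor 0.1: moves start below $((THRESHOLD / 1024 / 1024 / 1024)) GiB free"

THRESHOLD_HIGH=$((DISK_BYTES * 30 / 100)) # a higher move_factor of 0.3
echo "move_factor 0.3: moves start below $((THRESHOLD_HIGH / 1024 / 1024 / 1024)) GiB free"
```

With only ~3 GiB of headroom at the default, a heavy ingestion burst can exhaust the disk before the background move completes, which would match the `NOT_ENOUGH_SPACE` error seen earlier in the thread.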
Hello Srikanth, how do I check the ingestion rate? Is it a kubectl cmd, or do I have to ssh into the clickhouse pods?
Yeah, you could get the relevant info by querying ClickHouse. Let me share a command that outputs spans per interval.
Can you exec into ClickHouse and share the output of this?
```sql
SELECT
    toStartOfInterval(timestamp, toIntervalMinute(10)) AS time,
    count() AS count
FROM signoz_traces.signoz_index_v2
GROUP BY time
ORDER BY time ASC
```
`not found`
Srikanth
```
/ $ SELECT
sh: SELECT: not found
/ $ toStartOfInterval(timestamp, toIntervalMinute(10)) AS time,
sh: syntax error: unexpected word (expecting ")")
/ $ count() AS count
/ $ FROM signoz_traces.signoz_index_v2
sh: FROM: not found
/ $ GROUP BY time
sh: GROUP: not found
/ $ ORDER BY time ASC
sh: ORDER: not found
/ $
```
oluchi you will have to execute it using `clickhouse client`
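For reference, the query can also be run non-interactively from outside the pod; a sketch where the pod name `chi-signoz-clickhouse-cluster-0-0-0` and namespace `platform` are assumptions (list yours with `kubectl get pods -n <namespace>`):

```shell
# The query from the thread, stored so it can be passed to clickhouse client.
QUERY='SELECT toStartOfInterval(timestamp, toIntervalMinute(10)) AS time,
       count() AS count
FROM signoz_traces.signoz_index_v2
GROUP BY time
ORDER BY time ASC'

# Run it through clickhouse client rather than the pod shell (which rejects SQL).
# Pod name and namespace below are assumptions; adjust to your release.
if command -v kubectl >/dev/null 2>&1; then
  kubectl exec -n platform chi-signoz-clickhouse-cluster-0-0-0 -- \
    clickhouse client --query "$QUERY" || true
fi
```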
I thought as much Prashant, thanks
How can I drop data from a specific day onward?
Srikanth can you please look into this?
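On dropping a specific day's data: a common ClickHouse approach is to drop the matching partition, but the partition value depends on the table's `PARTITION BY` expression, so inspect `system.parts` first. A hedged sketch, assuming SigNoz tables are partitioned by date (pod name and namespace are hypothetical):

```shell
# List the active partitions for the traces tables so the right value can be
# picked (assumption: partitions correspond to date buckets).
QUERY="SELECT DISTINCT table, partition FROM system.parts WHERE database = 'signoz_traces' AND active"
if command -v kubectl >/dev/null 2>&1; then
  kubectl exec -n platform chi-signoz-clickhouse-cluster-0-0-0 -- \
    clickhouse client --query "$QUERY" || true
fi
# Once the partition value for the day is known, it can be removed with:
#   ALTER TABLE signoz_traces.signoz_index_v2 DROP PARTITION '<partition value>'
# (irreversible; double-check the value before running)
```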
oluchi
Thu, 16 Feb 2023 12:23:26 UTC
Hello Signoz team, I noticed that after a while our services UI goes blank. We have set up retention with an S3 bucket; please, what could actually be wrong?