#support

SigNoz crashing in k8s due to ClickHouse OOM

TL;DR: Travis reported SigNoz crashing in k8s due to ClickHouse OOM. The team suggested increasing ClickHouse resources, along with other troubleshooting steps, but the issue remains unresolved.

Mar 27, 2023
Travis
05:55 PM
hey SigNoz! We're using signoz in k8s and find that after a few hours we stop collecting logs. Restarting the otel-collector allows us to start collecting logs again. Any idea where to look for what's causing it to crash?
Ankit
06:10 PM
It should not... it's quite stable. What do the logs of the otel-collector say? And what resources are allocated to SigNoz?
Ankit
06:11 PM
do you know how many log lines you are ingesting/min?
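For reference, a minimal sketch of pulling those collector logs with kubectl; the deployment and namespace names are inferred from the pod names that appear later in this thread:

# tail the SigNoz otel-collector logs
kubectl -n signoz logs deployment/signoz-otel-collector --tail=200
# or follow one collector pod
kubectl -n signoz logs -f signoz-otel-collector-76dd66c56c-98nk5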
Travis
07:16 PM
i'm seeing this in the logs:
signoz-otel-collector-init wget: can't connect to remote host (172.20.64.8): Connection refused
signoz-otel-collector-init waiting for clickhouseDB
stream logs failed container "signoz-otel-collector" in pod "signoz-otel-collector-76dd66c56c-98nk5" is waiting to start: PodInitializing for signoz/signoz-otel-collector-76dd66c56c-98nk5 (signoz-otel-collector)
Ankit
07:17 PM
ClickHouse is becoming unavailable. Check the CPU and memory allocated to SigNoz. Please increase it to 4 CPUs.
Travis
10:04 PM
hmm... i'll look into how much is allocated to SigNoz. i don't think we set anything, just using whatever defaults were. fwiw, these nodes are not heavily utilized right now.
[screenshot attached]
Travis
10:15 PM
Which pod specifically needs more cpu? is it the signoz-otel-collector pod?
Travis
11:42 PM
okay. i increased those limits for clickhouse and restarted pods... i'll monitor to see if this happens again.

fwiw, here's some logs i found in the signoz-otel-collector-pods.

signoz-otel-collector 2023-03-27T23:40:17.465Z    error    exporterhelper/queued_retry.go:310    Dropping data because sending_queue is full. Try increasing queue_size.    {"kind": "exporter", "data_type": "lo…
signoz-otel-collector     /go/pkg/mod/go.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry.go:310
signoz-otel-collector     /go/pkg/mod/go.opentelemetry.io/[email protected]/exporter/exporterhelper/logs.go:114
signoz-otel-collector     /go/pkg/mod/go.opentelemetry.io/collector/[email protected]/logs.go:36
signoz-otel-collector     /go/pkg/mod/go.opentelemetry.io/collector/processor/[email protected]/batch_processor.go:339
signoz-otel-collector     /go/pkg/mod/go.opentelemetry.io/collector/processor/[email protected]/batch_processor.go:176
signoz-otel-collector     /go/pkg/mod/go.opentelemetry.io/collector/processor/[email protected]/batch_processor.go:144
signoz-otel-collector 2023-03-27T23:40:17.465Z    warn    [email protected]/batch_processor.go:178    Sender failed    {"kind": "processor", "name": "batch", "pipeline": "logs", "error": "sending_queue is…
Mar 28, 2023
Travis
01:02 AM
hmm... i increased the CPU available to clickhouse via this -- https://github.com/SigNoz/charts/blob/c0672a0c5491150348db74cfd27730414a6c66e8/charts/signoz/values.yaml#L161C4-L167

but i'm still seeing the same issue.
Travis
10:15 PM
friendly re-ping on this.
Mar 29, 2023
Ankit
04:06 AM
Srikanth Prashant please look into this
Srikanth
04:37 AM
Travis did you make any changes to the collector config? Can you share the rate of sent_log_records and failed_log_records?
Travis
03:23 PM
sure thing. sorry, i poked around through the SigNoz docs and i don't see where i can find those values?
Srikanth
03:26 PM
Hmm, sorry, they are not documented anywhere. Let me share some more context and how you can get them.
Srikanth
03:38 PM
Go to Dashboards -> New dashboard -> Add panel -> Time series, chart the SUM_RATE of accepted_log_records and the SUM_RATE of sent_log_records in separate panels, and share the resulting screenshots?
Travis
04:02 PM
hmm alright. trying to get everything restarted to work again, but query service is stuck forever on:
signoz-query-service-init waiting for clickhouseDB
Srikanth
04:09 PM
ClickHouse should be available for query-service and collectors. Is it running?
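A quick way to check, as a sketch (namespace and pod names as used elsewhere in this thread):

# is the ClickHouse pod up and ready?
kubectl -n signoz get pods | grep clickhouse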
Travis
04:12 PM
hmm... restarted chi-signoz-clickhouse-cluster-0-0-0.
Travis
04:14 PM
it runs for ~1 minute and then it dies. logs:
clickhouse 2023.03.29 16:12:03.236724 [ 7 ] {} <Information> Application: Setting max_server_memory_usage was set to 3.60 GiB (4.00 GiB available * 0.90 max_server_memory_usage_to_ram_ratio)
clickhouse 2023.03.29 16:12:03.248365 [ 7 ] {} <Information> CertificateReloader: One of paths is empty. Cannot apply new configuration for certificates. Fill all paths and try again.
clickhouse 2023.03.29 16:12:03.278497 [ 7 ] {} <Information> Application: Uncompressed cache policy name
clickhouse 2023.03.29 16:12:03.278524 [ 7 ] {} <Information> Application: Uncompressed cache size was lowered to 2.00 GiB because the system has low amount of memory
clickhouse 2023.03.29 16:12:03.279636 [ 7 ] {} <Information> Context: Initialized background executor for merges and mutations with num_threads=16, num_tasks=32
clickhouse 2023.03.29 16:12:03.279972 [ 7 ] {} <Information> Context: Initialized background executor for move operations with num_threads=8, num_tasks=8
clickhouse 2023.03.29 16:12:03.280512 [ 7 ] {} <Information> Context: Initialized background executor for fetches with num_threads=8, num_tasks=8
clickhouse 2023.03.29 16:12:03.280890 [ 7 ] {} <Information> Context: Initialized background executor for common operations (e.g. clearing old parts) with num_threads=8, num_tasks=8
clickhouse 2023.03.29 16:12:03.281002 [ 7 ] {} <Information> Application: Mark cache size was lowered to 2.00 GiB because the system has low amount of memory
clickhouse 2023.03.29 16:12:03.281075 [ 7 ] {} <Information> Application: Loading user defined objects from /var/lib/clickhouse/
clickhouse 2023.03.29 16:12:03.282445 [ 7 ] {} <Information> Application: Loading metadata from /var/lib/clickhouse/
clickhouse 2023.03.29 16:12:03.310147 [ 7 ] {} <Information> DatabaseAtomic (system): Metadata processed, database system has 6 tables and 0 dictionaries in total.
clickhouse 2023.03.29 16:12:03.310171 [ 7 ] {} <Information> TablesLoader: Parsed metadata of 6 tables in 1 databases in 0.012396625 sec
clickhouse 2023.03.29 16:12:03.310199 [ 7 ] {} <Information> TablesLoader: Loading 6 tables with 0 dependency level
clickhouse 2023.03.29 16:12:18.565650 [ 58 ] {} <Information> TablesLoader: 16.666666666666668%
clickhouse 2023.03.29 16:13:21.737596 [ 58 ] {} <Information> TablesLoader: 33.333333333333336%
clickhouse 2023.03.29 16:13:31.576439 [ 8 ] {} <Information> Application: Received termination signal (Terminated)
signoz-clickhouse-init + chmod +x /var/lib/clickhouse/user_scripts/histogramQuantile
Stream closed EOF for signoz/chi-signoz-clickhouse-cluster-0-0-0 (signoz-clickhouse-init)
Stream closed EOF for signoz/chi-signoz-clickhouse-cluster-0-0-0 (clickhouse)
Travis
04:34 PM
yeah so it appears it's running out of memory... which is my original question. how can i increase the memory/cpu allotted to clickhouse when running in k8s?
Srikanth
04:37 PM
Prashant can help with that.
Prashant
04:44 PM
Travis By default, we haven't set any limits for the ClickHouse pods. Hence, you would only get OOM if there are insufficient resources in your K8s cluster.

https://github.com/SigNoz/charts/blob/main/charts/signoz/values.yaml#L161-L167
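A sketch of overriding those values on an existing install, assuming the chart repo was added as signoz and the release is named signoz in the signoz namespace; the numbers are illustrative, and the key path follows the linked values.yaml:

# override-values.yaml
clickhouse:
  resources:
    requests:
      cpu: "1"
      memory: 4Gi
    limits:
      cpu: "4"
      memory: 16Gi

# apply it to the running release
helm upgrade signoz signoz/signoz -n signoz -f override-values.yaml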
Travis
04:47 PM
i don't think my cluster was actually running out of resources... the utilization is fairly low. yet clickhouse is unavailable.

https://signoz-community.slack.com/archives/C01HWQ1R0BC/p1679944653303509?thread_ts=1679939741.949799&cid=C01HWQ1R0BC
Travis
04:52 PM
after i restart chi-signoz-clickhouse-cluster-0-0-0, i see that it eventually crashes.

clickhouse 2023.03.29 16:50:17.822883 [ 7 ] {} <Information> Application: Loading user defined objects from /var/lib/clickhouse/
clickhouse 2023.03.29 16:50:17.823295 [ 7 ] {} <Information> Application: Loading metadata from /var/lib/clickhouse/
clickhouse 2023.03.29 16:50:17.831628 [ 7 ] {} <Information> DatabaseAtomic (system): Metadata processed, database system has 6 tables and 0 dictionaries in total.
clickhouse 2023.03.29 16:50:17.831656 [ 7 ] {} <Information> TablesLoader: Parsed metadata of 6 tables in 1 databases in 0.003232883 sec
clickhouse 2023.03.29 16:50:17.831689 [ 7 ] {} <Information> TablesLoader: Loading 6 tables with 0 dependency level
clickhouse 2023.03.29 16:50:31.297963 [ 59 ] {} <Information> TablesLoader: 16.666666666666668%
Stream closed EOF for signoz/chi-signoz-clickhouse-cluster-0-0-0 (signoz-clickhouse-init)
clickhouse 2023.03.29 16:51:24.583409 [ 59 ] {} <Information> TablesLoader: 50%
clickhouse 2023.03.29 16:51:40.424811 [ 58 ] {} <Information> TablesLoader: 66.66666666666667%
clickhouse 2023.03.29 16:51:46.080805 [ 8 ] {} <Information> Application: Received termination signal (Terminated)
Travis
04:53 PM
but as far as i can tell, the node it's running on has plenty of headroom.
Travis
04:53 PM
via EKS dashboard in AWS:
[screenshot attached]
Travis
05:11 PM
i can't figure out why i keep getting this Application: Received termination signal (Terminated)

clickhouse 2023.03.29 17:09:07.664915 [ 58 ] {} <Information> TablesLoader: 16.666666666666668%
clickhouse 2023.03.29 17:09:45.751947 [ 58 ] {} <Information> TablesLoader: 33.333333333333336%
clickhouse 2023.03.29 17:10:11.747418 [ 58 ] {} <Information> TablesLoader: 50%
clickhouse 2023.03.29 17:10:15.038757 [ 8 ] {} <Information> Application: Received termination signal (Terminated)
clickhouse 2023.03.29 17:10:22.257197 [ 60 ] {} <Information> TablesLoader: 66.66666666666667%

Prashant
05:14 PM
Travis Are you scraping logs from all pods in the cluster? How many are there?
Travis
05:16 PM
57 total pods. i really don't care about logs from most pods, just from our application itself, which is only ~9 pods.
Travis
05:16 PM
i believe k8s default is to scrape logs from all pods, right? so yes.
Prashant
05:17 PM
I see.
Prashant
05:17 PM
First of all, can you run kubectl describe on the CHI pod?
Prashant
05:18 PM
And share the termination exit codes and the events
Prashant
05:18 PM
Also, look into the logs of the CHI pod for any errors that may have been printed prior.
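A sketch of those commands, with the pod name taken from earlier in the thread:

# termination state, exit codes and recent events
kubectl -n signoz describe pod chi-signoz-clickhouse-cluster-0-0-0
# logs from the previously crashed clickhouse container, if still available
kubectl -n signoz logs chi-signoz-clickhouse-cluster-0-0-0 -c clickhouse --previous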
Travis
05:18 PM
events:
Events:
  Type     Reason     Age                     From               Message
  ----     ------     ----                    ----               -------
  Normal   Scheduled  4m35s                   default-scheduler  Successfully assigned signoz/chi-signoz-clickhouse-cluster-0-0-0 to ip-10-0-3-214.us-west-2.compute.internal
  Normal   Pulled     4m34s                   kubelet            Container image "" already present on machine
  Normal   Created    4m34s                   kubelet            Created container signoz-clickhouse-init
  Normal   Started    4m34s                   kubelet            Started container signoz-clickhouse-init
  Normal   Pulled     4m33s                   kubelet            Container image "" already present on machine
  Normal   Created    4m33s                   kubelet            Created container clickhouse
  Normal   Started    4m33s                   kubelet            Started container clickhouse
  Warning  Unhealthy  3m31s (x18 over 4m22s)  kubelet            Readiness probe failed: Get "": dial tcp 10.0.3.105:8123: connect: connection refused
  Warning  Unhealthy  3m31s                   kubelet            Liveness probe failed: Get "": dial tcp 10.0.3.105:8123: connect: connection refused
Travis
05:20 PM
here's the state:
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 29 Mar 2023 10:13:25 -0700
      Finished:     Wed, 29 Mar 2023 10:13:26 -0700
    Ready:          True
Prashant
05:26 PM
^ Travis is the state above from init containers or the CHI container?
Travis
05:28 PM
from the chi-signoz-clickhouse-cluster-0-0-0 pod
Travis
05:30 PM
oh i see the different containers now... that was from init. :man-facepalming:
Travis
05:30 PM
here this is more interesting:
Containers:
  clickhouse:
    Container ID:  1
    Image:         
    Image ID:      
    Ports:         8123/TCP, 9000/TCP, 9009/TCP, 9000/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP
    Command:
      /bin/bash
      -c
      /usr/bin/clickhouse-server --config-file=/etc/clickhouse-server/config.xml
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Wed, 29 Mar 2023 10:25:26 -0700
      Finished:     Wed, 29 Mar 2023 10:27:25 -0700
    Ready:          False
    Restart Count:  6
    Requests:
      cpu:        4
      memory:     8Gi
    Liveness:     http-get http://:http/ping delay=60s timeout=1s period=3s #success=1 #failure=10
    Readiness:    http-get http://:http/ping delay=10s timeout=1s period=3s #success=1 #failure=3
Prashant
05:31 PM
> Exit Code:    137

This confirms OOM.
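Exit code 137 is 128 + signal 9 (SIGKILL), which in Kubernetes typically means the process was OOM-killed. A sketch of pulling just that termination state for the clickhouse container (names as used in this thread):

kubectl -n signoz get pod chi-signoz-clickhouse-cluster-0-0-0 \
  -o jsonpath='{.status.containerStatuses[?(@.name=="clickhouse")].lastState.terminated}'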
Prashant
05:32 PM
It could be caused by an outburst of logs from all pods.

You can increase the resource requests of clickhouse and test it out.
Travis
05:35 PM
alright. it was 8Gi before, which seems like a lot. half the memory on one of our nodes. i'll try giving it 16Gi and see...
Prashant
05:36 PM
8Gi for resource requests or limits?
Prashant
05:36 PM
can you share what you have it set to?
Travis
05:36 PM
clickhouse.resources.requests.memory
Travis
05:37 PM
      resources:
        requests:
          cpu: '4'
          memory: 16Gi
Prashant
05:37 PM
okay. let me know how it goes with this.
Travis
05:39 PM
why do you think it's using so much memory? because we're trying to scrape logs from all pods? i assume in the ConfigMap for the otel-collector pod that's where i'd exclude logs for pods i don't care about?
Travis
05:40 PM
oh, it's in the signoz-k8s-infra-otel-agent configmap, yeah?
    receivers:
      filelog/k8s:
        exclude:
        - /var/log/pods/kube-system_*.log
        - /var/log/pods/*_hotrod*_*/*/*.log
        - /var/log/pods/*_locust*_*/*/*.log
        include:
        - /var/log/pods/*/*/*.log
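If the goal is to keep only the application logs, one sketch is to extend that same exclude list with the namespaces to drop (the extra namespace names below are placeholders); note that if this ConfigMap is generated by the k8s-infra Helm chart, the change would normally go through the chart's values rather than editing the ConfigMap in place:

    receivers:
      filelog/k8s:
        exclude:
        - /var/log/pods/kube-system_*.log
        - /var/log/pods/*_hotrod*_*/*/*.log
        - /var/log/pods/*_locust*_*/*/*.log
        # hypothetical additions, one line per namespace to drop
        - /var/log/pods/monitoring_*/*/*.log
        - /var/log/pods/ingress-nginx_*/*/*.log
        include:
        - /var/log/pods/*/*/*.log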
Travis
05:45 PM
i guess also -- if we don't have the limit set, why do we need to increase the requests.memory? shouldn't it just use everything up to all the available memory on the node?
Prashant
06:22 PM
yes, ideally nothing should limit it from consuming more resources.
Prashant
06:23 PM
Travis can you try with the following?

      resources:
        requests:
          cpu: '1'
          memory: 4Gi
        limits:
          cpu: '4'
          memory: 16Gi
Prashant
06:26 PM
If this does not resolve it, we could perhaps schedule a call to take a look at it.
Travis
06:31 PM
testing now!

for reference, since clickhouse is working, here's what the logs look like. if that "rows/sec" metric is meaningful to you.

clickhouse 2023.03.29 18:30:07.892732 [ 216 ] <Information> executeQuery: Read 375301 rows, 2.86 MiB in 6.395357589 sec., 58683 rows/sec., 458.46 KiB/sec.
clickhouse 2023.03.29 18:30:10.799741 [ 235 ] <Information> executeQuery: Read 375301 rows, 2.86 MiB in 5.918852143 sec., 63407 rows/sec., 495.37 KiB/sec.
clickhouse 2023.03.29 18:30:13.988077 [ 11 ]  <Information> executeQuery: Read 375301 rows, 2.86 MiB in 6.059014593 sec., 61940 rows/sec., 483.91 KiB/sec.
clickhouse 2023.03.29 18:30:14.038654 [ 10 ] <Information> executeQuery: Read 375301 rows, 2.86 MiB in 6.09963416 sec., 61528 rows/sec., 480.69 KiB/sec.
clickhouse 2023.03.29 18:30:18.055179 [ 229 ] <Information> executeQuery: Read 5 rows, 282.00 B in 26.759150599 sec., 0 rows/sec., 10.54 B/sec.
clickhouse 2023.03.29 18:30:18.079163 [ 235 ] <Information> executeQuery: Read 375301 rows, 2.86 MiB in 7.233857096 sec., 51881 rows/sec., 405.32 KiB/sec
Travis
06:39 PM
hmm... with that it crashes. i have to set the requests.memory higher it seems
Mar 30, 2023
Travis
04:03 AM
well that's too bad.. now even with the requests.memory set to use all the memory on the node, it's still crashing with an OOM.
Travis
04:04 AM
at the bottom of this page, i see that multiple replicas are not supported for clickhouse, is that right?
Travis
04:05 AM
i tried doing something like this -- https://signoz-community.slack.com/archives/C01HWQ1R0BC/p1677646864948979?thread_ts=1677645049.328509&cid=C01HWQ1R0BC

but clickhouse client command doesn't work.

 $ kubectl exec -n signoz -it chi-signoz-clickhouse-cluster-0-0-0 -- sh
Defaulted container "clickhouse" out of: clickhouse, signoz-clickhouse-init (init)
/ $ clickhouse client
ClickHouse client version 22.8.8.3 (official build).
Connecting to localhost:9000 as user default.
Code: 210. DB::NetException: Connection refused (localhost:9000). (NETWORK_ERROR)
Srikanth
06:20 AM
Yes, we don’t yet support replication.
> but clickhouse client command doesn’t work.
Try clickhouse-client; ideally, both should work. Make sure you are exec’ing into the clickhouse-cluster pod, not the clickhouse-operator.
Travis
03:20 PM
yeah, no luck.

<<K9s-Shell>> Pod: signoz/chi-signoz-clickhouse-cluster-0-0-0 | Container: clickhouse
bash-5.1$ clickhouse-client
ClickHouse client version 22.8.8.3 (official build).
Connecting to localhost:9000 as user default.
Code: 210. DB::NetException: Connection refused (localhost:9000). (NETWORK_ERROR)
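One way to tell whether the server has come up at all is the /ping endpoint on port 8123 that the probes use (a sketch from inside the container; whether wget is available in the image is an assumption):

# the liveness/readiness probes hit /ping on port 8123 (see the describe output above)
wget -qO- http://localhost:8123/ping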
Travis
03:21 PM
i assume this is because the clickhouse-server is not actually running yet or something? it's still running the TablesLoader when it OOMs.
Srikanth
03:35 PM
Yes, it’s not ready to accept any client connections.
Srikanth
03:36 PM
How much data do you think is already present in the DB?
Travis
03:51 PM
that's hard for me to say.. but i can get a shell in the container for ~30 seconds before it OOMs. is there somewhere i could look?
Travis
03:54 PM
from clickhouse docs it seems like /var/lib/clickhouse 👍
Travis
03:55 PM
my shell dies pretty quick, but /var/lib/clickhouse/data is only 160kb.
Travis
04:23 PM
i can't find out how large /var/lib/clickhouse/store is, because the pod OOMs before du has time to return any info to me and i lose my shell.
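A one-shot kubectl exec may return before the pod dies, as a sketch (du on a large store directory can still be slow; df at least answers quickly):

kubectl -n signoz exec chi-signoz-clickhouse-cluster-0-0-0 -c clickhouse -- df -h /var/lib/clickhouse
kubectl -n signoz exec chi-signoz-clickhouse-cluster-0-0-0 -c clickhouse -- du -sh /var/lib/clickhouse/store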
Travis
04:24 PM
assuming i am okay to just drop all logs, can i just delete everything in the /var/lib/clickhouse/store dir altogether?
Travis
04:29 PM
then, once i can get clickhouse running again i'll set retention much lower to hopefully avoid this in the future...
Srikanth
04:30 PM
I know the /store contains the part files, but I don’t know what else goes in there. Can you delete the whole PV data, just to be safe and not leave it in any corrupt state?
Travis
04:51 PM
you're saying just delete the entire /var/lib/clickhouse/ dir?
Travis
04:51 PM
will it get recreated on startup?
Srikanth
04:52 PM
Yes
Travis
04:53 PM
okay. i'll spin up a separate ec2 instance and mount my EFS so i can ensure the whole thing gets deleted. i doubt it'd get deleted in time from within my clickhouse pod before it dies
