#support

Using a Kafka Broker for Telemetry Data to Ensure Fault Tolerance

TLDR sarthak asks whether a Kafka broker should sit in front of ClickHouse for fault tolerance. Srikanth and Ankit discuss scale and the feasibility of the data flow. sarthak also asks for a circuit-breaking mechanism, and Ankit explains that this behavior is already in place.

Solved
May 17, 2023 (4 months ago)
sarthak
04:02 AM
Hello everyone, is it recommended to use a Kafka broker as the receiver of telemetry data for export to ClickHouse, instead of standard gRPC, when we need to handle high scale and stay fault tolerant, so that data is not lost if storage/ClickHouse fails?
May 18, 2023 (4 months ago)
Srikanth
01:50 AM
How much scale are we talking about? It might be overkill for regular users. Adding a queue alone doesn't guarantee against data loss, since the exporter will eventually drop the data when ClickHouse is not reachable.
Ankit
04:07 AM
> It might be overkill for regular users.
Correct. I think the data flow would look like otel-collector => Kafka => ClickHouse, so we expect Kafka to absorb bursts in traffic and ClickHouse downtime.
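To make the buffering idea concrete, here is a minimal sketch in plain Python, with no Kafka or ClickHouse client involved. `DurableLog`, `FlakySink`, and `drain` are hypothetical stand-ins invented for illustration, not SigNoz or collector APIs. It assumes the consumer advances its offset only after a successful write, which is what would let a queue replay data after an outage.

```python
class DurableLog:
    """Stand-in for a Kafka topic partition: an append-only list plus a committed offset."""

    def __init__(self):
        self.records = []
        self.committed = 0  # index of the next record still waiting to be delivered

    def append(self, record):
        self.records.append(record)


class FlakySink:
    """Stand-in for ClickHouse: rejects writes while `up` is False."""

    def __init__(self):
        self.up = True
        self.rows = []

    def insert(self, record):
        if not self.up:
            raise ConnectionError("sink unreachable")
        self.rows.append(record)


def drain(log, sink):
    """Deliver pending records in order; advance the offset only after a successful insert."""
    delivered = 0
    while log.committed < len(log.records):
        try:
            sink.insert(log.records[log.committed])
        except ConnectionError:
            break  # offset is left untouched, so the record is retried on the next drain
        log.committed += 1
        delivered += 1
    return delivered


log, sink = DurableLog(), FlakySink()
log.append("span-1")
drain(log, sink)                   # sink is up, record is delivered

sink.up = False                    # simulated ClickHouse outage
log.append("span-2")
log.append("span-3")
during_outage = drain(log, sink)   # nothing is delivered, nothing is lost

sink.up = True                     # sink recovers
after_recovery = drain(log, sink)  # backlog is replayed in order
```

The sketch also shows the caveat Srikanth raises: the queue only helps if the component writing to ClickHouse retries instead of dropping, and a real broker's retention limits still bound how long an outage can be absorbed.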
Srikanth
04:55 AM
They mentioned they want to use it as a receiver, as a substitute for the gRPC OTLP receiver, and then export the data to ClickHouse.
sarthak
08:52 AM
OK, so is there a way to implement a circuit-breaking mechanism at the microservice level, keeping the transport mechanism as it is (gRPC), that can be passed in along with the other env variables, so that the source service does not go down when the SigNoz backend is completely down, since it will keep sending telemetry events to SigNoz? I just faced this on my basic setup while testing.
Ankit
02:05 PM
The source service starts dropping telemetry data if SigNoz is down. It should not affect the application, other than needing more memory to hold a batch, after which it starts dropping data. The service will also print logs about being unable to send data to SigNoz, but the application should work just fine.
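The behavior described above can be sketched as a bounded in-memory buffer. This is a simplified, stdlib-only illustration, not the actual SDK code: `BoundedTelemetryBuffer` and `backend_down` are hypothetical names, and real SDKs differ in how they retry or discard a failed batch. In OpenTelemetry SDKs the comparable span queue limit is typically set with the `OTEL_BSP_MAX_QUEUE_SIZE` environment variable, but check your SDK's documentation for what it supports.

```python
from collections import deque


class BoundedTelemetryBuffer:
    """Holds at most `max_queue_size` items; anything beyond that is dropped, never blocked on."""

    def __init__(self, max_queue_size):
        self.max_queue_size = max_queue_size
        self.queue = deque()
        self.dropped = 0

    def enqueue(self, item):
        """Never blocks and never raises: when the buffer is full, count the item as dropped."""
        if len(self.queue) >= self.max_queue_size:
            self.dropped += 1
            return False
        self.queue.append(item)
        return True

    def flush(self, export):
        """Try to export everything buffered; keep the items if the backend is unreachable."""
        batch = list(self.queue)
        try:
            export(batch)
        except ConnectionError as err:
            # Log and carry on: the failure stays inside the telemetry path.
            print(f"export failed, keeping {len(batch)} items buffered: {err}")
            return 0
        self.queue.clear()
        return len(batch)


def backend_down(batch):
    """Hypothetical exporter that simulates an unreachable backend."""
    raise ConnectionError("backend unreachable")


buf = BoundedTelemetryBuffer(max_queue_size=3)
accepted = [buf.enqueue(f"span-{i}") for i in range(5)]  # last two exceed the limit
sent = buf.flush(backend_down)                           # export fails, app code keeps running
```

Because `enqueue` returns immediately and `flush` swallows the connection error, the calling application only pays a bounded memory cost while the backend is down.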