Configuring ClickHouse for S3 Retention and EC2 Deployment

TLDR sarthak asks about configuring S3 data retention and if ClickHouse documentation is enough. vishal-signoz shares guide links and confirms ClickHouse will query S3 data, SigNoz handles everything, and using EC2 with vertical scaling is fine for their use case.

sarthak
Wed, 03 May 2023 09:46:04 UTC

Hi, is there a way, or some configuration tag in clickhouse-storage.xml, to write a rule that moves data to S3 based on a time period (e.g. 1 month) and empties the ClickHouse tables, so that it can keep ingesting data periodically without any space issues?

vishal-signoz
Wed, 03 May 2023 11:26:39 UTC

You can set retention based on a time period, such as moving data to S3 after a given age, by following this doc:

vishal-signoz
Wed, 03 May 2023 11:27:42 UTC

Even on S3 the performance drop is around 30%, so you can access S3 data like normal data (with some performance drop).

sarthak
Wed, 03 May 2023 11:54:31 UTC

Thanks Vishal for the response. So you are saying that if I configure retention to move data to S3 after 7 days, ClickHouse will then automatically query the old data from S3? I also had a doubt about whether I need to write some code/query using the S3 table engine to get visuals from the data stored in S3.

vishal-signoz
Wed, 03 May 2023 11:56:25 UTC

> so you are saying, suppose if I retain data into S3 after 7 days by configuring it, then after that ClickHouse will automatically query old data from S3

Yes
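For context, a time-based move like the 7-day case above is usually expressed in ClickHouse as a table TTL rule, and queries against the table then read moved parts from S3 transparently. The following is only a minimal sketch for understanding, not what SigNoz itself runs: the table name `example_db.example_table`, the column name `timestamp`, and the volume name `s3` are all hypothetical and would have to match your own schema and storage configuration.

```python
def build_move_ttl(table: str, time_column: str, days: int, volume: str) -> str:
    """Build an ALTER TABLE statement that moves data older than `days`
    to the storage volume `volume` (all names are supplied by the caller)."""
    if days <= 0:
        raise ValueError("days must be positive")
    return (
        f"ALTER TABLE {table} "
        f"MODIFY TTL {time_column} + INTERVAL {days} DAY "
        f"TO VOLUME '{volume}'"
    )


# Hypothetical names: adjust to your own table, column, and volume.
print(build_move_ttl("example_db.example_table", "timestamp", 7, "s3"))
```

The function only builds the statement text; running it against a server, and defining the `s3` volume in the storage policy, are left to the deployment.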

sarthak
Wed, 03 May 2023 11:56:51 UTC

Thanks bro for the clarification.

vishal-signoz
Wed, 03 May 2023 11:57:03 UTC

> actually I also had that doubt, whether I need to write some code/query using the S3 table engine to get visuals from data stored in S3

No, SigNoz handles everything for you.

sarthak
Wed, 03 May 2023 11:58:24 UTC

Actually, I want to deeply understand how ClickHouse works and its storage pipeline. That's why I'm exploring, so that I can handle whatever issues come up during the production experiment.

sarthak
Wed, 03 May 2023 11:58:44 UTC

So according to you, the official ClickHouse documentation is enough, right?

vishal-signoz
Wed, 03 May 2023 11:59:16 UTC

Yes, you can follow this guide to connect to ClickHouse:

sarthak
Wed, 03 May 2023 11:59:39 UTC

okay

sarthak
Wed, 03 May 2023 12:02:57 UTC

One more thing I think I should discuss: I'm thinking of using EC2 over Elastic Load Balancing for replication, but keeping it vertically scalable rather than sharding, after estimating the current scale I need to handle. Would that be a good option in terms of best practices?

vishal-signoz
Wed, 03 May 2023 12:30:52 UTC

Yes, it should be fine.

sarthak
Wed, 03 May 2023 12:57:38 UTC

Yeah, actually I'm deploying it into production as an experiment for application performance analysis.

sarthak
Wed, 03 May 2023 12:58:18 UTC

So I was exploring some open source APMs, and I found this one suitable in terms of features, use-case coverage, and technical architecture.

sarthak
Wed, 03 May 2023 13:00:38 UTC

So I started reading more about observability, OpenTelemetry, and SigNoz in detail, so that I can set up the complete infrastructure in a scalable way and remove the dependency on SaaS-based APMs.

sarthak
Wed, 03 May 2023 13:01:57 UTC

Now I understand how it works, have deployed it on staging, and am trying to get a proper grip on all the components, including the best deployment strategy.