Kubernetes signoz-otel-collector Issue and ClickHouse Cold Storage

TLDR Pruthvi hit a failed-migration error ("Dirty database version 5") in the Kubernetes signoz-otel-collector after recreating the ClickHouse PVC. nitya-signoz suggested deleting the `signoz_logs` database and restarting the collectors. Pruthvi then asked about ClickHouse cold storage on S3 after observing a spike in S3 cost, which Ankit agreed to investigate further.

Photo of Pruthvi
Pruthvi
Thu, 30 Mar 2023 09:50:03 UTC

I am facing an issue in the Kubernetes signoz-otel-collector:
```
2023-03-30T09:40:18.329Z info kube/client.go:101 k8s filtering {"kind": "processor", "name": "k8sattributes", "pipeline": "metrics/generic", "labelSelector": "", "fieldSelector": "spec.nodeName=ip-10-0-6-126.ap-south-1.compute.internal"}
2023-03-30T09:40:18.469Z info clickhouselogsexporter/exporter.go:356 Running migrations from path: {"kind": "exporter", "data_type": "logs", "name": "clickhouselogsexporter", "test": "/logsmigrations"}
Error: cannot build pipelines: failed to create "clickhouselogsexporter" exporter, in pipeline "logs": cannot configure clickhouse logs exporter: clickhouse Migrate failed to run, error: Dirty database version 5. Fix and force version.
2023/03/30 09:40:18 application run finished with error: cannot build pipelines: failed to create "clickhouselogsexporter" exporter, in pipeline "logs": cannot configure clickhouse logs exporter: clickhouse Migrate failed to run, error: Dirty database version 5. Fix and force version.
```

Photo of nitya-signoz
nitya-signoz
Thu, 30 Mar 2023 10:19:46 UTC

Hi, can you please tell me how you reached this state? Did it happen while you were upgrading, or did you make any changes to the logs schema manually? Also, if your existing logs data is not that important, a hacky way to get things back to normal would be to delete the `signoz_logs` database.

Photo of Pruthvi
Pruthvi
Thu, 30 Mar 2023 10:22:10 UTC

I dropped the ClickHouse PVC.

Photo of Pruthvi
Pruthvi
Thu, 30 Mar 2023 10:22:31 UTC

And restarted ClickHouse

Photo of Pruthvi
Pruthvi
Thu, 30 Mar 2023 10:22:49 UTC

As I had some issues updating the logs filter rules

Photo of nitya-signoz
nitya-signoz
Thu, 30 Mar 2023 10:27:30 UTC

What do you mean by “updating logs filter rules”?

Photo of nitya-signoz
nitya-signoz
Thu, 30 Mar 2023 10:28:03 UTC

If the PV is the same, then there shouldn’t be a problem; correct me if I am wrong <@4K165d>

Photo of Pruthvi
Pruthvi
Thu, 30 Mar 2023 10:28:07 UTC

Added these in processors:
```yaml
- type: filter
  expr: 'attributes.namespace == "signoz"'
- type: filter
  expr: 'attributes.namespace == "tools"'
- type: filter
  expr: 'attributes.container_name == "otc-container"'
```
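For context, operators of this shape (`type: filter` with an `expr`) belong under a log-collection receiver's `operators` list in the OpenTelemetry Collector config, not under `processors` in the pipeline sense. A minimal sketch assuming a `filelog` receiver; the receiver name and include path are illustrative, not taken from this thread:

```yaml
receivers:
  filelog/k8s:
    include:
      - /var/log/pods/*/*/*.log   # illustrative path
    operators:
      # The stanza `filter` operator DROPS entries that match its expression,
      # so each rule below excludes the matching logs from the pipeline.
      - type: filter
        expr: 'attributes.namespace == "signoz"'
      - type: filter
        expr: 'attributes.container_name == "otc-container"'
```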

Photo of Pruthvi
Pruthvi
Thu, 30 Mar 2023 10:28:37 UTC

A new volume has come up, and I think the otel-collector created the tables and got stuck in the migration

Photo of nitya-signoz
nitya-signoz
Thu, 30 Mar 2023 10:29:55 UTC

Oh, you deleted the PV as well. Applying filter processors won’t cause any issues on the otel-collector.

Photo of Pruthvi
Pruthvi
Thu, 30 Mar 2023 10:30:10 UTC

somehow it happened

Photo of nitya-signoz
nitya-signoz
Thu, 30 Mar 2023 10:30:16 UTC

Can you delete the `signoz_logs` database and restart your collectors?

Photo of Pruthvi
Pruthvi
Thu, 30 Mar 2023 10:30:21 UTC

And also, I see these logs in ClickHouse:
```
0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0xa3ef75a in /usr/bin/clickhouse
1. DB::Block::getByName(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool) const @ 0x13ef0872 in /usr/bin/clickhouse
2. DB::getBlockAndPermute(DB::Block const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, DB::PODArray<unsigned long, 4096ul, Allocator<false, false>, 15ul, 16ul> const*) @ 0x158db96f in /usr/bin/clickhouse
3. DB::MergeTreeDataPartWriterCompact::writeDataBlockPrimaryIndexAndSkipIndices(DB::Block const&, std::__1::vector<DB::Granule, std::__1::allocator<DB::Granule> > const&) @ 0x158d682e in /usr/bin/clickhouse
4. DB::MergeTreeDataPartWriterCompact::fillDataChecksums(DB::MergeTreeDataPartChecksums&) @ 0x158d7bc2 in /usr/bin/clickhouse
5. DB::MergeTreeDataPartWriterCompact::fillChecksums(DB::MergeTreeDataPartChecksums&) @ 0x158d847c in /usr/bin/clickhouse
6. DB::MergedBlockOutputStream::finalizePartAsync(std::__1::shared_ptr<DB::IMergeTreeDataPart>&, bool, DB::NamesAndTypesList const*, DB::MergeTreeDataPartChecksums*) @ 0x159c9396 in /usr/bin/clickhouse
7. DB::MutateAllPartColumnsTask::finalize() @ 0x159ee9c5 in /usr/bin/clickhouse
8. ? @ 0x159ecfec in /usr/bin/clickhouse
9. DB::MutatePlainMergeTreeTask::executeStep() @ 0x159d562e in /usr/bin/clickhouse
10. DB::MergeTreeBackgroundExecutor<DB::MergeMutateRuntimeQueue>::routine(std::__1::shared_ptr<DB::TaskRuntimeData>) @ 0xa3b9f1b in /usr/bin/clickhouse
11. DB::MergeTreeBackgroundExecutor<DB::MergeMutateRuntimeQueue>::threadFunction() @ 0xa3b9950 in /usr/bin/clickhouse
12. ThreadPoolImpl<ThreadFromGlobalPool>::worker(std::__1::__list_iterator<ThreadFromGlobalPool, void*>) @ 0xa4b38a6 in /usr/bin/clickhouse
13. void std::__1::__function::__policy_invoker<void ()>::__call_impl<std::__1::__function::__default_alloc_func<ThreadFromGlobalPool::ThreadFromGlobalPool<void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda0'()>(void&&)::'lambda'(), void ()> >(std::__1::__function::__policy_storage const*) @ 0xa4b51f7 in /usr/bin/clickhouse
14. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0xa4b11c8 in /usr/bin/clickhouse
15. ? @ 0xa4b43dd in /usr/bin/clickhouse
16. ? @ 0x7fac3fccb609 in ?
17. clone @ 0x7fac3fbf0133 in ?
 (version 22.8.8.3 (official build))
2023.03.30 10:29:17.039355 [ 20 ] {35ae2841-cf20-43d4-ae32-f7bcc0e99ad6::20230330_482_482_0_485} <Error> MutatePlainMergeTreeTask: Code: 10. DB::Exception: Not found column os_type in block. There are only columns: timestamp, id, trace_id, span_id, severity_text, severity_number, body, k8s_container_name, k8s_namespace_name, observed_timestamp, trace_flags, resources_string_key, resources_string_value, attributes_string_key, attributes_string_value, attributes_int64_key, attributes_int64_value, attributes_float64_key, attributes_float64_value. (NOT_FOUND_COLUMN_IN_BLOCK) (version 22.8.8.3 (official build))
2023.03.30 10:29:17.041098 [ 20 ] {35ae2841-cf20-43d4-ae32-f7bcc0e99ad6::20230330_482_482_0_485} <Error> virtual bool DB::MutatePlainMergeTreeTask::executeStep(): Code: 10. DB::Exception: Not found column os_type in block. There are only columns: timestamp, id, trace_id, span_id, severity_text, severity_number, body, k8s_container_name, k8s_namespace_name, observed_timestamp, trace_flags, resources_string_key, resources_string_value, attributes_string_key, attributes_string_value, attributes_int64_key, attributes_int64_value, attributes_float64_key, attributes_float64_value. (NOT_FOUND_COLUMN_IN_BLOCK), Stack trace (when copying this message, always include the lines below):
```
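Side note on the `NOT_FOUND_COLUMN_IN_BLOCK` error above: it is a background mutation failing because the part being rewritten lacks the `os_type` column. If this comes up again, a diagnostic sketch using standard ClickHouse system tables (nothing SigNoz-specific assumed; the mutation id below is a placeholder):

```sql
-- Find mutations that are stuck or repeatedly failing
SELECT database, table, mutation_id, command, latest_fail_reason
FROM system.mutations
WHERE is_done = 0;

-- A mutation that can never succeed can be cancelled by id
KILL MUTATION WHERE database = 'signoz_logs' AND mutation_id = 'mutation_485.txt';
```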

Photo of Pruthvi
Pruthvi
Thu, 30 Mar 2023 10:30:41 UTC

Generally, how do you connect to the ClickHouse DB?

Photo of nitya-signoz
nitya-signoz
Thu, 30 Mar 2023 10:31:22 UTC

You can directly exec into the pod

Photo of nitya-signoz
nitya-signoz
Thu, 30 Mar 2023 10:32:21 UTC

And run `drop database signoz_logs` to drop the database

Photo of Pruthvi
Pruthvi
Thu, 30 Mar 2023 10:33:28 UTC

I have 2 shards; just to confirm, it needs to be done on both, right?

Photo of nitya-signoz
nitya-signoz
Thu, 30 Mar 2023 10:34:34 UTC

You can do `drop database signoz_logs on cluster cluster`
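Putting the two suggestions together, a minimal sketch; the namespace and pod name below follow the SigNoz Helm chart's typical naming and are assumptions (check `kubectl get pods` in your cluster), and `cluster` is the default cluster name in SigNoz's ClickHouse setup:

```sh
# Exec into a ClickHouse pod and open a client session
# (namespace and pod name are illustrative)
kubectl exec -n platform -it chi-signoz-clickhouse-cluster-0-0-0 -- clickhouse-client

# Then, inside clickhouse-client, drop the database on all shards at once:
#   DROP DATABASE signoz_logs ON CLUSTER cluster;
```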

Photo of Pruthvi
Pruthvi
Thu, 30 Mar 2023 10:35:32 UTC

Looks like dropping them worked, but if this occurs next time, is there no option apart from losing logs?

Photo of nitya-signoz
nitya-signoz
Thu, 30 Mar 2023 10:37:40 UTC

No, we can get it back to a normal state; it's just that you will have to check the migrations to see what went wrong and compare schemas. It will require more manual effort.
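For reference, the "Dirty database version 5" message comes from golang-migrate, which the clickhouselogsexporter uses to run its schema migrations; it records its state in a bookkeeping table. A recovery sketch under those assumptions (the table and column names are the driver's defaults; verify against your deployment before forcing anything):

```sql
-- Inspect the state golang-migrate recorded
-- (schema_migrations with version/dirty/sequence is the ClickHouse
--  driver's default layout; confirm before acting on it)
SELECT version, dirty
FROM signoz_logs.schema_migrations
ORDER BY sequence DESC
LIMIT 1;

-- If dirty = 1: manually reconcile the schema with the last good
-- migration, then clear the flag with the migrate CLI, e.g.
--   migrate -database 'clickhouse://<host>:9000?database=signoz_logs' \
--           -path <migrations-dir> force 4
```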

Photo of Pruthvi
Pruthvi
Thu, 30 Mar 2023 10:38:24 UTC

oh

Photo of Pruthvi
Pruthvi
Thu, 30 Mar 2023 10:38:42 UTC

If you don't mind, another question on ClickHouse cold storage

Photo of nitya-signoz
nitya-signoz
Thu, 30 Mar 2023 10:39:17 UTC

Sure

Photo of Pruthvi
Pruthvi
Thu, 30 Mar 2023 10:39:46 UTC

I have enabled cold storage on S3, and I saw that there was around 3GB of data in the S3 bucket. But somehow there was a big spike in the cost of S3 usage; NATbytesTransferred was around 120GB

Photo of Pruthvi
Pruthvi
Thu, 30 Mar 2023 10:39:51 UTC

How does S3 cold storage work?

Photo of Pruthvi
Pruthvi
Thu, 30 Mar 2023 10:40:05 UTC

Does SigNoz always use it to read from S3?

Photo of nitya-signoz
nitya-signoz
Thu, 30 Mar 2023 10:42:47 UTC

Have you enabled it for all of metrics, traces, and logs? Ideally, data is fetched from S3 only when you query it; apart from that, it shouldn't be read. For logs, it's basically the time range that you select. Ankit, do you have more idea about the `NATbytesTransferred`?
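For background on the mechanism: SigNoz's cold storage rides on ClickHouse tiered storage. An S3-backed disk is added to the table's storage policy, and a move TTL relocates parts older than the configured interval to that volume; queries touching those parts then read from S3 transparently. A minimal sketch of the move TTL (the table, timestamp scaling, volume name, and interval illustrate the pattern and are not this deployment's exact values):

```sql
-- Move parts older than 15 days to the S3-backed volume of the table's
-- storage policy; reads hit S3 only when a query touches those parts.
ALTER TABLE signoz_logs.logs
    MODIFY TTL toDateTime(timestamp / 1000000000) + INTERVAL 15 DAY TO VOLUME 's3';
```

Under that model, steady-state S3 traffic should come only from part moves, merges on the S3 volume, and queries whose time range reaches into cold data.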

Photo of Pruthvi
Pruthvi
Thu, 30 Mar 2023 10:45:11 UTC

[screenshot attachment: AWS cost spike]
Photo of Pruthvi
Pruthvi
Thu, 30 Mar 2023 10:45:15 UTC

FYI cost spike in AWS

Photo of Ankit
Ankit
Thu, 30 Mar 2023 10:55:12 UTC

Yeah... surprisingly, I also observed a spike in cost a few days back. It was RequestsTier1 for us too. And it is not for every SaaS user. I will be diving deeper into this soon. Pruthvi, can you please create a GitHub issue at SigNoz? At least we should do the analysis of the cost. cc <@4K165d>

Photo of Pruthvi
Pruthvi
Thu, 30 Mar 2023 11:01:52 UTC

Sure, will make an issue