TLDR Pruthvi hit a ClickHouse migration error in the Kubernetes signoz-otel-collector after recreating the ClickHouse PVC. nitya-signoz suggested deleting the `signoz_logs` database and restarting the collectors. Pruthvi then asked about ClickHouse cold storage on S3 after observing a spike in S3 costs, which Ankit agreed to investigate further.
Hi, can you please tell me how you reached this state? Did it happen while you were upgrading, or did you make changes to the logs schema manually? Also, if your existing logs data is not that important, a hacky way to get things back to normal would be to delete the `signoz_logs` database.
I dropped the ClickHouse PVC.
And restarted ClickHouse
As I had some issues updating the logs filter rules
What do you mean by “updating logs filter rules” ?
If the PV is the same then there shouldn’t be a problem, correct me if I am wrong <@4K165d>
added these in processors
```
- type: filter
  expr: 'attributes.namespace == "signoz"'
- type: filter
  expr: 'attributes.namespace == "tools"'
- type: filter
  expr: 'attributes.container_name == "otc-container"'
```
new volume has come up and i think otel-collector created the tables and got stuck in the migration
Oh, you deleted the PV as well. Applying filter processors won't cause any issues on otel-collector.
somehow it happened
Can you delete the `signoz_logs` database and restart your collectors?
and also i see these logs in clickhouse
```
0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0xa3ef75a in /usr/bin/clickhouse
1. DB::Block::getByName(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool) const @ 0x13ef0872 in /usr/bin/clickhouse
2. DB::getBlockAndPermute(DB::Block const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, DB::PODArray<unsigned long, 4096ul, Allocator<false, false>, 15ul, 16ul> const*) @ 0x158db96f in /usr/bin/clickhouse
3. DB::MergeTreeDataPartWriterCompact::writeDataBlockPrimaryIndexAndSkipIndices(DB::Block const&, std::__1::vector<DB::Granule, std::__1::allocator<DB::Granule> > const&) @ 0x158d682e in /usr/bin/clickhouse
4. DB::MergeTreeDataPartWriterCompact::fillDataChecksums(DB::MergeTreeDataPartChecksums&) @ 0x158d7bc2 in /usr/bin/clickhouse
5. DB::MergeTreeDataPartWriterCompact::fillChecksums(DB::MergeTreeDataPartChecksums&) @ 0x158d847c in /usr/bin/clickhouse
6. DB::MergedBlockOutputStream::finalizePartAsync(std::__1::shared_ptr<DB::IMergeTreeDataPart>&, bool, DB::NamesAndTypesList const*, DB::MergeTreeDataPartChecksums*) @ 0x159c9396 in /usr/bin/clickhouse
7. DB::MutateAllPartColumnsTask::finalize() @ 0x159ee9c5 in /usr/bin/clickhouse
8. ? @ 0x159ecfec in /usr/bin/clickhouse
9. DB::MutatePlainMergeTreeTask::executeStep() @ 0x159d562e in /usr/bin/clickhouse
10. DB::MergeTreeBackgroundExecutor<DB::MergeMutateRuntimeQueue>::routine(std::__1::shared_ptr<DB::TaskRuntimeData>) @ 0xa3b9f1b in /usr/bin/clickhouse
11. DB::MergeTreeBackgroundExecutor<DB::MergeMutateRuntimeQueue>::threadFunction() @ 0xa3b9950 in /usr/bin/clickhouse
12. ThreadPoolImpl<ThreadFromGlobalPool>::worker(std::__1::__list_iterator<ThreadFromGlobalPool, void*>) @ 0xa4b38a6 in /usr/bin/clickhouse
13. void std::__1::__function::__policy_invoker<void ()>::__call_impl<std::__1::__function::__default_alloc_func<ThreadFromGlobalPool::ThreadFromGlobalPool<void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda0'()>(void&&)::'lambda'(), void ()> >(std::__1::__function::__policy_storage const*) @ 0xa4b51f7 in /usr/bin/clickhouse
14. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0xa4b11c8 in /usr/bin/clickhouse
15. ? @ 0xa4b43dd in /usr/bin/clickhouse
16. ? @ 0x7fac3fccb609 in ?
17. clone @ 0x7fac3fbf0133 in ?
 (version 22.8.8.3 (official build))
2023.03.30 10:29:17.039355 [ 20 ] {35ae2841-cf20-43d4-ae32-f7bcc0e99ad6::20230330_482_482_0_485} <Error> MutatePlainMergeTreeTask: Code: 10. DB::Exception: Not found column os_type in block. There are only columns: timestamp, id, trace_id, span_id, severity_text, severity_number, body, k8s_container_name, k8s_namespace_name, observed_timestamp, trace_flags, resources_string_key, resources_string_value, attributes_string_key, attributes_string_value, attributes_int64_key, attributes_int64_value, attributes_float64_key, attributes_float64_value. (NOT_FOUND_COLUMN_IN_BLOCK) (version 22.8.8.3 (official build))
2023.03.30 10:29:17.041098 [ 20 ] {35ae2841-cf20-43d4-ae32-f7bcc0e99ad6::20230330_482_482_0_485} <Error> virtual bool DB::MutatePlainMergeTreeTask::executeStep(): Code: 10. DB::Exception: Not found column os_type in block. There are only columns: timestamp, id, trace_id, span_id, severity_text, severity_number, body, k8s_container_name, k8s_namespace_name, observed_timestamp, trace_flags, resources_string_key, resources_string_value, attributes_string_key, attributes_string_value, attributes_int64_key, attributes_int64_value, attributes_float64_key, attributes_float64_value. (NOT_FOUND_COLUMN_IN_BLOCK), Stack trace (when copying this message, always include the lines below):
```
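For reference, the mutation above is looking for an `os_type` column that the block doesn't have; a quick way to see what the table currently contains and which mutations are stuck is something like the following from clickhouse-client (a sketch, assuming the default SigNoz logs table `signoz_logs.logs`):
```
-- Current schema of the logs table, to compare against what the mutation expects.
DESCRIBE TABLE signoz_logs.logs;

-- Stuck mutations and their failure reasons.
SELECT mutation_id, command, latest_fail_reason
FROM system.mutations
WHERE database = 'signoz_logs' AND NOT is_done;
```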
generally how do you connect to clickhouse db ?
And run `drop database signoz_logs` to drop the database
To confirm: I have 2 shards, so it needs to be done on both, right?
you can do `drop database signoz_logs on cluster cluster`
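For reference, a minimal sketch of that drop plus a check that it actually ran on every shard (the verification query is not from the thread; `cluster` is the cluster name used above):
```
-- Drop the logs database on all shards/replicas in one statement.
DROP DATABASE signoz_logs ON CLUSTER cluster;

-- Verify the database is gone on every node.
SELECT hostName() AS host, name
FROM clusterAllReplicas('cluster', system.databases)
WHERE name = 'signoz_logs';
```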
Looks like dropping them worked, but if this happens again, is there no option apart from losing the logs?
No, we can get it back to a normal state; it's just that you will have to check the migrations to see what went wrong and compare the schemas. It requires more manual effort.
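For reference, a sketch of where that comparison could start, assuming the exporter tracks its migration state in a golang-migrate style `schema_migrations` table inside `signoz_logs` (the table name is an assumption):
```
-- Recorded migration state; a dirty row points at the migration that failed.
SELECT version, dirty, sequence
FROM signoz_logs.schema_migrations
ORDER BY sequence DESC
LIMIT 5;

-- Current table definition, to diff against the migration files.
SHOW CREATE TABLE signoz_logs.logs;
```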
oh
If you don't mind, another question on ClickHouse cold storage
Sure
I have enabled cold storage on S3 and saw that there was around 3GB of data in the S3 bucket. But somehow there was a big spike in S3 usage cost: NATbytesTransferred was around 120GB
How does S3 cold storage work?
Does SigNoz always read from S3?
Have you enabled it for all metrics, traces and logs? Ideally, data is read from S3 only when you query it; it is fetched at query time and shouldn't be read otherwise. For logs, it's basically driven by the time range you select. Ankit, do you have more insight into `NATbytesTransferred`?
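For reference, a quick sketch for checking how much data ClickHouse has actually moved to the cold tier; the disk names come from your storage configuration, so treat them as placeholders:
```
-- Active data per disk; the S3/cold disk shows up under its configured name.
SELECT disk_name, formatReadableSize(sum(bytes_on_disk)) AS size, count() AS parts
FROM system.parts
WHERE active
GROUP BY disk_name;

-- Disks ClickHouse is configured with, including the S3-backed one.
SELECT name, path FROM system.disks;
```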
FYI cost spike in AWS
Yeah... surprisingly, I also observed a spike in cost a few days back. It was RequestsTier1 for us too. And it is not for every SaaS user. I will be diving deeper into this soon. Pruthvi, can you please create a GitHub issue at SigNoz? At the least, we should do an analysis of the cost. cc <@4K165d>
Sure will make an issue
Pruthvi
Thu, 30 Mar 2023 09:50:03 UTC
I am facing an issue in the Kubernetes signoz-otel-collector
```
2023-03-30T09:40:18.329Z info kube/client.go:101 k8s filtering {"kind": "processor", "name": "k8sattributes", "pipeline": "metrics/generic", "labelSelector": "", "fieldSelector": "spec.nodeName=ip-10-0-6-126.ap-south-1.compute.internal"}
2023-03-30T09:40:18.469Z info clickhouselogsexporter/exporter.go:356 Running migrations from path: {"kind": "exporter", "data_type": "logs", "name": "clickhouselogsexporter", "test": "/logsmigrations"}
Error: cannot build pipelines: failed to create "clickhouselogsexporter" exporter, in pipeline "logs": cannot configure clickhouse logs exporter: clickhouse Migrate failed to run, error: Dirty database version 5. Fix and force version.
2023/03/30 09:40:18 application run finished with error: cannot build pipelines: failed to create "clickhouselogsexporter" exporter, in pipeline "logs": cannot configure clickhouse logs exporter: clickhouse Migrate failed to run, error: Dirty database version 5. Fix and force version.
```