Troubleshooting Kubernetes Node Metrics Failure

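TLDR Viljar reported that Kubernetes node metrics stopped showing data after a Helm upgrade and that aggregations with variables failed in Value visualizations. Prashant suggested restarting the collector, reproduced the issue with the number widget and query builder, and promised to consult the team. A related issue has been tracked.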

Viljar
Sat, 15 Jul 2023 16:56:02 UTC

Hey! So I upgraded to the latest version with Helm and did the CRD updates. Everything started, but I just don't seem to get data for k8s_node_cpu_utilization or any other, however container metrics work just fine. Any ideas how to approach it?

Viljar
Sat, 15 Jul 2023 17:03:21 UTC

What makes things super strange is that it somehow does not retrieve metrics even from the past :disappointed: so dashboards and alerts are somewhat useless.

Viljar
Sat, 15 Jul 2023 17:14:40 UTC

With further testing, it seems that queries without variables and aggregations return data, while all aggregations seem to fail when using variables in the query builder.

Prashant
Sun, 16 Jul 2023 07:25:52 UTC

Srikanth, did anything change in the aggregations recently?

Viljar
Sun, 16 Jul 2023 07:28:22 UTC

For more reference, I'm using . And I have two clusters under SigNoz monitoring. I think it might be a factor here, because with refreshes the metric sometimes appears to display a value.

Viljar
Sun, 16 Jul 2023 07:31:36 UTC

And the issue seems to occur only with the Value type of visualization.

Srikanth
Sun, 16 Jul 2023 08:18:03 UTC

Prashant, I didn’t understand your question. We have not changed anything in aggregation. It might be an issue with JSON and vars.

Viljar
Sun, 16 Jul 2023 08:29:29 UTC

Is there anything I could help with? Getting some logs/data so it would be easier to reproduce?

Prashant
Sun, 16 Jul 2023 08:43:50 UTC

Viljar, the OtelAgent of the K8sInfra chart is responsible for collecting kubelet stats metrics. You could try restarting the collector in the affected K8s clusters. ```kubectl rollout restart -n platform daemonset -l=```

Viljar
Sun, 16 Jul 2023 08:46:17 UTC

Okay, I'll give it a go. I initially just restarted the pods after deployment.

Viljar
Sun, 16 Jul 2023 08:46:55 UTC

But later, I'm currently out of office.

Prashant
Sun, 16 Jul 2023 08:48:48 UTC

I think I am able to reproduce your issue with the number widget and query builder.

Prashant
Sun, 16 Jul 2023 08:49:14 UTC

Let me share my findings with my team and I will get back to you later.

Viljar
Sun, 16 Jul 2023 08:49:41 UTC

thank you so much

Prashant
Mon, 17 Jul 2023 06:07:02 UTC

Tracking issue: