Errors in Trace Export and Troubleshooting SigNoz Docker Deployment

TLDR Dipen faced errors during trace export in their Docker deployment. Prashant suggested checking the OTLP HTTP endpoint with telemetrygen and using the logging exporter with `loglevel` set to `warn` or `error`.

Dipen
Tue, 13 Jun 2023 11:39:36 UTC

Hey folks. Weirdly, for my standalone Docker deployment, I have recently been seeing errors in my application logs during trace export in production. This happens occasionally. I tried running the application locally with `OTEL_EXPORTER_OTLP_ENDPOINT` set to the prod SigNoz deployment and got an error. Below are my Django logs:

```
Traceback (most recent call last):
  File "/Users/deepsea/Documents/Dukaan/py-order/lib/python3.9/site-packages/opentelemetry/sdk/_logs/export/__init__.py", line 259, in _export_batch
    self._exporter.export(self._log_records[:idx])  # type: ignore
  File "/Users/deepsea/Documents/Dukaan/py-order/lib/python3.9/site-packages/opentelemetry/exporter/otlp/proto/http/_log_exporter/__init__.py", line 142, in export
    resp = self._export(serialized_data)
  File "/Users/deepsea/Documents/Dukaan/py-order/lib/python3.9/site-packages/opentelemetry/exporter/otlp/proto/http/_log_exporter/__init__.py", line 113, in _export
    return (
  File "/Users/deepsea/Documents/Dukaan/py-order/lib/python3.9/site-packages/requests/sessions.py", line 578, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/Users/deepsea/Documents/Dukaan/py-order/lib/python3.9/site-packages/opentelemetry/instrumentation/requests/__init__.py", line 128, in instrumented_request
    return _instrumented_requests_call(
  File "/Users/deepsea/Documents/Dukaan/py-order/lib/python3.9/site-packages/opentelemetry/instrumentation/requests/__init__.py", line 245, in _instrumented_requests_call
    raise exception.with_traceback(exception.__traceback__)
  File "/Users/deepsea/Documents/Dukaan/py-order/lib/python3.9/site-packages/opentelemetry/instrumentation/requests/__init__.py", line 209, in _instrumented_requests_call
    result = call_wrapped()  # *** PROCEED
  File "/Users/deepsea/Documents/Dukaan/py-order/lib/python3.9/site-packages/opentelemetry/instrumentation/requests/__init__.py", line 126, in call_wrapped
    return wrapped_request(self, method, url, *args, **kwargs)
  File "/Users/deepsea/Documents/Dukaan/py-order/lib/python3.9/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/deepsea/Documents/Dukaan/py-order/lib/python3.9/site-packages/opentelemetry/instrumentation/requests/__init__.py", line 148, in instrumented_send
    return _instrumented_requests_call(
  File "/Users/deepsea/Documents/Dukaan/py-order/lib/python3.9/site-packages/opentelemetry/instrumentation/requests/__init__.py", line 159, in _instrumented_requests_call
    return call_wrapped()
  File "/Users/deepsea/Documents/Dukaan/py-order/lib/python3.9/site-packages/opentelemetry/instrumentation/requests/__init__.py", line 146, in call_wrapped
    return wrapped_send(self, request, **kwargs)
  File "/Users/deepsea/Documents/Dukaan/py-order/lib/python3.9/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/Users/deepsea/Documents/Dukaan/py-order/lib/python3.9/site-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
```

I can't find any way of checking what exactly is happening. I am assuming the otel-collector is somehow not able to ingest the spans into ClickHouse, but I couldn't find any way of viewing debug logs for the collector to understand the root cause. Could this be a scale issue? I checked the troubleshooting guide, and running the troubleshooting command gave the following error:

```
> sudo docker run -it --rm signoz/troubleshoot checkEndpoint --endpoint=172.17.0.1:4318
2023-06-13T11:36:21.026Z INFO troubleshoot/main.go:28 STARTING!
2023-06-13T11:36:21.026Z INFO checkEndpoint/checkEndpoint.go:41 checking reachability of SigNoz endpoint
Error: not able to send data to SigNoz endpoint ...
rpc error: code = Unavailable desc = connection closed before server preface received
Usage:
  signoz checkEndpoint [flags]

Examples:
checkEndpoint -e localhost:4317

Flags:
  -e, --endpoint string   URL to SigNoz with port
  -h, --help              help for checkEndpoint
```

I've been stuck on this, and Google searches aren't returning anything helpful.

Dipen
Tue, 13 Jun 2023 11:42:01 UTC

We generate about 14 million spans per hour on average.

Prashant
Tue, 13 Jun 2023 17:23:28 UTC

Hey Dipen :wave: The troubleshoot tool only supports the gRPC endpoint, so the endpoint to pass would be `172.17.0.1:4317` instead.

Prashant
Tue, 13 Jun 2023 17:24:13 UTC

You can use telemetrygen instead to check the OTLP HTTP endpoint.
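Where telemetrygen is not at hand, a rough reachability check of the OTLP HTTP endpoint can also be sketched in plain Python. This is only an illustration under stated assumptions: the function name `probe_otlp_http` is made up here, the collector is assumed to accept OTLP/JSON on the standard `/v1/traces` path, and an empty export request is assumed to be answered with a 2xx status when the receiver is healthy.

```python
import http.client
import json
import urllib.error
import urllib.request


def probe_otlp_http(base_url, timeout=5.0):
    """POST an empty OTLP/JSON trace export to <base_url>/v1/traces.

    Hypothetical helper, not part of SigNoz or OpenTelemetry.
    Returns (ok, detail); ok is True only when the server replies with 2xx.
    """
    url = base_url.rstrip("/") + "/v1/traces"
    # An export request with no spans: enough to exercise the receiver.
    body = json.dumps({"resourceSpans": []}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with a non-2xx status.
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, http.client.HTTPException, OSError) as exc:
        # Refused, reset, timed out, or closed without a response.
        return False, f"{type(exc).__name__}: {exc}"
```

Calling `probe_otlp_http("http://172.17.0.1:4318")` from the application host returns a pair such as `(False, "RemoteDisconnected: ...")` when the connection is dropped, which separates "endpoint unreachable" from "endpoint reachable but rejecting data".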

Prashant
Tue, 13 Jun 2023 17:26:07 UTC

> ```Connection aborted.', RemoteDisconnected('Remote end closed connection without response'```

It is possible that either the endpoint is wrong or the SigNoz otel-collector is not healthy.

Dipen
Wed, 14 Jun 2023 13:32:45 UTC

Generating telemetry data isn't the problem; the problem is knowing what's happening inside the collector. Is there any debug log option in the collector's docker-compose setup that can enable logs showing where exactly the otel-collector might be failing?

Prashant
Wed, 14 Jun 2023 18:09:35 UTC

You can use the logging exporter with `loglevel` set to either `warn` or `error`. Ref:

Prashant
Wed, 14 Jun 2023 18:10:53 UTC

Update the otel-collector config to include the `logging` exporter. Remember to include it in all desired pipelines as well.
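The two steps in that advice (declare the exporter, then list it in each pipeline) can be sketched on a parsed config. This is a minimal illustration, not the SigNoz config itself: `add_logging_exporter` is a hypothetical helper, the config is assumed to be already loaded from YAML into a dict, and the component names in the example (`otlp`, `clickhousetraces`) are placeholders for whatever the deployed config actually contains.

```python
import copy


def add_logging_exporter(config, loglevel="warn", pipelines=None):
    """Return a copy of a parsed collector config with the logging exporter enabled.

    config    : dict parsed from the collector's YAML config
    loglevel  : value for the exporter's `loglevel` setting
    pipelines : pipeline names to update; None means every pipeline
    """
    updated = copy.deepcopy(config)

    # Step 1: declare the exporter under the top-level `exporters` section.
    updated.setdefault("exporters", {})["logging"] = {"loglevel": loglevel}

    # Step 2: reference it from each selected pipeline. Declaring the
    # exporter without listing it in a pipeline has no effect.
    all_pipelines = updated.setdefault("service", {}).setdefault("pipelines", {})
    for name, pipeline in all_pipelines.items():
        if pipelines is not None and name not in pipelines:
            continue
        exporters = pipeline.setdefault("exporters", [])
        if "logging" not in exporters:
            exporters.append("logging")
    return updated


# Hypothetical starting config; the component names are placeholders.
example = {
    "receivers": {"otlp": {}},
    "exporters": {"clickhousetraces": {}},
    "service": {
        "pipelines": {
            "traces": {"receivers": ["otlp"], "exporters": ["clickhousetraces"]},
        }
    },
}
patched = add_logging_exporter(example)
```

In the YAML file this corresponds to a `logging:` entry with `loglevel: warn` under `exporters:`, plus `logging` appended to the `exporters:` list of each pipeline under `service.pipelines`.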