#support

SigNoz Log Format and JSON Parsing

TL;DR: Luke asked about the log format SigNoz expects and about using operators to convert logger output. nitya-signoz shared resources about the specification and recommended using operators. Travis had issues with JSON parsing but resolved them by updating otel-agent-config.yaml.

Solved
Mar 14, 2023 (6 months ago)
Luke
04:54 PM
Is there a well-defined spec for what format logs should be in to be collected by SigNoz? It seems SigNoz expects the JSON to contain specific keys.
Luke
05:07 PM
We are using structlog in Python. Do you find people typically use operators to convert, or just modify their loggers directly to match the format?
nitya-signoz
05:11 PM
People mostly use operators to convert, but we are also seeing new users who use the SDK (mostly Java) to send logs directly.

Since you are using Python, you can try the OTel SDK for Python, though support for logs is experimental as of now. https://github.com/open-telemetry/opentelemetry-python/tree/main/docs/examples/logs
nitya-signoz
05:12 PM
Also, parsing becomes easier if you are logging in JSON or key-value format.
Luke
05:12 PM
Yes, we use JSON
Luke
11:51 PM
Sorry, one more silly question. Should the keys of the JSON be things like span_id or SpanId? The OpenTelemetry docs suggest the latter, but some SigNoz docs (like this one: https://signoz.io/docs/userguide/fluentd_to_signoz/#steps-to-recieve-logs-from-fluentd) seem to use span_id.
Mar 15, 2023 (6 months ago)
Luke
12:01 AM
https://opentelemetry.io/docs/reference/specification/protocol/file-exporter/#examples

These examples seem to use severityText rather than SeverityText as well. That's 3 potential variants…
nitya-signoz
03:38 AM
It doesn’t matter; you will have to use the traceParser regardless. Here is how you do it: https://github.com/SigNoz/logs-benchmark/blob/0b2451e6108d8fa5fdd5808c4e174bd52b9d55d3/signoz/signoz-client/otel-collector-config.yaml#L22
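
For reference, a minimal sketch of what the linked config does. The stanza operator type is trace_parser; the parse_from paths below (attributes.traceId, attributes.spanId) are illustrative and must match wherever the IDs actually land after JSON parsing:

    operators:
      # parse the JSON body first so the trace/span fields become attributes
      - type: json_parser
        parse_from: body
        parse_to: attributes
      # trace_parser copies the IDs into the log record's trace context,
      # so the exact key spelling used in your JSON doesn't matter
      - type: trace_parser
        trace_id:
          parse_from: attributes.traceId
        span_id:
          parse_from: attributes.spanId
        trace_flags:
          parse_from: attributes.trace_flags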
Mar 16, 2023 (6 months ago)
Travis
09:31 PM
Hey nitya-signoz, I'm a coworker of Luke's. I wanted to also mention that we have deployed SigNoz via Kubernetes and we're automatically seeing all the pod logs. Which receiver are these logs ingested by? OTLP? I noticed the OTLP receiver doesn't support operators. https://signoz.io/docs/userguide/logs/#operators-for-parsing-and-manipulating-logs

> The receivers FluentForward and OTLP doesn’t have operators. But for parsing them we can use logprocessor.

I would have expected this to work:
    processors:
      logstransform:
        operators:
          - type: json_parser
            id: my_new_body
            parse_from: attributes.body

however, after restarting the collector, I'm still not seeing "my_new_body" as a field. any ideas?

I confirmed by checking the logs that the processor is enabled:
signoz-otel-collector 2023-03-16T21:26:55.811Z    info    pipelines/pipelines.go:90    Processor is starting...    {"kind": "processor", "name": "logstransform", "pipeline": "logs"}
signoz-otel-collector 2023-03-16T21:26:55.811Z    info    pipelines/pipelines.go:94    Processor started.    {"kind": "processor", "name": "logstransform", "pipeline": "logs"}

But I do see a failure, since not all logs contain a body or are valid JSON (lots of the pod logs are not).
signoz-otel-collector 2023-03-16T21:29:03.909Z    error    helper/transformer.go:110    Failed to process entry    {"kind": "processor", "name": "logstransform", "pipeline": "logs", "operator_id": "my_new_body", "operator_type": "json_parser", "error": {"description": "Entry is missing the expected parse_from field.", "suggestion": "Ensure that all incoming entries contain the parse_from field." ...

A couple of questions:
1. Is this the right way to go about this? Should I be using operators on a receiver instead of using a processor?
2. If this error is preventing me from running logstransform on any logs, is there a way to filter which logs this runs on?
Mar 17, 2023 (6 months ago)
nitya-signoz
04:08 AM
The k8s logs are collected by the filelog/k8s receiver.
nitya-signoz
04:12 AM
If you look at the json_parser configuration, you are parsing from attributes.body, and the parsed attributes will be written to the attributes key by default: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/stanza/docs/operators/json_parser.md. You can change this by changing the value of parse_to.

You can also use the if key to parse only when the body key is present in attributes.

If you can help me with examples of what you are sending and what you are trying to extract, I can help.
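
As an illustration of the above (a sketch, not the exact config from the thread; the if expression and field paths assume the JSON string really is at attributes.body):

    operators:
      - type: json_parser
        id: body_parser
        # only attempt parsing when the field exists and looks like JSON,
        # so non-JSON pod logs don't trigger "Failed to process entry" errors
        if: 'attributes.body != nil and attributes.body matches "^{"'
        parse_from: attributes.body
        # parsed keys go under attributes by default; parse_to can override this
        parse_to: attributes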
Travis
08:38 PM
Ooh, I see. I didn't have the filelog/k8s receiver configured in any way; it just works by default, I suppose?

So here's an example log that I'm currently seeing in SigNoz. I don't see an attributes key.

{
  "timestamp": 1679085202378150700,
  "id": "2N9hfxnx4K6pMEslQ4UBGZL0EWB",
  "trace_id": "",
  "span_id": "",
  "trace_flags": 0,
  "severity_text": "",
  "severity_number": 0,
  "body": "{\"body\": {\"http\": {\"method\": \"GET\", \"request_id\": \"5514ff9e43d94cbca171a6751ccae7ca\", \"version\": \"1.1\", \"user_agent\": \"kube-probe/1.24+\"}, \"network\": {\"client\": {\"ip\": \"10.0.3.226\", \"port\": 33064}}, \"duration\": 427268, \"request_id\": \"5514ff9e43d94cbca171a6751ccae7ca\", \"logger\": \"api.access\", \"filename\": \"main.py\", \"func_name\": \"logging_middleware\", \"lineno\": 74, \"message\": \"10.0.3.226:33064 - \\\"GET /api/v1/healthz HTTP/1.1\\\" 200\"}, \"severityText\": \"info\", \"timestamp\": \"2023-03-17T20:33:22.377798Z\", \"traceId\": \"5514ff9e43d94cbca171a6751ccae7ca\"}",
  "resources_string": {
    "host_name": "<hostname>",
    "k8s_cluster_name": "",
    "k8s_container_name": "mlcore-web",
    "k8s_container_restart_count": "0",
    "k8s_namespace_name": "mlcore",
    "k8s_node_name": "<nodename>",
    "k8s_pod_ip": "<k8s_pod_ip>",
    "k8s_pod_name": "mlcore-web-6876b7c7b9-2cxxx",
    "k8s_pod_start_time": "2023-03-17 13:55:03 +0000 UTC",
    "k8s_pod_uid": "caad5d5e-7a16-471d-8a5f-0459b5aa90c4",
    "os_type": "linux",
    "signoz_component": "otel-agent"
  },
  "attributes_string": {
    "log_file_path": "/var/log/pods/mlcore_mlcore-web-6876b7c7b9-2cxxx_7144c554-5d97-4774-ae17-6c39ef19a518/mlcore-web/0.log",
    "log_iostream": "stderr",
    "logtag": "F",
    "time": "2023-03-17T20:33:22.378150623Z"
  },
  "attributes_int": {},
  "attributes_float": {}
}

And here's my relevant otel-collector-config:
    receivers:
      filelog/k8s:
        include:
          - /var/log/pods/*/*/*.log
        exclude:
          - /var/log/pods/kube-system_*/*/*.log
        operators:
          - type: json_parser
            id: body_parser
            parse_from: attributes.body
            parse_to: attributes.parsed_body

I also have filelog/k8s set in pipelines.logs.receivers:
      pipelines:
        logs:
          receivers: [otlp, filelog/k8s]

It seems my json_parser is not working at all. I've tried any combination of attributes.body, just body, or body.body, with and without parse_to, but I can't seem to see any difference.
Travis
09:21 PM
I even tried something as simple as:
    receivers:
      filelog/k8s:
        include:
          - /var/log/pods/*/*/*.log
        exclude:
          - /var/log/pods/kube-system_*/*/*.log
        operators:
          - type: add
            field: travis_key
            value: travis_val

But that causes the otel-collector to fail to start with an error:
Error: failed to get config: cannot unmarshal the configuration: 1 error(s) decoding:
* error decoding 'receivers': error reading receivers configuration for "filelog/k8s": 1 error(s) decoding:
* error decoding 'operators[0]': unmarshal to add: 1 error(s) decoding:
* error decoding 'field': unrecognized prefix
2023/03/17 21:17:03 application run finished with error: failed to get config: cannot unmarshal the configuration: 1 error(s) decoding:
* error decoding 'receivers': error reading receivers configuration for "filelog/k8s": 1 error(s) decoding:
* error decoding 'operators[0]': unmarshal to add: 1 error(s) decoding:
* error decoding 'field': unrecognized prefix
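
The "unrecognized prefix" most likely refers to the field value: stanza operators expect fields written as body, attributes.<key>, or resource.<key>. A sketch of the same add operator with a prefixed field (keeping the illustrative key names from above):

    operators:
      - type: add
        # "travis_key" alone has no prefix; attributes.travis_key should unmarshal
        field: attributes.travis_key
        value: travis_val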
Mar 18, 2023 (6 months ago)
Travis
12:41 AM
As an update, I realized this config lives in otel-agent-config.yaml, not otel-collector-config.yaml.

Using operators there does seem to be working!
Travis
12:43 AM
However, I'm still seeing some weirdness when using the json_parser.

I want to parse whatever arbitrary JSON my log might contain. I want to assume that we don't know all the keys ahead of time in SigNoz. Is that possible?

Otherwise, every time we add a field to our logs, we need to go configure the json parser to explicitly extract that field. This feels wrong.
Travis
12:56 AM
I guess, maybe to be more clear... I expected the json_parser to leave me with a JSON field. It does seem like it's parsing the field, but I can't actually use those nested values unless I move them?

Here's my body after it's hit by the json_parser:
  "body": "{\"filename\":\"main.py\",\"func_name\":\"logging_middleware\",\"http\":{\"method\":\"GET\",\"request_id\":\"TfHgVf2bYLlyDRSQT6YD8\",\"status_code\":200,\"url\":\"\",\"user_agent\":\"node-fetch\",\"version\":\"1.1\"},\"lineno\":74,\"logger\":\"api.access\",\"message\":\"10.0.2.174:40388 - \\\"GET /api/v1/accounts/iyvnjbnodqsfcfiwegflr/projects/3720/tasks/e2b98ed9-ba95-41bf-be6a-216df7ab57c9 HTTP/1.1\\\" 200\",\"network\":{\"client\":{\"ip\":\"10.0.2.174\",\"port\":40388}},\"request_id\":\"TfHgVf2bYLlyDRSQT6YD8\"}",

I can successfully do something like:
        - from: attributes.body.duration
          to: attributes.duration
          type: move

But I don't know all the keys that the body might contain. I just really want to be able to build ad-hoc queries that reference body.duration GTE <some_value>.
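
One possible way to avoid per-key move operators (a sketch under assumptions, not an answer from the thread): parse the JSON body straight into attributes, so every top-level key, current or future, becomes a queryable attribute automatically. Whether parse_from should be body or attributes.body depends on where the JSON string sits in your pipeline.

    operators:
      - type: json_parser
        id: body_parser
        # skip non-JSON lines so the parser doesn't log failures for them
        if: 'body matches "^{.*}$"'
        parse_from: body
        # flatten all top-level keys (duration, logger, http, ...) into attributes
        parse_to: attributes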