#support

Issues with Installing SigNoz on a Standalone VM with Two Additional Disks

TLDR vvpreo is having difficulty installing SigNoz on a standalone VM because the OTel collector fails to start due to migration problems. Ankit and vishal-signoz proposed various solutions, but the issue remained unresolved, so they planned a screen-sharing session to fix it.

Oct 02, 2023 (2 months ago)
vvpreo
07:05 AM
Hello, SigNoz practitioners,

I am trying to install SigNoz on a standalone VM with two additional disks mounted for ClickHouse.

Almost all services started as Docker containers, but the OTel collector can't start because of problems with migrations.

I am using the main branch from the GitHub repository (the current versions are mentioned there). I've adapted the docker-compose file to work with two partitions for ClickHouse.

But all I see is:

Error: invalid configuration: service::pipeline::traces: references exporter "clickhousetraces" which is not configured
2023/10/02 07:06:54 application run finished with error: invalid configuration: service::pipeline::traces: references exporter "clickhousetraces" which is not configured
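For context, this "references exporter ... which is not configured" error comes from the collector's config validation: the traces pipeline names an exporter key that is missing from (or failed to load in) the exporters: section. As a minimal sketch of what the collector expects, assuming the default single-node SigNoz docker-compose layout where ClickHouse is reachable at clickhouse:9000 (the datasource URL below is an assumption from the stock setup, not taken from this install):

exporters:
  clickhousetraces:
    # DSN for the traces database; host and database name are assumed from the
    # default SigNoz setup and may differ on this two-disk VM
    datasource: tcp://clickhouse:9000/?database=signoz_traces

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [clickhousetraces]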
Oct 03, 2023 (2 months ago)
Ankit
04:16 AM
vishal-signoz any idea why this might be happening? vvpreo can you confirm this is not fixed yet?
Ankit
04:16 AM
Also, vvpreo I am guessing you might have misconfigured the otel-collector config. Can you paste your config here for us to have a look?
vvpreo
04:17 AM
I confirm, not solved yet.
vvpreo
04:18 AM
I am using several docker compose files instead of one (if it matters).
vvpreo
04:19 AM
Ankit
vvpreo
04:21 AM
This is the config:
receivers:
  tcplog/docker:
    listen_address: "0.0.0.0:2255"
    operators:
      - type: regex_parser
        regex: '^<([0-9]+)>[0-9]+ (?P<timestamp>[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?) (?P<container_id>\S+) (?P<container_name>\S+) [0-9]+ - -( (?P<body>.*))?'
        timestamp:
          parse_from: attributes.timestamp
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
      - type: move
        from: attributes["body"]
        to: body
      - type: remove
        field: attributes.timestamp
        # please remove names from below if you want to collect logs from them
      - type: filter
        id: signoz_logs_filter
        expr: 'attributes.container_name matches "^signoz-(logspout|frontend|alertmanager|query-service|otel-collector|otel-collector-metrics|clickhouse-1|clickhouse-2|zookeeper)"'
  opencensus:
    endpoint: 0.0.0.0:55678
  otlp/spanmetrics:
    protocols:
      grpc:
        endpoint: localhost:12345
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_http:
        endpoint: 0.0.0.0:14268
      # thrift_compact:
      #   endpoint: 0.0.0.0:6831
      # thrift_binary:
      #   endpoint: 0.0.0.0:6832
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu: {}
      load: {}
      memory: {}
      disk: {}
      filesystem: {}
      network: {}
  prometheus:
    config:
      global:
        scrape_interval: 60s
      scrape_configs:
        # otel-collector internal metrics
        - job_name: otel-collector
          static_configs:
          - targets:
              - localhost:8888
            labels:
              job_name: otel-collector


processors:
  logstransform/internal:
    operators:
      - type: trace_parser
        if: '"trace_id" in attributes or "span_id" in attributes'
        trace_id:
          parse_from: attributes.trace_id
        span_id:
          parse_from: attributes.span_id
        output: remove_trace_id
      - type: trace_parser
        if: '"traceId" in attributes or "spanId" in attributes'
        trace_id:
          parse_from: attributes.traceId
        span_id:
          parse_from: attributes.spanId
        output: remove_traceId
      - id: remove_traceId
        type: remove
        if: '"traceId" in attributes'
        field: attributes.traceId
        output: remove_spanId
      - id: remove_spanId
        type: remove
        if: '"spanId" in attributes'
        field: attributes.spanId
      - id: remove_trace_id
        type: remove
        if: '"trace_id" in attributes'
        field: attributes.trace_id
        output: remove_span_id
      - id: remove_span_id
        type: remove
        if: '"span_id" in attributes'
        field: attributes.span_id
  batch:
    send_batch_size: 10000
    send_batch_max_size: 11000
    timeout: 10s
  signozspanmetrics/prometheus:
    metrics_exporter: prometheus
    latency_histogram_buckets: [100us, 1ms, 2ms, 6ms, 10ms, 50ms, 100ms, 250ms, 500ms, 1000ms, 1400ms, 2000ms, 5s, 10s, 20s, 40s, 60s ]
    dimensions_cache_size: 100000
    dimensions:
      - name: service.namespace
        default: default
      - name: deployment.environment
        default: default
      # This is added to ensure the uniqueness of the timeseries
      # Otherwise, identical timeseries produced by multiple replicas of
      # collectors result in incorrect APM metrics
      - name: 'signoz.collector.id'
  # memory_limiter:
  #   # 80% of maximum memory up to 2G
  #   limit_mib: 1500
  #   # 25% of limit up to 2G
  #   spike_limit_mib: 512
  #   check_interval: 5s
  #
  #   # 50% of the maximum memory
  #   limit_percentage: 50
  #   # 20% of max memory usage spike expected
  #   spike_limit_percentage: 20
  # queued_retry:
  #   num_workers: 4
  #   queue_size: 100
  #   retry_on_failure: true
  resourcedetection:
    # Using OTEL_RESOURCE_ATTRIBUTES envvar, env detector adds custom labels.
    detectors: [env, system] # include ec2 for AWS, gcp for GCP and azure for Azure.
    timeout: 2s

extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  zpages:
    endpoint: 0.0.0.0:55679
  pprof:
    endpoint: 0.0.0.0:1777

exporters:
  clickhousetraces:
    datasource: 
    docker_multi_node_cluster: ${DOCKER_MULTI_NODE_CLUSTER}
    low_cardinal_exception_grouping: ${LOW_CARDINAL_EXCEPTION_GROUPING}
  clickhousemetricswrite:
    endpoint: 
    resource_to_telemetry_conversion:
      enabled: true
  clickhousemetricswrite/prometheus:
    endpoint: 
  prometheus:
    endpoint: 0.0.0.0:8889
  # logging: {}

  clickhouselogsexporter:
    dsn: 
    docker_multi_node_cluster: ${DOCKER_MULTI_NODE_CLUSTER}
    timeout: 5s
    sending_queue:
      queue_size: 100
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s

service:
  telemetry:
    metrics:
      address: 0.0.0.0:8888
  extensions:
    - health_check
    - zpages
    - pprof
  pipelines:
    traces:
      receivers: [jaeger, otlp]
      processors: [signozspanmetrics/prometheus, batch]
      exporters: [clickhousetraces]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [clickhousemetricswrite]
    metrics/generic:
      receivers: [hostmetrics]
      processors: [resourcedetection, batch]
      exporters: [clickhousemetricswrite]
    metrics/prometheus:
      receivers: [prometheus]
      processors: [batch]
      exporters: [clickhousemetricswrite/prometheus]
    metrics/spanmetrics:
      receivers: [otlp/spanmetrics]
      exporters: [prometheus]
    logs:
      receivers: [otlp, tcplog/docker]
      processors: [logstransform/internal, batch]
      exporters: [clickhouselogsexporter]
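Note that in this paste the datasource, endpoint, and dsn values under exporters: are blank. They may simply have been stripped while copying, but the ClickHouse exporters do need a valid DSN there; in the default SigNoz config these point at the ClickHouse service, e.g. tcp://clickhouse:9000/?database=signoz_traces for clickhousetraces (assumed default, which would need adjusting for the two-disk setup used here).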
vvpreo
04:23 AM
This is the compose file:
services:
  otel-collector:
    image: signoz/signoz-otel-collector:${OTELCOL_TAG:-0.79.7}
    container_name: signoz-otel-collector
#    restart: unless-stopped
    privileged: true
    # entrypoint: ["sleep", "9999999999"]
    # ./signoz-collector --config=/etc/otel-collector-config.yaml --feature-gates=-pkg.translator.prometheus.NormalizeName
    command:
      [
        "--config=/etc/otel-collector-config.yaml",
        "--feature-gates=-pkg.translator.prometheus.NormalizeName",
      ]
    user: root # required for reading docker container logs
    volumes:
      - "{{remote_project_data_dir}}/otel-collector-config.yaml:/etc/otel-collector-config.yaml"
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
    environment:
      - OTEL_RESOURCE_ATTRIBUTES=host.name=signoz-host,os.type=linux
      - DOCKER_MULTI_NODE_CLUSTER=false
      - LOW_CARDINAL_EXCEPTION_GROUPING=false
    ports:
      # - "1777:1777"     # pprof extension
      - "4317:4317" # OTLP gRPC receiver
      - "4318:4318" # OTLP HTTP receiver
      # - "8888:8888"     # OtelCollector internal metrics
      # - "8889:8889"     # signoz spanmetrics exposed by the agent
      # - "9411:9411"     # Zipkin port
      # - "13133:13133"   # health check extension
      # - "14250:14250"   # Jaeger gRPC
      # - "14268:14268"   # Jaeger thrift HTTP
      # - "55678:55678"   # OpenCensus receiver
      # - "55679:55679"   # zPages extension

    networks:
      - traefik-internal

  otel-collector-metrics:
    image: signoz/signoz-otel-collector:${OTELCOL_TAG:-0.79.7}
    container_name: signoz-otel-collector-metrics
    privileged: true
    command:
      [
        "--config=/etc/otel-collector-metrics-config.yaml",
        "--feature-gates=-pkg.translator.prometheus.NormalizeName",
      ]
    volumes:
      - "{{remote_project_data_dir}}/otel-collector-metrics-config.yaml:/etc/otel-collector-metrics-config.yaml"
    # ports:
    #   - "1777:1777"     # pprof extension
    #   - "8888:8888"     # OtelCollector internal metrics
    #   - "13133:13133"   # Health check extension
    #   - "55679:55679"   # zPages extension
    restart: unless-stopped
    networks:
      - traefik-internal

  logspout:
    image: "gliderlabs/logspout:v3.2.14"
    container_name: signoz-logspout
    volumes:
      - /etc/hostname:/etc/host_hostname:ro
      - /var/run/docker.sock:/var/run/docker.sock
    command: 
    depends_on:
      - otel-collector
    restart: on-failure

networks:
  traefik-internal:
    external: true
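One thing that stands out in the compose file: the logspout service's command: is empty here. Since the tcplog/docker receiver above listens on port 2255, logspout normally has to be pointed at it; in the default SigNoz compose this looks roughly like the following (the exact route is an assumption based on the stock setup and may simply have been stripped when pasting):

  logspout:
    image: "gliderlabs/logspout:v3.2.14"
    # route all container logs to the collector's tcplog receiver (assumed default)
    command: syslog+tcp://otel-collector:2255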
vishal-signoz
09:07 AM
> Almost all services started as Docker containers, but the OTel collector can't start because of problems with migrations.
vvpreo Please share any logs related to migrations that you see.
vvpreo
09:08 AM
moment
vvpreo
09:09 AM
2023-10-03T08:36:38.292Z        info    clickhouselogsexporter/exporter.go:455  Running migrations from path:   {"kind": "exporter", "data_type": "logs", "name": "clickhouselogsexporter", "test": "/logsmigrations"}
Error: failed to build pipelines: failed to create "clickhouselogsexporter" exporter for data type "logs": cannot configure clickhouse logs exporter: clickhouse Migrate failed to run, error: Dirty database version 1. Fix and force version.
2023/10/03 08:36:38 application run finished with error: failed to build pipelines: failed to create "clickhouselogsexporter" exporter for data type "logs": cannot configure clickhouse logs exporter: clickhouse Migrate failed to run, error: Dirty database version 1. Fix and force version.
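For context, "Dirty database version 1. Fix and force version." appears to be the error raised by golang-migrate, the migration library the exporter uses: a previous migration run was interrupted partway through, so the schema_migrations table still records the attempted version with a dirty flag set, and the tool refuses to run again until that state is cleared. The state can be inspected from inside the ClickHouse container, for example (database and table names assumed from the SigNoz logs exporter defaults):

-- run inside `clickhouse client`
SELECT * FROM signoz_logs.schema_migrations;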
vvpreo
09:11 AM
I just reinstalled ClickHouse, disabled collector-metrics and logspout, and the migrations passed. But later I uncommented the problematic lines in the config, and now I see what I've sent you.
vvpreo
09:11 AM
Anyway, the error has been the same since the beginning:

cannot configure clickhouse logs exporter: clickhouse Migrate failed to run, error: Dirty database version 1. Fix and force version
vvpreo
09:15 AM
Maybe it is possible to control how the migrations are launched?
So I can be sure that I've launched the migrations only once.
vishal-signoz
09:16 AM
vvpreo Please connect to the ClickHouse container and run these commands:
docker exec -it signoz-clickhouse /bin/bash

// connect to clickhouse client
clickhouse client

// clickhouse queries
use signoz_logs;
drop table schema_migrations;
drop table logs_attribute_keys on CLUSTER cluster;
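If preferred, the same cleanup can be run non-interactively, assuming the same container, database, and cluster names as above:

# equivalent one-liners (container/cluster names taken from this thread)
docker exec signoz-clickhouse clickhouse client -q "DROP TABLE signoz_logs.schema_migrations"
docker exec signoz-clickhouse clickhouse client -q "DROP TABLE signoz_logs.logs_attribute_keys ON CLUSTER cluster"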
vvpreo
09:17 AM
moment
vishal-signoz
09:19 AM
https://signoz-community.slack.com/archives/C01HWQ1R0BC/p1696324551005369?thread_ts=1696230324.525709&cid=C01HWQ1R0BC
Yes there’s already an issue on this.
We are working on this and this will be fixed soon.
vvpreo
09:21 AM
Here's what I've got:
[screenshot attached]
vishal-signoz
09:22 AM
That's fine, can you restart the Docker containers now?
vvpreo
09:23 AM
The OTel collector?
vishal-signoz
09:23 AM
Yes
vvpreo
09:23 AM
moment
vvpreo
09:26 AM
Dead again
[screenshot attached]
nitya-signoz
09:30 AM
Can you bring up the collectors one by one?
nitya-signoz
09:31 AM
First try stopping the crashing collectors, and then run the commands for deleting the tables.
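A rough sequence for that, assuming the container names from the compose file above:

# stop the collectors that keep crashing
docker stop signoz-otel-collector signoz-otel-collector-metrics

# run the DROP TABLE queries from above in the ClickHouse client, then bring
# the collectors back one at a time and watch the migrations complete
docker start signoz-otel-collector
docker logs -f signoz-otel-collector
docker start signoz-otel-collector-metrics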
nitya-signoz
09:31 AM
We can get on a huddle if you want to fix this by sharing your screen.
vvpreo
09:32 AM
that would be great )
Oct 04, 2023 (2 months ago)
vvpreo
02:47 AM
Thank you for your help!
One more question:

Do these commands delete data, or only migration info?
// clickhouse queries
use signoz_logs;
drop table schema_migrations on CLUSTER cluster;
drop table logs_attribute_keys on CLUSTER cluster;
vishal-signoz
03:14 AM
The schema migrations table only stores migration state.
The attribute keys table stores metadata used for filter and aggregate-attribute suggestions.
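So dropping them should not touch the ingested log data itself. If in doubt, a quick sanity check after the restart is to count rows in the main logs table (table name assumed from the default signoz_logs schema):

-- run inside `clickhouse client`
SELECT count() FROM signoz_logs.logs;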
vvpreo
04:12 AM
Thank you