Question

ifs-monitoring with high availability installation

  • January 12, 2026
  • 1 reply
  • 41 views


Hello,

We are trying to install the ifs-monitoring helm chart.  In our test system with one middle tier server, it works great.  We run .\main.ps1 -resource 'MONITORING', wait for the pods to start, and we can log in to Kibana and Grafana.  However, our integration and production systems use high availability with three middle tier servers, and on those systems it’s not working.  The ifs-monitoring pre-install-kibana pod fails to start with this error:

Creating a new Elasticsearch token for Kibana
Cleaning previous token
DELETE undefined failed: connect ECONNREFUSED 10.152.183.247:9200
Error: connect ECONNREFUSED 10.152.183.247:9200
at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1606:16) {
errno: -111,
code: 'ECONNREFUSED',
syscall: 'connect',
address: '10.152.183.247',
port: 9200
}
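
That 10.152.183.247:9200 address is presumably the Elasticsearch service's cluster IP, so a refused connection there usually means no ready Elasticsearch pods are behind the service. A rough way to check (the elasticsearch-master service name is an assumption based on common chart defaults; use whatever kubectl get svc -n ifs-monitoring actually shows):

kubectl get pods -n ifs-monitoring -o wide
# service name is an assumption; the Endpoints field is empty when no backing pods are Ready
kubectl describe svc elasticsearch-master -n ifs-monitoring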

And the ifs-monitoring fluentd pod fails with this error:

fluentd 13:37:10.74
fluentd 13:37:10.74 Welcome to the Bitnami fluentd container
fluentd 13:37:10.75 Subscribe to project updates by watching https://github.com/bitnami/containers
fluentd 13:37:10.75 Submit issues and feature requests at https://github.com/bitnami/containers/issues
fluentd 13:37:10.75
fluentd 13:37:10.75 INFO ==> ** Starting Fluentd setup **
find: '/docker-entrypoint-initdb.d/': No such file or directory
fluentd 13:37:10.77 INFO ==> No custom scripts in /docker-entrypoint-initdb.d
fluentd 13:37:10.77 INFO ==> ** Fluentd setup finished! **

fluentd 13:37:10.82 INFO ==> ** Starting Fluentd **
2026-01-09 13:37:14 +0000 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2026-01-09 13:37:14 +0000 [info]: parsing config file is succeeded path="/opt/bitnami/fluentd/conf/fluentd.conf"
2026-01-09 13:37:14 +0000 [info]: gem 'fluentd' version '1.16.0'
2026-01-09 13:37:14 +0000 [info]: gem 'fluent-plugin-concat' version '2.5.0'
2026-01-09 13:37:14 +0000 [info]: gem 'fluent-plugin-detect-exceptions' version '0.0.15'
2026-01-09 13:37:14 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '5.2.5'
2026-01-09 13:37:14 +0000 [info]: gem 'fluent-plugin-grafana-loki' version '1.2.20'
2026-01-09 13:37:14 +0000 [info]: gem 'fluent-plugin-kafka' version '0.18.1'
2026-01-09 13:37:14 +0000 [info]: gem 'fluent-plugin-kubernetes_metadata_filter' version '3.1.3'
2026-01-09 13:37:14 +0000 [info]: gem 'fluent-plugin-multi-format-parser' version '1.0.0'
2026-01-09 13:37:14 +0000 [info]: gem 'fluent-plugin-prometheus' version '2.0.3'
2026-01-09 13:37:14 +0000 [info]: gem 'fluent-plugin-record-modifier' version '2.1.1'
2026-01-09 13:37:14 +0000 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '2.4.0'
2026-01-09 13:37:14 +0000 [info]: gem 'fluent-plugin-s3' version '1.7.2'
2026-01-09 13:37:14 +0000 [info]: gem 'fluent-plugin-systemd' version '1.0.5'
2026-01-09 13:37:14 +0000 [info]: gem 'fluentd' version '1.15.3'
2026-01-09 13:37:17 +0000 [info]: adding forwarding server 'fluentd-0.fluentd-headless.ifs-monitoring.svc.cluster.local:24224' host="fluentd-0.fluentd-headless.ifs-monitoring.svc.cluster.local" port=24224 weight=60 plugin_id="object:ca44"
/opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/plugin/out_forward.rb:745:in `getaddrinfo': getaddrinfo: Name or service not known (SocketError)
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/plugin/out_forward.rb:745:in `resolve_dns!'
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/plugin/out_forward.rb:731:in `resolved_host'
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/plugin/out_forward.rb:591:in `validate_host_resolution!'
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/plugin/out_forward.rb:244:in `block in configure'
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/plugin/out_forward.rb:239:in `each'
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/plugin/out_forward.rb:239:in `configure'
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/plugin.rb:187:in `configure'
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/agent.rb:132:in `add_match'
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/agent.rb:74:in `block in configure'
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/agent.rb:64:in `each'
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/agent.rb:64:in `configure'
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/root_agent.rb:149:in `configure'
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/engine.rb:105:in `configure'
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/engine.rb:80:in `run_configure'
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/supervisor.rb:571:in `run_supervisor'
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/command/fluentd.rb:352:in `<top (required)>'
from <internal:/opt/bitnami/ruby/lib/ruby/site_ruby/3.1.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
from <internal:/opt/bitnami/ruby/lib/ruby/site_ruby/3.1.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/bin/fluentd:15:in `<top (required)>'
from /opt/bitnami/fluentd/bin/fluentd:25:in `load'
from /opt/bitnami/fluentd/bin/fluentd:25:in `<main>'
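
The getaddrinfo failure means the per-pod DNS record fluentd-0.fluentd-headless.ifs-monitoring.svc.cluster.local never resolves, which generally happens when the fluentd aggregator StatefulSet pod is not running or not yet registered behind its headless service. A rough way to check (the busybox image is just an arbitrary choice for a throwaway DNS lookup pod):

kubectl get statefulset,endpoints -n ifs-monitoring
# temporary pod only to test in-cluster DNS resolution of the record from the log above
kubectl run dns-test -n ifs-monitoring --rm -it --restart=Never --image=busybox:1.36 -- nslookup fluentd-0.fluentd-headless.ifs-monitoring.svc.cluster.local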

I checked our main_config.json file, and all of the properties are set as indicated in the high availability setup instructions: Linuxhost points to the load balancer, Nodes lists the three middle tier servers, and LoadBalancerPrivateIP is the IP address of the load balancer.

Has anyone had success getting ifs-monitoring installed in a high availability environment?

- Jeff

1 reply

  • Do Gooder (Customer)
  • February 20, 2026

 

Hi,

We hit the same problem installing ifs-monitoring on a 3‑node HA environment (three middle‑tier servers behind a load balancer). The installation worked fine on a single‑node cluster, but consistently failed on multi‑node.

🔍 Root cause

It wasn’t Fluentd/Filebeat, the load balancer, or ingress. The issue comes from how the IFS monitoring Helm charts schedule infrastructure workloads (Elasticsearch, Kibana, Prometheus, Grafana). They are deployed with a PriorityClass:

ifs-infra-node-critical

This implicitly requires at least one node to host “infrastructure” workloads.
On a single‑node cluster, scheduling succeeds because the only node accepts everything.
On a multi‑node cluster without an “infra” node, Elasticsearch cannot be scheduled (pods stay Pending), the pre-install-kibana hook times out, and Prometheus/Grafana remain Pending as well.
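
Both halves of that diagnosis are visible with plain kubectl; the Events section at the bottom of the describe output is where the scheduler says why the pod stays Pending (typically that no node matches the pod's node affinity/selector). Pod names are whatever your namespace reports, elasticsearch-master-0 is the usual first StatefulSet member:

kubectl get priorityclass ifs-infra-node-critical -o yaml
kubectl get pods -n ifs-monitoring -o wide
# Events at the bottom explain why the pod is stuck Pending
kubectl describe pod elasticsearch-master-0 -n ifs-monitoring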

✅ Fix

Designate at least one node as an infrastructure node:

kubectl label node <your-node-name> node-role.kubernetes.io/infra=true

After this, Kubernetes can schedule:

  • elasticsearch-master-0/1/2,
  • then Kibana (pre‑install hook succeeds),
  • then Prometheus and Grafana.
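
A quick way to confirm the label is in place and then watch the backlog clear:

# -L adds a column showing the label value for each node
kubectl get nodes -L node-role.kubernetes.io/infra
kubectl get pods -n ifs-monitoring --watch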

⚠️ Important follow‑up if you use Longhorn (PVCs stuck in Pending)

If your cluster uses Longhorn as the default StorageClass, you may still see PVCs in Pending (and pods in Pending) after adding the infra label.
That’s because the PVCs were originally created before the node was marked as infra, so Longhorn placed the volumes in a way that no longer matches the new scheduling requirements.
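
You can see this state directly: the affected PVCs stay Pending, and describing one of them usually shows the Longhorn scheduling/attachment message in its Events:

kubectl get pvc -n ifs-monitoring
# replace <pending-pvc-name> with one of the Pending claims from the previous command
kubectl describe pvc <pending-pvc-name> -n ifs-monitoring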

Solution: delete the old PVCs (and optionally the related pods) in the ifs-monitoring namespace, then re‑run the monitoring installation so volumes are recreated after the infra label is in place.

kubectl delete pvc -n ifs-monitoring --all

kubectl delete pod -n ifs-monitoring --all

# then re-run your monitoring install

Once the PVCs are recreated, they will bind to the infra node and all monitoring components should become Ready.
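
A minimal check after the re-run; everything should come back Bound and Running:

kubectl get pvc,pods -n ifs-monitoring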

📐 UAT architecture suggestion (3‑node example)

  • Node 1 → infra: Elasticsearch, Kibana, Prometheus, Grafana, Filebeat (and ingress/cert-manager if applicable)
  • Node 2 → app: IFS Web / Projections / Connect
  • Node 3 → app: IFS Web / Projections / Connect

This cleanly separates infra from app workloads and avoids scheduling conflicts.
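
If you follow that layout, the roles are just ordinary node labels; a minimal sketch, assuming placeholder node names and an arbitrary app label key (the infra label is the one the monitoring charts actually key on):

# node names are placeholders; the app label key is only an illustrative convention
kubectl label node <infra-node> node-role.kubernetes.io/infra=true
kubectl label node <app-node-1> node-role.kubernetes.io/app=true
kubectl label node <app-node-2> node-role.kubernetes.io/app=true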

🎉 Result

After labeling one node as infra and recreating the Longhorn PVCs, the ifs-monitoring stack deployed cleanly with all pods and PVCs in Running/Bound state.

 

Didier