Question

ifs-monitoring with high availability installation

  • January 12, 2026
  • 1 reply
  • 41 views


Hello,

We are trying to install the ifs-monitoring helm chart.  In our test system with one middle tier server, it works great.  We run .\main.ps1 -resource 'MONITORING', wait for the pods to start, and we can log in to Kibana and Grafana.  However, our integration and production systems use high availability with three middle tier servers, and on those systems it’s not working.  The ifs-monitoring pre-install-kibana pod fails to start with this error:

Creating a new Elasticsearch token for Kibana
Cleaning previous token
DELETE undefined failed: connect ECONNREFUSED 10.152.183.247:9200
Error: connect ECONNREFUSED 10.152.183.247:9200
at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1606:16) {
errno: -111,
code: 'ECONNREFUSED',
syscall: 'connect',
address: '10.152.183.247',
port: 9200
}
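
That 10.152.183.247:9200 address is presumably the Elasticsearch service's cluster IP, so a refused connection there usually means no ready Elasticsearch pods are behind the service. A rough way to check (the elasticsearch-master service name is an assumption based on common chart defaults; use whatever kubectl get svc -n ifs-monitoring actually shows):

kubectl get pods -n ifs-monitoring -o wide
# service name is an assumption; the Endpoints field is empty when no backing pods are Ready
kubectl describe svc elasticsearch-master -n ifs-monitoring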

And the ifs-monitoring fluentd pod fails with this error:

fluentd 13:37:10.74
fluentd 13:37:10.74 Welcome to the Bitnami fluentd container
fluentd 13:37:10.75 Subscribe to project updates by watching https://github.com/bitnami/containers
fluentd 13:37:10.75 Submit issues and feature requests at https://github.com/bitnami/containers/issues
fluentd 13:37:10.75
fluentd 13:37:10.75 INFO ==> ** Starting Fluentd setup **
find: '/docker-entrypoint-initdb.d/': No such file or directory
fluentd 13:37:10.77 INFO ==> No custom scripts in /docker-entrypoint-initdb.d
fluentd 13:37:10.77 INFO ==> ** Fluentd setup finished! **

fluentd 13:37:10.82 INFO ==> ** Starting Fluentd **
2026-01-09 13:37:14 +0000 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2026-01-09 13:37:14 +0000 [info]: parsing config file is succeeded path="/opt/bitnami/fluentd/conf/fluentd.conf"
2026-01-09 13:37:14 +0000 [info]: gem 'fluentd' version '1.16.0'
2026-01-09 13:37:14 +0000 [info]: gem 'fluent-plugin-concat' version '2.5.0'
2026-01-09 13:37:14 +0000 [info]: gem 'fluent-plugin-detect-exceptions' version '0.0.15'
2026-01-09 13:37:14 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '5.2.5'
2026-01-09 13:37:14 +0000 [info]: gem 'fluent-plugin-grafana-loki' version '1.2.20'
2026-01-09 13:37:14 +0000 [info]: gem 'fluent-plugin-kafka' version '0.18.1'
2026-01-09 13:37:14 +0000 [info]: gem 'fluent-plugin-kubernetes_metadata_filter' version '3.1.3'
2026-01-09 13:37:14 +0000 [info]: gem 'fluent-plugin-multi-format-parser' version '1.0.0'
2026-01-09 13:37:14 +0000 [info]: gem 'fluent-plugin-prometheus' version '2.0.3'
2026-01-09 13:37:14 +0000 [info]: gem 'fluent-plugin-record-modifier' version '2.1.1'
2026-01-09 13:37:14 +0000 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '2.4.0'
2026-01-09 13:37:14 +0000 [info]: gem 'fluent-plugin-s3' version '1.7.2'
2026-01-09 13:37:14 +0000 [info]: gem 'fluent-plugin-systemd' version '1.0.5'
2026-01-09 13:37:14 +0000 [info]: gem 'fluentd' version '1.15.3'
2026-01-09 13:37:17 +0000 [info]: adding forwarding server 'fluentd-0.fluentd-headless.ifs-monitoring.svc.cluster.local:24224' host="fluentd-0.fluentd-headless.ifs-monitoring.svc.cluster.local" port=24224 weight=60 plugin_id="object:ca44"
/opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/plugin/out_forward.rb:745:in `getaddrinfo': getaddrinfo: Name or service not known (SocketError)
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/plugin/out_forward.rb:745:in `resolve_dns!'
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/plugin/out_forward.rb:731:in `resolved_host'
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/plugin/out_forward.rb:591:in `validate_host_resolution!'
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/plugin/out_forward.rb:244:in `block in configure'
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/plugin/out_forward.rb:239:in `each'
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/plugin/out_forward.rb:239:in `configure'
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/plugin.rb:187:in `configure'
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/agent.rb:132:in `add_match'
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/agent.rb:74:in `block in configure'
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/agent.rb:64:in `each'
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/agent.rb:64:in `configure'
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/root_agent.rb:149:in `configure'
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/engine.rb:105:in `configure'
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/engine.rb:80:in `run_configure'
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/supervisor.rb:571:in `run_supervisor'
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/lib/fluent/command/fluentd.rb:352:in `<top (required)>'
from <internal:/opt/bitnami/ruby/lib/ruby/site_ruby/3.1.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
from <internal:/opt/bitnami/ruby/lib/ruby/site_ruby/3.1.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
from /opt/bitnami/fluentd/gems/fluentd-1.16.0/bin/fluentd:15:in `<top (required)>'
from /opt/bitnami/fluentd/bin/fluentd:25:in `load'
from /opt/bitnami/fluentd/bin/fluentd:25:in `<main>'
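
The getaddrinfo failure means the per-pod DNS record fluentd-0.fluentd-headless.ifs-monitoring.svc.cluster.local never resolves, which generally happens when the fluentd aggregator StatefulSet pod is not running or not yet registered behind its headless service. A rough way to check (the busybox image is just an arbitrary choice for a throwaway DNS lookup pod):

kubectl get statefulset,endpoints -n ifs-monitoring
# temporary pod only to test in-cluster DNS resolution of the record from the log above
kubectl run dns-test -n ifs-monitoring --rm -it --restart=Never --image=busybox:1.36 -- nslookup fluentd-0.fluentd-headless.ifs-monitoring.svc.cluster.local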

I checked our main_config.json file, and all of the properties are set as indicated in the high availability setup instructions: Linuxhost points to the load balancer, Nodes lists the three middle tier servers, and LoadBalancerPrivateIP is the IP address of the load balancer.

Has anyone had success getting ifs-monitoring installed in a high availability environment?

- Jeff

1 reply

  • Do Gooder (Customer)
  • February 20, 2026

 

Hi,

We hit the same problem installing ifs-monitoring on a 3‑node HA environment (three middle‑tier servers behind a load balancer). The installation worked fine on a single‑node cluster, but consistently failed on multi‑node.

🔍 Root cause

It wasn’t Fluentd/Filebeat, the load balancer, or ingress. The issue comes from how the IFS monitoring Helm charts schedule infrastructure workloads (Elasticsearch, Kibana, Prometheus, Grafana). They are deployed with a PriorityClass:

ifs-infra-node-critical

This implicitly requires at least one node to host “infrastructure” workloads.
On a single‑node cluster, scheduling succeeds because the only node accepts everything.
On a multi‑node cluster without an “infra” node, Elasticsearch cannot be scheduled (pods stay Pending), the pre-install-kibana hook times out, and Prometheus/Grafana remain Pending as well.
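
Both halves of that diagnosis are visible with plain kubectl; the Events section at the bottom of the describe output is where the scheduler says why the pod stays Pending (typically that no node matches the pod's node affinity/selector). Pod names are whatever your namespace reports, elasticsearch-master-0 is the usual first StatefulSet member:

kubectl get priorityclass ifs-infra-node-critical -o yaml
kubectl get pods -n ifs-monitoring -o wide
# Events at the bottom explain why the pod is stuck Pending
kubectl describe pod elasticsearch-master-0 -n ifs-monitoring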

✅ Fix

Designate at least one node as an infrastructure node:

kubectl label node <your-node-name> node-role.kubernetes.io/infra=true

After this, Kubernetes can schedule:

  • elasticsearch-master-0/1/2,
  • then Kibana (pre‑install hook succeeds),
  • then Prometheus and Grafana.
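
A quick way to confirm the label is in place and then watch the backlog clear:

# -L adds a column showing the label value for each node
kubectl get nodes -L node-role.kubernetes.io/infra
kubectl get pods -n ifs-monitoring --watch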

⚠️ Important follow‑up if you use Longhorn (PVCs stuck in Pending)

If your cluster uses Longhorn as the default StorageClass, you may still see PVCs in Pending (and pods in Pending) after adding the infra label.
That’s because the PVCs were originally created before the node was marked as infra, so Longhorn placed the volumes in a way that no longer matches the new scheduling requirements.
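
You can see this state directly: the affected PVCs stay Pending, and describing one of them usually shows the Longhorn scheduling/attachment message in its Events:

kubectl get pvc -n ifs-monitoring
# replace <pending-pvc-name> with one of the Pending claims from the previous command
kubectl describe pvc <pending-pvc-name> -n ifs-monitoring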

Solution: delete the old PVCs (and optionally the related pods) in the ifs-monitoring namespace, then re‑run the monitoring installation so volumes are recreated after the infra label is in place.

kubectl delete pvc -n ifs-monitoring --all

kubectl delete pod -n ifs-monitoring --all

# then re-run your monitoring install

Once the PVCs are recreated, they will bind to the infra node and all monitoring components should become Ready.
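
A minimal check after the re-run; everything should come back Bound and Running:

kubectl get pvc,pods -n ifs-monitoring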

📐 UAT architecture suggestion (3‑node example)

  • Node 1 → infra: Elasticsearch, Kibana, Prometheus, Grafana, Filebeat (and ingress/cert-manager if applicable)
  • Node 2 → app: IFS Web / Projections / Connect
  • Node 3 → app: IFS Web / Projections / Connect

This cleanly separates infra from app workloads and avoids scheduling conflicts.
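
If you follow that layout, the roles are just ordinary node labels; a minimal sketch, assuming placeholder node names and an arbitrary app label key (the infra label is the one the monitoring charts actually key on):

# node names are placeholders; the app label key is only an illustrative convention
kubectl label node <infra-node> node-role.kubernetes.io/infra=true
kubectl label node <app-node-1> node-role.kubernetes.io/app=true
kubectl label node <app-node-2> node-role.kubernetes.io/app=true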

🎉 Result

After labeling one node as infra and recreating the Longhorn PVCs, the ifs-monitoring stack deployed cleanly with all pods and PVCs in Running/Bound state.

 

Didier