Solved

Intermittent DNS Resolution Errors inside of pods

  • September 23, 2024
  • 6 replies
  • 168 views

  • Sidekick (Customer)
  • 11 replies

Has anyone else seen seemingly random DNS resolution errors in some of their pods? I’ve been fighting to get CWS up and running in my instance for far longer than is reasonable, and with IFS’ help I got access to a prompt inside one of my containers. From there I could test whether I can reliably reach my CWS instance internally, which, as shown in the extract below, flips back and forth constantly.

ifsapp-reporting-cr-778cd9d965-75rgs:~$ curl -v https://myinternalcws.contoso.com/IFSCRWebSetup/IfsReportService.asmx

Could not resolve host: myinternalcws.contoso.com

shutting down connection #0

curl: (6) Could not resolve host: myinternalcws.contoso.com

ifsapp-reporting-cr-778cd9d965-75rgs:~$ curl -v https://myinternalcws.contoso.com/IFSCRWebSetup/IfsReportService.asmx

Host myinternalcws.contoso.com:443 was resolved.

IPv6: (none)
 
IPv4: 172.16.12.220

Trying 172.16.12.220:443...
[…successful connection details here...]

Connection #0 to host myinternalcws.contoso.com left intact

ifsapp-reporting-cr-778cd9d965-75rgs:~$
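
(For anyone wanting to try the same thing: I won’t claim this is the officially documented route, but the prompt itself can be reached with a plain kubectl exec against the reporting pod, roughly as below; the namespace is whatever your middle tier uses, and it assumes the image ships a shell, which this one clearly does.)

kubectl exec -it ifsapp-reporting-cr-778cd9d965-75rgs -n <your-mt-namespace> -- sh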


I can repeat this curl command back to back, and it’s an absolute coin flip whether the DNS resolution succeeds.
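
To put a rough number on it, a small loop like the one below fails about half the time for me (the hostname is my internal CWS host, and it assumes getent is available in the container; otherwise the same curl works just as well):

for i in $(seq 1 20); do
  getent hosts myinternalcws.contoso.com > /dev/null && echo "attempt $i: resolved" || echo "attempt $i: FAILED"
done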

 

I was initially hopeful that I could find a setting in the ifscloud-values.yaml file, or elsewhere, that would let me explicitly point `contoso.com` at an internal DNS server for my pods, but I haven’t been able to find one. I also can’t try that out from the terminal of the reporting-cr pod, since the logged-in user doesn’t have sudo privileges.
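
For reference, what I was hoping to find a supported setting for is essentially a zone-specific forward in the CoreDNS Corefile, something along these lines (the server IPs are my internal DNS servers; I have not applied this, and I don’t know where, or whether, IFS exposes it):

contoso.com:53 {
    errors
    cache 30
    forward . 172.16.28.50 172.16.28.51
}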

 

It does seem that I can reliably resolve `contoso.local` from the pod; however, when I configure the pod with a self-signed certificate, placing said certificate in the `/secrets` directory and the corresponding configuration blocks, the reporting pod still complains with warnings about a self-signed certificate and drops the connection.
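
(For what it’s worth, this is roughly how I’ve been sanity-checking the certificate side from the pod’s prompt; the file name under /secrets and the contoso.local hostname are just mine, so adjust accordingly:)

curl -v --cacert /secrets/my-selfsigned-ca.crt https://myinternalcws.contoso.local/IFSCRWebSetup/IfsReportService.asmx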

 

A case is open with IFS, but I’m hoping someone out here has seen something similar and can offer suggestions.

Best answer by Ben Monroe

Hi Michael,

Thank you for the details. As you noticed, the default public Google DNS servers will naturally not be able to resolve internal DNS hosts.

While setting up the middle tier cluster, did you configure the cluster DNS with the SETK8SDNS option?

.\main.ps1 -resource 'SETK8SDNS'

Reference: link

Best regards -- Ben


6 replies

  • Superhero (Employee)
  • 156 replies
  • September 24, 2024

Hi Michael,

A few things to check:

  1. Please share the "Dns" configuration in ifsremote\ifsroot\config\main_config.json.
  2. Please share the output of the following command: kubectl get configmap coredns -n kube-system -o yaml
  3. Does the coredns forward address(es) match the Dns configuration in main_config.json? Is this IP address the expected DNS server?
  4. Are you able to resolve the same DNS from outside the cluster? Such as from the IFS Management server using nslookup.
  5. Do you have a firewall or proxy between your Kubernetes cluster and DNS servers? If so please check the firewall or proxy logs for any dropped packets.
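
For point 4, from the IFS Management server that would be something along these lines (the hostname is taken from your example; the second form queries a specific server directly, so substitute the DNS IP from your main_config.json):

nslookup myinternalcws.contoso.com
nslookup myinternalcws.contoso.com <dns-server-from-main_config>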

Best regards -- Ben


  • Hero (Employee)
  • 171 replies
  • September 24, 2024

Hi,

You can look in the CoreDNS pod logs… they normally show lots of warnings and timeouts in my experience, but maybe you can find something obvious there.

kubectl logs -n kube-system -l k8s-app=kube-dns


  • Author
  • Sidekick (Customer)
  • 11 replies
  • September 26, 2024
Ben Monroe wrote:

Hi Michael,

A few things to check:

  1. Please share the "Dns" configuration in ifsremote\ifsroot\config\main_config.json.
  2. Please share the output of the following command: kubectl get configmap coredns -n kube-system -o yaml
  3. Does the coredns forward address(es) match the Dns configuration in main_config.json? Is this IP address the expected DNS server?
  4. Are you able to resolve the same DNS from outside the cluster? Such as from the IFS Management server using nslookup.
  5. Do you have a firewall or proxy between your Kubernetes cluster and DNS servers? If so please check the firewall or proxy logs for any dropped packets.

Best regards -- Ben

Thanks so much for the response Ben! I just got back onto this and was able to check out your second suggestion and instantly noticed that it’s trying to use Google DNS, despite my main_config.json pointing at internal servers.

`main_config.json` has the value "Dns": "172.16.28.50 172.16.28.51",

However, running `kubectl get configmap coredns -n kube-system -o yaml` results in:

> kubectl get configmap coredns -n kube-system -o yaml
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
          lameduck 5s
        }
        ready
        log . {
          class error
        }
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . 8.8.8.8 8.8.4.4 /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"Corefile":".:53 {\n    errors\n    health {\n      lameduck 5s\n    }\n    ready\n    log . {\n      class error\n    }\n    kubernetes cluster.local in-addr.arpa ip6.arpa {\n      pods insecure\n      fallthrough in-addr.arpa ip6.arpa\n    }\n    prometheus :9153\n    forward . 8.8.8.8 8.8.4.4 /etc/resolv.conf\n    cache 30\n    loop\n    reload\n    loadbalance\n}\n"},"kind":"ConfigMap","metadata":{"annotations":{},"creationTimestamp":"2024-08-16T13:45:50Z","labels":{"addonmanager.kubernetes.io/mode":"EnsureExists","k8s-app":"kube-dns"},"name":"coredns","namespace":"kube-system","resourceVersion":"700","uid":"0ac3a074-5657-4af5-85a5-6ba9e548344c"}}
  creationTimestamp: "2024-08-16T13:45:50Z"
  labels:
    addonmanager.kubernetes.io/mode: EnsureExists
    k8s-app: kube-dns
  name: coredns
  namespace: kube-system
  resourceVersion: "782"
  uid: 0ac3a074-5657-4af5-85a5-6ba9e548344c

Presuming my Linux VM was pulling its DNS settings from /etc/resolv.conf, I went and confirmed the resolver status with `systemd-resolve --status`, and can confirm that the listed DNS domains include my various contoso.com, contoso.local, etc., and that the listed DNS servers on eth0 are correct. Interestingly enough, after a reboot, my other domains (specifically contoso.com) were gone from `systemd-resolve --status`… digging into this further.
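
In theory I could patch the forward line by hand, assuming that’s all that is wrong, with something like the below, but I’d much rather find the supported way of setting it:

kubectl -n kube-system edit configmap coredns
# change: forward . 8.8.8.8 8.8.4.4 /etc/resolv.conf
# to:     forward . 172.16.28.50 172.16.28.51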


  • Superhero (Employee)
  • 156 replies
  • Answer
  • September 26, 2024

Hi Michael,

Thank you for the details. As you noticed, the default public Google DNS servers will naturally not be able to resolve internal DNS hosts.

While setting up the middle tier cluster, did you configure the cluster DNS with the SETK8SDNS option?

.\main.ps1 -resource 'SETK8SDNS'

Reference: link

Best regards -- Ben


  • Hero (Employee)
  • 171 replies
  • September 27, 2024

Ben is most certainly right that ‘SETK8SDNS’ needs to be run in your case.

FYI - I came across a similar issue in an Azure-deployed environment earlier this week, where we got intermittent database connectivity over an Oracle SCAN address.
In that case there was a list of six DNS server IP addresses, of which at least two (8.8.8.8 and 8.8.4.4) probably didn’t resolve the hostname to the ExaData SCAN address correctly. I replaced all six DNS servers with the Azure default DNS IP 168.63.129.16, and after that the SCAN address was resolved correctly every time.
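
A quick way to see which of the configured resolvers is the odd one out is to query each of them directly, e.g. (the SCAN hostname here is a placeholder for your own, and the server list should be whatever your cluster is actually configured with):

for dns in 8.8.8.8 8.8.4.4 168.63.129.16; do
  echo "== $dns =="
  nslookup your-scan-address.example.com $dns
done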


  • Author
  • Sidekick (Customer)
  • 11 replies
  • September 27, 2024
Ben Monroe wrote:

Hi Michael,

Thank you for the details. As you noticed, the default public Google DNS servers will naturally not be able to resolve internal DNS hosts.

While setting up the middle tier cluster, did you configure the cluster DNS with the SETK8SDNS option?

.\main.ps1 -resource 'SETK8SDNS'

Reference: link

Best regards -- Ben

I hate the fact that I have to say this, but no, I didn’t run SETK8SDNS. During our handoff calls for our upgrade build I was told everything we needed was contained in running .\main.ps1 without the specific resource callouts.

I’ve now run SETK8SDNS, deleted/recreated my MT, and tests now seem to be working reliably. I’m going to sit here and try to break it =-). `kubectl get configmap coredns -n kube-system -o yaml` also now reports the appropriate DNS servers.
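
(For anyone who finds this later, the quick check that the change actually took is just, roughly:)

kubectl -n kube-system get configmap coredns -o yaml | grep forward
# should now list the internal servers from main_config.json instead of 8.8.8.8 8.8.4.4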

 

Thank you so much for your help on this one! You’re a life saver! 



