
Has anyone else seen seemingly random DNS resolution errors from some of their pods? I’ve been fighting to get CWS up and running in my instance for far longer than is reasonable, and with IFS’ help I was shown how to get a shell prompt inside one of my containers. From there I tried to verify that I could reliably reach my CWS instance internally, and as the extract below shows, the result flips back and forth constantly.

```
ifsapp-reporting-cr-778cd9d965-75rgs:~$ curl -v https://myinternalcws.contoso.com/IFSCRWebSetup/IfsReportService.asmx
Could not resolve host: myinternalcws.contoso.com
shutting down connection #0
curl: (6) Could not resolve host: myinternalcws.contoso.com
ifsapp-reporting-cr-778cd9d965-75rgs:~$ curl -v https://myinternalcws.contoso.com/IFSCRWebSetup/IfsReportService.asmx
Host myinternalcws.contoso.com:443 was resolved.
IPv6: (none)
IPv4: 172.16.12.220
Trying 172.16.12.220:443...
[...successful connection details here...]
Connection #0 to host myinternalcws.contoso.com left intact
ifsapp-reporting-cr-778cd9d965-75rgs:~$
```



I can repeat this curl command back to back, and it’s an absolute coin flip whether the DNS resolution succeeds.
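
If it helps to see the failure rate, a rough loop like this from the pod's shell counts how many of 20 attempts die with curl's "could not resolve host" exit code (this assumes seq is available in the container; -k is only there to keep certificate errors out of the way):

```
# Run the same lookup 20 times and count how often it fails with
# curl's "could not resolve host" exit code (6)
fail=0
for i in $(seq 1 20); do
  curl -sk -o /dev/null https://myinternalcws.contoso.com/IFSCRWebSetup/IfsReportService.asmx
  [ $? -eq 6 ] && fail=$((fail+1))
done
echo "DNS failures: $fail / 20"
```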

 

I was initially hopeful that I could find a setting in the ifscloud-values.yaml file, or elsewhere, that would let me explicitly point `contoso.com` at an internal DNS server for my pods, but I haven’t been able to identify one. I also can’t validate this as a feasible way forward, because the logged-in user in the reporting-cr pod’s terminal doesn’t have sudo privileges.
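
One thing I can check without sudo is the pod's own /etc/resolv.conf; if it behaves like a normal ClusterFirst pod it will just point at the cluster DNS service, which would mean any per-domain forwarding has to happen at the CoreDNS layer rather than inside the pod:

```
# Inside the reporting-cr pod - reading the resolver config needs no root
cat /etc/resolv.conf
# With the usual ClusterFirst dnsPolicy this shows a single 'nameserver <cluster-dns-ip>'
# plus search domains, i.e. every lookup is delegated to the cluster DNS (CoreDNS)
```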

 

I do seem to be able to reliably resolve `contoso.local` from the pod; however, when I configure the pod with a self-signed certificate, placing that certificate in the `/secrets` directory and the corresponding configuration blocks, the reporting pod still throws warnings about the self-signed certificate and drops the connection.
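
As a sanity check on the certificate side, I can at least tell curl from inside the pod to trust the mounted certificate explicitly and see whether the chain itself is the problem (the hostname and file name here are only placeholders for my actual values):

```
# Hostname and certificate path below are placeholders for whatever is actually mounted
curl -v --cacert /secrets/my-cws-ca.crt https://myinternalcws.contoso.local/IFSCRWebSetup/IfsReportService.asmx
```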

 

A case is open with IFS, but I’m hoping someone else out here has seen something similar and might have suggestions.

Hi Michael,

A few things to check:

  1. Please share the "Dns" configuration in ifsremote\ifsroot\config\main_config.json.
  2. Please share the output of the following command: kubectl get configmap coredns -n kube-system -o yaml
  3. Do the coredns forward address(es) match the Dns configuration in main_config.json? Are these the expected DNS server IP addresses?
  4. Are you able to resolve the same DNS name from outside the cluster, for example from the IFS Management server using nslookup? (See the example below.)
  5. Do you have a firewall or proxy between your Kubernetes cluster and DNS servers? If so please check the firewall or proxy logs for any dropped packets.
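
For item 4, something along these lines from the management server would do (replace the placeholder with whichever DNS server your main_config.json points at):

```
# Resolve via the management server's default resolver
nslookup myinternalcws.contoso.com
# Then query your internal DNS server directly to compare the answers (placeholder IP)
nslookup myinternalcws.contoso.com <internal-dns-server-ip>
```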

Best regards -- Ben


Hi,

You can look in the CoreDNS pod logs. In my experience they are normally full of warnings and timeouts anyway, but maybe you can find something obvious there. Since the pods have generated names, the easiest way is the label selector:

kubectl logs -n kube-system -l k8s-app=kube-dns


Thanks so much for the response, Ben! I just got back onto this and checked your second suggestion, and instantly noticed that it’s trying to use Google DNS, despite my main_config.json pointing at internal servers.

`main_config.json` has the value "Dns": "172.16.28.50 172.16.28.51"

However, running `kubectl get configmap coredns -n kube-system -o yaml` results in:

```
> kubectl get configmap coredns -n kube-system -o yaml
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        ready
        log . {
            class error
        }
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . 8.8.8.8 8.8.4.4 /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"Corefile":".:53 {\n errors\n health {\n lameduck 5s\n }\n ready\n log . {\n class error\n }\n kubernetes cluster.local in-addr.arpa ip6.arpa {\n pods insecure\n fallthrough in-addr.arpa ip6.arpa\n }\n prometheus :9153\n forward . 8.8.8.8 8.8.4.4 /etc/resolv.conf\n cache 30\n loop\n reload\n loadbalance\n}\n"},"kind":"ConfigMap","metadata":{"annotations":{},"creationTimestamp":"2024-08-16T13:45:50Z","labels":{"addonmanager.kubernetes.io/mode":"EnsureExists","k8s-app":"kube-dns"},"name":"coredns","namespace":"kube-system","resourceVersion":"700","uid":"0ac3a074-5657-4af5-85a5-6ba9e548344c"}}
  creationTimestamp: "2024-08-16T13:45:50Z"
  labels:
    addonmanager.kubernetes.io/mode: EnsureExists
    k8s-app: kube-dns
  name: coredns
  namespace: kube-system
  resourceVersion: "782"
  uid: 0ac3a074-5657-4af5-85a5-6ba9e548344c
```

Presuming my Linux VM was pulling its DNS settings from /etc/resolv.conf, I checked the resolver status with `systemd-resolve --status` and can confirm that the listed DNS domains include contoso.com, contoso.local, etc., and that the DNS servers listed on eth0 are correct. Interestingly enough, after a reboot, my other domains (specifically contoso.com) were gone from `systemd-resolve --status`… digging into this further.
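
In the meantime, querying each of the upstreams from that forward line directly (IPs from my config above) should show which one is letting me down:

```
# Ask the internal DNS server from main_config.json directly - this should answer
nslookup myinternalcws.contoso.com 172.16.28.50
# Ask Google DNS directly - it will not know the internal zone
nslookup myinternalcws.contoso.com 8.8.8.8
```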


Hi Michael,

Thank you for the details. As you noticed, the default public Google DNS servers will naturally not be able to resolve internal DNS hosts, and since the CoreDNS forward plugin spreads queries across all of the listed upstreams by default, a lookup only succeeds when it happens to land on a server that knows your internal zone, which would explain the coin-flip behaviour you are seeing.

While setting up the middle tier cluster, did you configure the cluster DNS with the SETK8SDNS option?

.\main.ps1 -resource 'SETK8SDNS'

Reference: link
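
Once SETK8SDNS has been run, the forward line in the coredns ConfigMap should list your internal DNS servers instead of 8.8.8.8/8.8.4.4; you can verify with the same command as before:

```
kubectl get configmap coredns -n kube-system -o yaml
```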

Best regards -- Ben


Ben is most certainly right that ‘SETK8SDNS’ needs to be run in your case.

FYI - I came across a similar issue in an Azure-deployed environment earlier this week, where we got intermittent database connectivity over an Oracle SCAN address.
In that case there was a list of six DNS server IP addresses, of which at least two (8.8.8.8 and 8.8.4.4) probably didn’t resolve the hostname to the ExaData SCAN address correctly. I replaced all six DNS servers with the Azure default DNS IP 168.63.129.16, and after that the SCAN address was resolved correctly every time.
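
If you hit something similar, a quick check like this against the Azure-provided resolver (the SCAN hostname is just a placeholder) shows whether it can resolve the name before you change anything:

```
# Query Azure's built-in resolver directly; the SCAN hostname is a placeholder
nslookup <your-exadata-scan-hostname> 168.63.129.16
```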


I hate that I have to say this, but no, I didn’t run SETK8SDNS. During our handoff calls for our upgrade build, I was told everything we needed was covered by running .\main.ps1 without the specific resource callouts.

I’ve now run SETK8SDNS and deleted/recreated my MT, and tests now seem to be working reliably. I’m going to sit here and try to break it =-). `kubectl get configmap coredns -n kube-system -o yaml` also now reports the appropriate DNS servers.
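
For anyone who finds this later, this is roughly how I’m double-checking it (getent availability in the container and grep on the kubectl host are assumptions on my part):

```
# Confirm the coredns forward line now lists the internal servers from main_config.json
kubectl get configmap coredns -n kube-system -o yaml | grep forward
# Hammer the name from inside the pod to make sure it sticks
for i in $(seq 1 50); do
  getent hosts myinternalcws.contoso.com > /dev/null || echo "miss $i"
done
```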

 

Thank you so much for your help on this one! You’re a life saver! 

