Upstream DNS servers not being updated on Gravity

We have a cluster (really just one node) running Gravity 5.5.19, and we’re noticing the following behavior.

On cluster creation (i.e. when installing), the node had two DNS servers defined in /etc/resolv.conf, let’s call these dns1.company.com and dns2.company.com. These DNS servers got encoded in /etc/coredns/coredns.conf in the Planet container, as well as in the coredns ConfigMap for the CoreDNS pods in Kubernetes itself.
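
For illustration, the upstream resolvers end up in a CoreDNS stanza roughly like the one below. This is only a sketch of a standard Corefile; the exact plugin layout Gravity generates in /etc/coredns/coredns.conf may differ (older CoreDNS releases used the proxy plugin rather than forward), and the placeholder names stand in for what are really IP addresses:

```
.:53 {
    # Upstream resolvers captured from the host's /etc/resolv.conf at install time.
    # Placeholder names from this post; a real Corefile lists IP addresses here.
    forward . dns1.company.com dns2.company.com
    cache 30
}
```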

At some later point in time, this node had its DNS servers changed: /etc/resolv.conf on the node now lists dns3.company.com as the only server, and the other two have actually been decommissioned.

We see a few issues on the cluster:

  1. Inside the Planet container, /etc/resolv.conf is unchanged (it still has 127.0.0.2, dns1.company.com, dns2.company.com).
  2. Inside the Planet container, /etc/coredns/coredns.conf is unchanged (it still refers to dns1.company.com and dns2.company.com), so resolving against 127.0.0.2 inside the container fails as well.
  3. The coredns ConfigMap doesn’t change either, so it still refers to dns1.company.com and dns2.company.com.
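
For completeness, this is roughly how each layer can be inspected (the command to enter the Planet container is an assumption here and varies by Gravity version):

```
# On the host: the current resolvers
cat /etc/resolv.conf

# Inside the Planet container (entered with e.g. `sudo gravity enter`)
cat /etc/resolv.conf
cat /etc/coredns/coredns.conf

# The in-cluster CoreDNS configuration
kubectl -n kube-system get configmap coredns -o yaml
```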

We tried rebooting the machine and restarting the Gravity service via systemctl, but had no luck. We had to manually edit these configurations to get things working again.

To be specific, the following things started failing:

  1. Fetching images failed, since Docker inside the Planet container could not resolve the registry hostname at registry.docker.com (or whatever the specific DNS name is).
  2. Communication from inside the cluster to external services (like RDS) broke, since their DNS entries could not be resolved.

My expectation was that if the DNS resolution configuration on the host changed, Gravity would pick up the change, or at least provide a way for me to apply it so I didn’t have to edit all these files manually (I missed /etc/coredns/coredns.conf on the first try, for example).

Is this expected? Is it a bug? Anything we can do to make this easier?

To update DNS resolvers, the layers currently involved are the following:

planet: During startup, Planet generates its resolv.conf based on the nameservers in the host’s /etc/resolv.conf. After any update, Planet needs to be restarted (or the node drained) for the new nameservers to be captured. So in the scenario you mention, I assume this did not take place when the DNS servers were changed.
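
A hypothetical sketch of that restart (the Planet unit name varies per cluster and version, so list the units rather than assuming a name):

```
# Find the Planet service unit (the exact name varies by cluster/version)
sudo systemctl list-units 'gravity*' --type=service

# Restart it so Planet regenerates resolv.conf from the host's current one
sudo systemctl restart <planet-unit-from-the-list>
```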

cluster-dns: At install/upgrade time, the ConfigMap gets written based on the installer node’s resolv.conf when the operation runs. The CoreDNS configuration can be edited after installation by updating the kube-system/coredns ConfigMap.
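
For example (a sketch only; the forward stanza assumes a standard Corefile, and the placeholder names stand in for real IP addresses):

```
# Open the in-cluster CoreDNS configuration for editing
kubectl -n kube-system edit configmap coredns

# In the Corefile, replace the stale upstreams, e.g.
#   forward . dns1.company.com dns2.company.com
# with the current resolver:
#   forward . dns3.company.com
```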

These are currently the steps necessary to update DNS resolvers; I will follow up with the team on opportunities to make this easier.

@abdu thank you for the response.

So you’re saying the Planet DNS update should happen automatically, but the in-cluster coredns configuration has to be updated manually (i.e. go to every cluster and update the resolvers)?

No, this is definitely not how it is supposed to work. I would also expect to be able to reset both the coredns configuration and the ConfigMap with an explicit container restart. Ideally, a reconcile loop inside the container should pick up such changes from the host.