Replacing kube-dns with CoreDNS on GKE
By default, Google Kubernetes Engine ships with kube-dns as its DNS resolver service. For most use cases, kube-dns works just fine. However, some advanced use cases require functionality that kube-dns doesn't support.
In my case, I am running a GKE cluster with preemptible VM instances. A preemptible instance lives for at most 24 hours and can be preempted, and replaced by another instance, at any time. This requires every application running in the cluster to be highly available (HA) and to keep working even when all but one node have been removed from the cluster.
All my deployments are highly available and use pod anti-affinity to spread their pods across all nodes, so that every node runs a pod that can keep serving traffic even if the pods on the other nodes disappear when preemptible nodes are removed. Yet I still saw frequent downtime whenever nodes were removed from or added to the cluster.
After a short investigation, I figured out that the problem was kube-dns: it had no anti-affinity configured, so the scheduler sometimes placed all of its pods on a single node. When that node was removed, all kube-dns pods went with it and had to be rescheduled on another node. Until they were running again, DNS resolution in the cluster was down, which took down every app that depended on other services (such as databases). So I needed either to enable anti-affinity for kube-dns or to replace it with CoreDNS configured with anti-affinity.
I tried enabling anti-affinity for kube-dns first, but without success: the GKE control plane automatically reconciles the kube-dns deployment and reverts any changes made to it. So the only option left was to replace kube-dns with CoreDNS.
1. Generate the CoreDNS deployment manifest
git clone https://github.com/coredns/deployment.git
cd deployment/kubernetes
./deploy.sh > coredns-deployment.yaml
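The line numbers mentioned in steps 2 and 3 depend on the version of the deploy.sh template, so they may not match your generated file. A minimal sketch for locating the lines to edit, assuming only that the manifest uses the usual replicas and affinity keys (newer templates may omit the replicas line entirely). find_edit_lines is a hypothetical helper name, not part of the coredns/deployment repository.

```shell
#!/usr/bin/env bash
# Print the line numbers of the replicas and affinity keys in a manifest,
# so the edits in steps 2 and 3 can be made at the right place.
# find_edit_lines is a hypothetical helper, not part of coredns/deployment.
find_edit_lines() {
  local manifest="${1:-coredns-deployment.yaml}"
  # -n prefixes each match with its line number; -E enables the alternation.
  grep -n -E '^[[:space:]]*(replicas|affinity):' "$manifest"
}
```

Usage: find_edit_lines coredns-deployment.yaml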
2. Around line 110 of the generated file (the exact line depends on the template version), replace the affinity section with the following lines:
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - topologyKey: kubernetes.io/hostname
      labelSelector:
        matchExpressions:
        - key: k8s-app
          operator: In
          values: ["kube-dns"]
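A misindented paste in step 2 usually only shows up when the manifest is applied. One way to catch YAML syntax errors earlier is a client-side dry run, sketched below. It assumes kubectl 1.18 or newer, where --dry-run takes the value client (older versions accept a bare --dry-run); validate_manifest is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Ask kubectl to parse and validate the edited manifest without creating anything.
# Assumes kubectl >= 1.18 (--dry-run=client); validate_manifest is a hypothetical name.
validate_manifest() {
  local manifest="${1:-coredns-deployment.yaml}"
  kubectl apply --dry-run=client -f "$manifest"
}
```

Usage: validate_manifest coredns-deployment.yaml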
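3. Around line 90, change replicas: 1 to the number of nodes in your cluster. With the required anti-affinity rule from step 2, at most one CoreDNS pod can run per node, so any extra replicas would stay Pending.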
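To find the value for replicas, count the nodes currently registered in the cluster. A minimal sketch; node_count is a hypothetical helper name, and with preemptible nodes the result is only a snapshot, since nodes come and go.

```shell
#!/usr/bin/env bash
# Count the nodes currently registered in the cluster (one output line per node).
# node_count is a hypothetical helper name used for illustration.
node_count() {
  # --no-headers drops the NAME/STATUS header row so wc counts only nodes;
  # tr strips the padding some wc implementations add.
  kubectl get nodes --no-headers | wc -l | tr -d ' '
}
```

The same number can also be applied after installation, assuming the generated Deployment is named coredns as in the deploy.sh template: kubectl -n kube-system scale deployment/coredns --replicas="$(node_count)"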
4. Install CoreDNS
kubectl apply -f coredns-deployment.yaml
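To check that the anti-affinity rule took effect, list the nodes that the DNS pods landed on. This assumes the CoreDNS pods carry the label k8s-app=kube-dns, which is what the labelSelector in step 2 matches. Because the original kube-dns pods carry the same label, the required rule also keeps CoreDNS pods off nodes that still run kube-dns, so some replicas may stay Pending until kube-dns is scaled down in the next step. dns_nodes is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the distinct nodes running at least one pod labelled k8s-app=kube-dns.
# Pending pods have no nodeName yet; their empty lines are filtered out.
# dns_nodes is a hypothetical helper name used for illustration.
dns_nodes() {
  kubectl -n kube-system get pods -l k8s-app=kube-dns \
    -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' \
    | grep -v '^$' | sort -u
}
```

Once kube-dns is gone and the pods are spread correctly, the number of lines printed should match the CoreDNS replica count.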
5. Scale down the kube-dns deployment. Deleting the deployment is not an option, because the Kubernetes control plane will automatically recreate it.
kubectl -n kube-system scale --replicas=0 deployment/kube-dns
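On GKE clusters that also run a kube-dns-autoscaler deployment in kube-system, the autoscaler may scale kube-dns back up after it has been set to zero. A sketch that scales the autoscaler down first, assuming both deployment names match your cluster (check with kubectl -n kube-system get deployments). disable_kube_dns is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Scale GKE's kube-dns and its autoscaler down to zero replicas.
# Assumes deployments named kube-dns-autoscaler and kube-dns exist in kube-system.
# disable_kube_dns is a hypothetical helper name used for illustration.
disable_kube_dns() {
  # Scale the autoscaler first; otherwise it can restore kube-dns replicas.
  kubectl -n kube-system scale --replicas=0 deployment/kube-dns-autoscaler
  kubectl -n kube-system scale --replicas=0 deployment/kube-dns
}
```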
You're all set! You can test the installation by simulating the removal of one of your nodes:
kubectl cordon <your-node-name>
kubectl drain <your-node-name> --ignore-daemonsets
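To see whether DNS keeps resolving while the node drains, a lookup can be run from a short-lived pod in a second terminal. A minimal sketch; the busybox:1.28 image, the dns-check pod name prefix, and the check_dns helper are all assumptions for illustration.

```shell
#!/usr/bin/env bash
# Resolve the kubernetes.default service from a throwaway pod, then delete the pod.
# Image, pod name prefix and the check_dns helper name are illustrative assumptions.
check_dns() {
  # A unique name avoids clashing with a previous pod that is still terminating.
  local name="dns-check-$$-$RANDOM"
  kubectl run "$name" --image=busybox:1.28 --restart=Never --rm -i -- \
    nslookup kubernetes.default
}
```

Running it in a loop during the drain, for example while sleep 5; do check_dns; done, makes any resolution gap visible as lookup errors in the output.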
After you're done with the tests, make the node schedulable again:
kubectl uncordon <your-node-name>