Replacing kube-dns with CoreDNS on GKE

By default, Google Kubernetes Engine ships with kube-dns as its DNS resolver service. For most use cases, kube-dns works just fine. However, some advanced use cases require functionality that kube-dns doesn't support.

I run a GKE cluster on preemptible VM instances. A preemptible instance lives for at most 24 hours and can be replaced with another instance at any point. This requires every application running in the cluster to be highly available (HA) and able to keep functioning with all but one node removed from the cluster.

All of my deployments are highly available and have affinity configured to spread their pods across all nodes, so that every node has a pod that can keep serving traffic even if all the other pods disappear when preemptible nodes are removed. Yet I still faced frequent downtime when nodes were removed from and added to the cluster.

After a short investigation, I figured out that the problem was kube-dns, which didn't have affinity configured and thus sometimes placed all of its pods on a single node. When that node was removed, all kube-dns pods were removed with it and rescheduled on another node. While these pods were being rescheduled, DNS resolution in the cluster was down, which took down every app that depended on other services, such as databases. So I needed to either enable affinity for kube-dns, or replace it with CoreDNS with affinity enabled.

I tried enabling affinity for kube-dns first, but was unsuccessful. The GKE control plane automatically reconciles the kube-dns deployment and reverts any changes made to it. So the only option left was to replace kube-dns with CoreDNS.

Installing CoreDNS

1. Generate the deployment manifest

git clone https://github.com/coredns/deployment.git
cd deployment/kubernetes
./deploy.sh > coredns-deployment.yaml

2. On line 110 of coredns-deployment.yaml, replace the affinity block with the following lines:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: kubernetes.io/hostname
        labelSelector:
          matchExpressions:
            - key: k8s-app
              operator: In
              values: ["kube-dns"]

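3. On line 90, change replicas: 1 to the number of nodes you have in your cluster. Because the anti-affinity rule above allows only one CoreDNS pod per node, any replicas beyond the node count would stay unscheduled. For a two-node cluster: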

replicas: 2

4. Install CoreDNS

kubectl apply -f coredns-deployment.yaml

5. Scale down the kube-dns deployment. Deleting the deployment is not an option, because the Kubernetes control plane will automatically recreate it.

kubectl -n kube-system scale --replicas=0 deployment/kube-dns
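
Before moving on, it may be worth checking that the anti-affinity rule did what it was meant to do. The following is a minimal sketch, not part of the original setup: the helper names count_nodes and dns_pods_per_node are hypothetical, and the label k8s-app=kube-dns is the one used in the affinity block above. With kube-dns scaled to zero, the pods carrying that label should be the CoreDNS ones.

```shell
# Count the nodes currently registered in the cluster.
count_nodes() {
  kubectl get nodes --no-headers | wc -l | tr -d ' '
}

# Print how many running pods labelled k8s-app=kube-dns sit on each node.
# Each output line has the form "<count> <node-name>".
dns_pods_per_node() {
  kubectl -n kube-system get pods -l k8s-app=kube-dns \
    --field-selector=status.phase=Running \
    -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' | sort | uniq -c
}
```

If everything is spread as intended, dns_pods_per_node should print one line per node with a count of 1, and the number of lines should match the output of count_nodes. Fewer lines would suggest that some replicas are still pending.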

You're all set! You can test the installation by simulating the removal of one of your nodes.

kubectl cordon <your-node-name>
kubectl drain <your-node-name> --ignore-daemonsets
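
To see whether DNS actually stays up during the drain, one option is to probe it repeatedly from a second terminal. This is only a sketch under a few assumptions: the helper names probe_dns and watch_dns are hypothetical, the busybox:1.36 image is my choice for a throwaway pod, and the cluster uses the default cluster.local domain. Note that a probe can also fail because the throwaway pod itself could not be scheduled, so a single failure is a hint rather than proof of a DNS outage.

```shell
# Run one DNS lookup from a throwaway pod; $1 keeps the pod name unique.
probe_dns() {
  kubectl run "dns-probe-$1" --rm -i --restart=Never --image=busybox:1.36 \
    -- nslookup kubernetes.default.svc.cluster.local
}

# Probe DNS $1 times, pausing PROBE_INTERVAL seconds (default 2) between tries,
# and print a summary line with the number of failed probes.
watch_dns() {
  attempts="$1"
  failures=0
  i=1
  while [ "$i" -le "$attempts" ]; do
    if probe_dns "$i" >/dev/null 2>&1; then
      echo "probe $i: ok"
    else
      echo "probe $i: FAILED"
      failures=$((failures + 1))
    fi
    i=$((i + 1))
    sleep "${PROBE_INTERVAL:-2}"
  done
  echo "failures: $failures"
}
```

For example, starting watch_dns 30 just before the drain and watching for FAILED lines gives a rough picture of whether resolution was interrupted.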

When you're done testing, bring the node back online.

kubectl uncordon <your-node-name>
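
Once the node is schedulable again, the CoreDNS pod that was evicted should come back on it. A small sketch for waiting on that, assuming the deployment generated by deploy.sh is named coredns in the kube-system namespace (the helper name wait_for_coredns is hypothetical):

```shell
# Wait up to $1 seconds (default 120) for all CoreDNS replicas to be available.
wait_for_coredns() {
  kubectl -n kube-system rollout status deployment/coredns --timeout="${1:-120}s"
}
```

Calling wait_for_coredns 60 returns successfully once every replica is available, or with an error if that does not happen within 60 seconds.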