+++ This bug was initially created as a clone of Bug #1880148 +++

Description of problem:

Since the DNS architecture is fault tolerant, we should roll out the DNS daemonset more aggressively. Most other cluster-wide daemonsets that are not critical to local workload availability now use a maxUnavailable of 10%. In 250-node clusters this typically reduces the rollout time from around 100 minutes to 10 minutes. Please update your daemonset and operator status code to work with a maxUnavailable of 10% so that upgrade time doesn't scale linearly with node count.

https://github.com/openshift/cluster-dns-operator/blob/master/pkg/operator/controller/dns_status.go#L84
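To illustrate why a percentage-based maxUnavailable stops rollout time from scaling linearly with node count, here is a rough sketch that counts update "waves" for a 250-node cluster. It is not taken from the operator: the function names are hypothetical, the previous setting is assumed to be maxUnavailable of 1, and the percentage is assumed to be rounded up to a whole number of pods. Real rollouts do not proceed in strict waves, so treat the numbers as an approximation only.

```go
package main

import "fmt"

// maxUnavailablePods converts a percentage into an absolute pod count.
// Assumption: the result is rounded up, with a floor of 1 so that a
// rollout can always make progress. (Hypothetical helper.)
func maxUnavailablePods(nodes, percent int) int {
	n := (nodes*percent + 99) / 100 // integer ceiling of nodes*percent/100
	if n < 1 {
		n = 1
	}
	return n
}

// rolloutWaves returns how many sequential batches are needed to update
// every node when at most `batch` pods may be unavailable at once.
// (Hypothetical helper; a simplified model of a rolling update.)
func rolloutWaves(nodes, batch int) int {
	return (nodes + batch - 1) / batch // integer ceiling of nodes/batch
}

func main() {
	const nodes = 250

	// Assumed previous behaviour: one pod at a time.
	fmt.Printf("maxUnavailable=1:   %d waves\n", rolloutWaves(nodes, 1))

	// Requested behaviour: 10% of the pods at a time.
	batch := maxUnavailablePods(nodes, 10)
	fmt.Printf("maxUnavailable=10%%: %d pods per wave, %d waves\n",
		batch, rolloutWaves(nodes, batch))
}
```

With a fixed maxUnavailable of 1 the wave count equals the node count, whereas with 10% it stays at about 10 regardless of cluster size, which is the effect the description asks for.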
The original fix needs a follow-up before the backport can proceed.
Verified with cluster launched by cluster-bot and passed.

# oc get clusterversion
NAME      VERSION                                           AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.ci.test-2020-12-10-031026-ci-ln-kgv0vgb   True        False         4m25s   Cluster version is 4.6.0-0.ci.test-2020-12-10-031026-ci-ln-kgv0vgb

# oc -n openshift-dns get ds/dns-default -oyaml
<---snip--->
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 10%
    type: RollingUpdate
status:
  currentNumberScheduled: 6
*** Bug 1917579 has been marked as a duplicate of this bug. ***
Bumping severity because every 4.6.z release that doesn't include this fix will have slow DNS daemonset rollouts during upgrades.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.6.16 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:0308