Description of problem:
Because ds/tuned defaults to a rollingUpdate maxUnavailable of 1, the rollout is entirely serialized and therefore very slow on large clusters. We can speed up the rollout of daemonsets that don't immediately affect availability by allowing maxUnavailable to scale with cluster size.
A quick test on a 250-node cluster shows that the current behavior takes around 100 minutes, whereas with maxUnavailable set to 10% it takes under 10 minutes.
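The difference can be illustrated with a rough calculation. The sketch below is only an idealized model, not the controller's actual logic: it assumes every pod takes about the same time to restart and that each batch finishes before the next one starts. The function names are hypothetical. It relies on the documented DaemonSet behavior that a percentage maxUnavailable is converted to an absolute number by rounding up.

```python
def max_unavailable_pods(node_count, max_unavailable):
    """Resolve a DaemonSet maxUnavailable value (an int or 'N%') to a pod count."""
    if isinstance(max_unavailable, str) and max_unavailable.endswith("%"):
        percent = int(max_unavailable[:-1])
        # DaemonSet percentages are rounded up; integer ceiling division
        # avoids floating-point surprises.
        return -(-node_count * percent // 100)
    return int(max_unavailable)


def rollout_waves(node_count, max_unavailable):
    """Idealized number of sequential batches needed to restart every pod."""
    batch = max_unavailable_pods(node_count, max_unavailable)
    return -(-node_count // batch)


if __name__ == "__main__":
    for setting in (1, "10%"):
        print(setting, max_unavailable_pods(250, setting), rollout_waves(250, setting))
```

On 250 nodes this gives 250 sequential batches for maxUnavailable 1 versus 10 batches of 25 pods for 10%. At the roughly 0.4 minutes per pod implied by the 100-minute serialized rollout, that is consistent with the observed time of under 10 minutes.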
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Install a cluster with 20 or more hosts
2. Perform an upgrade
3. Observe that only one pod is unavailable at a time, and note how long the upgrade takes.
Actual results:
1 pod unavailable at a time, slow rollout
Expected results:
At most 10% of pods unavailable, faster / more parallel rollout
$ oc get clusterversions.config.openshift.io
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.6.0-0.nightly-2020-08-18-165040 True False 4h14m Cluster version is 4.6.0-0.nightly-2020-08-18-165040
$ oc get ds tuned -n openshift-cluster-node-tuning-operator -o json | jq ".spec.updateStrategy"
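To verify the fix from a script, the JSON printed by the jq command above could be checked along these lines. This is a minimal sketch: the function name is hypothetical, and the sample inputs in the usage section are illustrative values that follow the Kubernetes DaemonSet updateStrategy schema, not output captured from this cluster.

```python
import json


def scales_with_cluster_size(update_strategy_json):
    """Return True if the DaemonSet update strategy uses a percentage maxUnavailable."""
    strategy = json.loads(update_strategy_json)
    if strategy.get("type") != "RollingUpdate":
        # OnDelete strategies never roll pods automatically.
        return False
    rolling = strategy.get("rollingUpdate") or {}
    # Kubernetes defaults maxUnavailable to 1 when it is not set.
    value = rolling.get("maxUnavailable", 1)
    return isinstance(value, str) and value.endswith("%")


if __name__ == "__main__":
    # Illustrative inputs only (assumed, not taken from the cluster in this report).
    serialized = '{"rollingUpdate": {"maxUnavailable": 1}, "type": "RollingUpdate"}'
    scaled = '{"rollingUpdate": {"maxUnavailable": "10%"}, "type": "RollingUpdate"}'
    print(scales_with_cluster_size(serialized))
    print(scales_with_cluster_size(scaled))
```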
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.