Description of problem:
Because ds/node-exporter defaults to a rollingUpdate maxUnavailable of 1, the rollout is entirely serialized and therefore very slow on large clusters. We can speed up the rollout of daemonsets that don't immediately affect availability by allowing maxUnavailable to scale with cluster size. A quick test on a 250-node cluster shows that the current behavior takes around 100 minutes, whereas with maxUnavailable 10% it takes under 10 minutes.

Version-Release number of selected component (if applicable):
4.4

How reproducible:
100%

Steps to Reproduce:
1. Install a cluster with 20 or more hosts.
2. Perform an upgrade.
3. Observe that only one node-exporter pod is unavailable at a time, and how long the upgrade takes.

Actual results:
1 node-exporter pod unavailable at a time; slow rollout.

Expected results:
At most 10% of node-exporter pods unavailable; faster, more parallel rollout.

Additional info:
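The speedup reported above follows from simple batch arithmetic: with maxUnavailable 1 the update runs in 250 serialized batches, while 10% allows 25 pods per batch. A minimal sketch of that arithmetic (the rollout_batches helper is hypothetical, the ~0.4 min per-batch figure is derived from the reported 100-minute serialized run, and the percentage rounding is a simplification of Kubernetes' actual behavior):

```python
import math

def rollout_batches(nodes, max_unavailable):
    """Number of serialized update batches needed to roll every pod.

    max_unavailable may be an absolute int (e.g. 1) or a percentage
    string (e.g. "10%"), mirroring the DaemonSet field's two forms.
    """
    if isinstance(max_unavailable, str) and max_unavailable.endswith("%"):
        # Simplified rounding; Kubernetes' exact rounding rules may differ.
        parallel = max(1, math.floor(nodes * int(max_unavailable[:-1]) / 100))
    else:
        parallel = max_unavailable
    return math.ceil(nodes / parallel)

# 250-node cluster: 100 min / 250 batches ~= 0.4 min per batch.
per_batch_min = 100 / rollout_batches(250, 1)

print(rollout_batches(250, 1))                       # 250 batches, fully serialized
print(rollout_batches(250, "10%"))                   # 10 batches of 25 pods
print(rollout_batches(250, "10%") * per_batch_min)   # ~4 min, consistent with "under 10"
```

This is only a back-of-envelope model (it assumes uniform per-batch time), but it shows why the observed improvement is roughly an order of magnitude.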
Tested with 4.6.0-0.nightly-2020-09-12-230035; maxUnavailable for the rollingUpdate is 10%:

    # oc -n openshift-monitoring get ds node-exporter -oyaml
    ...
      updateStrategy:
        rollingUpdate:
          maxUnavailable: 10%
        type: RollingUpdate
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196