+++ This bug was initially created as a clone of Bug #1990065 +++
+++ This bug was initially created as a clone of Bug #1987009 +++

* This specifically refers to networking daemonsets *

As per https://github.com/openshift/enhancements/blob/master/CONVENTIONS.md#upgrade-and-reconfiguration, please set .spec.updateStrategy.rollingUpdate.maxUnavailable = 10% on daemonsets.

Description of problem:

Currently, all the daemonsets managed by the OpenShift Virtualization Operator default to a maxUnavailable of 1. This means that on large clusters the upgrade of the OpenShift Virtualization operator takes a long time. For example, on a 120 node cluster it took 5.5 hours just for the OpenShift Virtualization operator to upgrade. When customers set aside maintenance windows to upgrade their platform, the OCP platform upgrade itself takes less time than the CNV operator upgrade, so this will be a pain point.

[kni@e16-h18-b03-fc640 ~]$ for i in `oc get ds | grep 10h | awk {'print$1'}`; do echo -n $i; oc get ds/$i -o yaml | grep -i maxunavailable; done
bridge-marker          f:maxUnavailable: {}
          maxUnavailable: 1
hostpath-provisioner          f:maxUnavailable: {}
          maxUnavailable: 1
kube-cni-linux-bridge-plugin          f:maxUnavailable: {}
          maxUnavailable: 1
kubevirt-node-labeller          f:maxUnavailable: {}
          maxUnavailable: 1
nmstate-handler          f:maxUnavailable: {}
          maxUnavailable: 1
ovs-cni-amd64          f:maxUnavailable: {}
          maxUnavailable: 1
virt-handler          f:maxUnavailable: {}
          maxUnavailable: 1

Currently all cluster operators in OCP have a maxUnavailable of at least 10% set. Clayton also recommends this, as per https://bugzilla.redhat.com/show_bug.cgi?id=1920209#c14

A couple of options here:
1. Bump maxUnavailable to 10%.
2. Investigate whether any pods in any of the daemonsets do not handle SIGTERM properly and as a result take a while to exit. In that case we should lower the `terminationGracePeriodSeconds` to something like 10s.

Version-Release number of selected component (if applicable):
CNV 2.6.5

How reproducible:
100%

Steps to Reproduce:
1. Deploy a large cluster
2. Install CNV
3. Upgrade the CNV operator

Actual results:
Upgrade of CNV on a 120 node cluster takes 5.5 hours.

Expected results:
The OpenShift platform upgrade itself takes around 3 hours on a 120 node cluster, so the CNV operator currently takes longer to upgrade than all of OpenShift. CNV should upgrade in comparable or less time.

Additional info:

--- Additional comment from Dan Kenigsberg on 2021-07-30 14:47:58 CEST ---

Idea for a workaround: use https://docs.openshift.com/container-platform/4.8/virt/install/virt-specifying-nodes-for-virtualization-components.html#node-placement-hco_virt-specifying-nodes-for-virtualization-components to limit cnv daemonsets to the few workers where VMs run.

This, however, is going to disable knmstate on most nodes, so you may want to revert it after upgrade.

Maybe there's a way to use https://github.com/kubevirt/hyperconverged-cluster-operator/blob/main/docs/cluster-configuration.md#jsonpatch-annotations (requires support exception) to explicitly allow only knmstate everywhere?
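A rough sketch of that node-placement workaround (the HyperConverged field names follow the linked documentation, but the "vm-workers" node label is hypothetical, and the patch would have to be reverted once the upgrade completes):

$ # Illustrative only: pin CNV workload daemonsets to a labeled subset of workers
$ oc patch hyperconverged kubevirt-hyperconverged -n openshift-cnv --type=merge \
    -p '{"spec":{"workloads":{"nodePlacement":{"nodeSelector":{"vm-workers":""}}}}}'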
--- Additional comment from Sai Sindhur Malleni on 2021-07-30 16:33:17 CEST ---

(In reply to Dan Kenigsberg from comment #1)
> Idea for a workaround: use
> https://docs.openshift.com/container-platform/4.8/virt/install/virt-specifying-nodes-for-virtualization-components.html#node-placement-hco_virt-specifying-nodes-for-virtualization-components
> to limit cnv daemonsets to the few workers where VMs run.
>
> This, however, is going to disable knmstate on most nodes, so you may want
> to revert it after upgrade.
>
> Maybe there's a way to use
> https://github.com/kubevirt/hyperconverged-cluster-operator/blob/main/docs/cluster-configuration.md#jsonpatch-annotations
> (requires support exception) to explicitly allow only knmstate everywhere?

Thanks Dan. Sure, later versions of CNV/OCP do support this, but when upgrading from 4.6.17 (whatever CNV operator version is on that) -> 4.7.11 this feature is missing. While the workaround will help in this case, I think we all agree that we want to make sure the operator itself upgrades quickly enough when deployed at scale, if a customer really wants to use 120 nodes for CNV. So I do believe we can speed this up even when running on 120 nodes.

--- Additional comment from Adam Litke on 2021-08-02 16:18:53 CEST ---

The hostpath-provisioner DS can set maxUnavailable to infinity. The DS only needs to run when the node is running actual workloads.

--- Additional comment from Simone Tiraboschi on 2021-08-02 16:47:07 CEST ---

Currently on our daemonsets I see:

$ oc get daemonset -n openshift-cnv -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.updateStrategy.rollingUpdate.maxUnavailable}{"\n"}{end}'
bridge-marker	1
kube-cni-linux-bridge-plugin	1
nmstate-handler	1
ovs-cni-amd64	1
virt-handler	1

Personally I'm simply for statically setting 10% on each of them, as Clayton Coleman recommends here: https://bugzilla.redhat.com/show_bug.cgi?id=1920209#c14

Adam, Stu, Petr, Dominik: do you have any specific concerns about that?

--- Additional comment from Petr Horáček on 2021-08-04 11:04:00 CEST ---

I don't have any concerns, as none of these network components is critical cluster-wide. I would be happy to share the work on this. @Simone, would you mind if I cloned this BZ and took over the network components?

--- Additional comment from Dominik Holler on 2021-08-04 13:37:05 CEST ---

SSP is not affected, because SSP does not have any daemon set.

--- Additional comment from Simone Tiraboschi on 2021-08-04 18:24:10 CEST ---

https://github.com/openshift/enhancements/pull/854 got merged, so now the agreement is officially about setting .spec.updateStrategy.rollingUpdate.maxUnavailable = 10% on all of our daemonsets.

> @Simone, would you mind if I cloned this BZ and took over the network components?

Yes, please do. I'm also going to create bugs for the other affected components.

--- Additional comment from Simone Tiraboschi on 2021-08-05 07:16:51 UTC ---

Because the fix for this is trivial but the impact on upgrade time is really visible on large clusters, I'm also asking if we can consider backporting this down to 2.5.z.

--- Additional comment from Petr Horáček on 2021-08-05 07:20:13 UTC ---

I would not mind. Just let me know once there is a commitment for the 2.6 and 2.5 backports.

--- Additional comment from Petr Horáček on 2021-08-05 07:26:38 UTC ---

--- Additional comment from Simone Tiraboschi on 2021-08-09 12:20:01 UTC ---

Thanks, so let's commit to 2.6.7.

--- Additional comment from Petr Horáček on 2021-08-17 11:45:39 UTC ---

This BZ is tracking the backport to 4.8.2. I will create a clone for 2.6.7.
The fix for this was backported to 4.8.2. However, it was quite a lot of work to bump multiple components and do multiple backports, so I am removing the further 2.6.z backport to save capacity. If this becomes more urgent, please raise it.
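For reference, the merged convention amounts to the following change on each managed daemonset. This is only a sketch: these daemonsets are reconciled by their operators, so a manual patch like the one below would be reverted on the next reconciliation, and the real fix has to land in the operators themselves.

$ # Illustrative only: raise rollout parallelism to 10% of nodes per daemonset
$ for ds in bridge-marker kube-cni-linux-bridge-plugin nmstate-handler ovs-cni-amd64 virt-handler; do
    oc patch daemonset "$ds" -n openshift-cnv --type=merge \
      -p '{"spec":{"updateStrategy":{"rollingUpdate":{"maxUnavailable":"10%"}}}}'
  done

On a 120 node cluster this raises rollout parallelism from 1 node at a time to 12 nodes, so each daemonset rollout should finish roughly an order of magnitude faster.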