Created attachment 1751432 [details]
reproducer cluster objects

Description of problem:
When applying a SriovNetworkNodePolicy in conjunction with a MachineConfig that takes a while to apply (such as switching to the rt-kernel), SR-IOV reboots the node in the middle of that process. When the node comes back online it is left in an intermediate state from which it cannot reconcile. IMHO this is a design bug; all node configuration changes should be done through MCO.

Version-Release number of selected component (if applicable):
4.7

How reproducible:
Very often, with the steps below.

Steps to Reproduce:
This needs a node with an Intel SR-IOV capable NIC. Make sure to update the SriovNetworkNodePolicy with that NIC name, then:
1. oc apply -f reproducer.yaml # it is expected to fail on missing CRDs
2. Wait for the cluster to settle and sriov-network-operator to become operational.
3. Apply worker-duprofile to the node.
4. oc apply -f reproducer.yaml # again, to apply the missing CRs
5. Inspect sriov-daemon and machine-config-daemon on that node to see what is happening.

Actual results:
No kernel-rt on the node.

Expected results:
kernel-rt on the node.

Additional info:
This is the BZ on the MCO part: https://bugzilla.redhat.com/show_bug.cgi?id=1916169
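For illustration only, a minimal sketch of the shape of the objects involved; the attached reproducer.yaml is the authoritative version. The object names, the worker-duprofile pool label, and the PF name ens1f0 are assumptions and need to match the actual cluster and NIC:

cat <<'EOF' | oc apply -f -
# assumed sketch: a realtime-kernel MachineConfig targeting the worker-duprofile pool
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-duprofile-realtime
  labels:
    machineconfiguration.openshift.io/role: worker-duprofile
spec:
  kernelType: realtime
EOF

cat <<'EOF' | oc apply -f -
# assumed sketch: an SR-IOV policy applied around the same time as the slow MachineConfig above
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-intel
  namespace: openshift-sriov-network-operator
spec:
  resourceName: intelnics
  nodeSelector:
    node-role.kubernetes.io/worker-duprofile: ""
  numVfs: 8
  nicSelector:
    pfNames: ["ens1f0"]   # assumed; set to the Intel NIC present on the node
  deviceType: netdevice
EOF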
Seems like there's already a mechanism to interact with MCO upstream:
https://github.com/openshift/sriov-network-operator/commit/d45a8e35feec3d7b2e183052c07b56d93ff1e0a3

I don't think it resolves the issue (because reqReboot can still be set), but I think it can be enhanced to solve it.
More discussion here: http://post-office.corp.redhat.com/archives/aos-devel/2021-February/msg00086.html

Updating the severity of this to urgent.
Zenghui,

I believe this issue should be documented in the 4.7 Release Notes. I am thinking something like this:

Cause: To enact SRIOV changes on an Intel NIC, a reboot is required. SRIOV currently issues the reboot when it is ready. If this reboot coincides with changes in the Machine Config policy, the node can be left in an undetermined state: the Machine Config Operator believes that the updated policy has been applied when it actually has not. Note that this race condition can also be caused by adding a node to a machine config pool which has MCP and SRIOV changes.

Consequence: The node is left in an indeterminate state.

Workaround (if any): To avoid this issue, new nodes requiring SRIOV and MCO changes should apply them in a stepwise fashion. First apply all MCO configuration and wait for the nodes to settle, then apply the SRIOV configuration. If a new node is being added to a machine config pool which includes SRIOV, this issue can be avoided by removing the SRIOV policy from the machine config pool, adding the new worker, and then re-applying the SRIOV policy.

Result: If the configuration of MCO and SRIOV is completed sequentially, the node will provision correctly.

What do you think? By the way, the deadline for identifying bugs for Release Notes is close of business tomorrow.

Regards,
Ken Y
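For illustration, a hedged sketch of that sequential workaround as oc commands. The pool name worker-duprofile and the manifest file names are assumptions, not part of the documented procedure:

# 1. Apply the MachineConfig changes first and wait for the pool to settle
oc apply -f machineconfig-realtime.yaml
oc wait mcp/worker-duprofile --for=condition=Updated --timeout=60m

# 2. Only then apply the SR-IOV policy, which may trigger its own reboot
oc apply -f sriov-network-node-policy.yaml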
(In reply to Ken Young from comment #6)
> Zenghui,
>
> I believe this issue should be documented in the 4.7 Release notes. I am
> thinking something like this:
>
> Cause: To enact SRIOV changes on an Intel NIC, a reboot is required. SRIOV

SR-IOV config on a Mellanox NIC would also require a reboot to take effect. Perhaps make this a general statement: "reboot is sometimes required to enact SR-IOV changes on supported NICs", wdyt?

> currently issues the reboot when it is ready. If this reboot coincides with
> changes in the Machine Config policy, the node can be left in an
> undetermined state. Machine Config Operator believes that updated policy
> has been applied when it actually has not. Note that this race condition
> can also be caused by adding a node to a machine config pool which has MCP
> and SRIOV changes.
>
> Consequence: The node is left in an indeterminate state.
>
> Workaround (if any): To avoid this issue, new nodes requiring SRIOV and MCO
> changes should do so in a step wise fashion. First apply all MCO
> configuration and wait for the nodes to settle. Then apply the SRIOV
> configuration. If a new node is being added to a machine config pool which
> includes SRIOV, this issue can be avoided by removing the SRIOV policy from
> the machine configuration pool and then adding the new worker. Then
> re-apply the SRIOV policy.
>
> Result: If the configuration of MCO and SRIOV is completed sequentially, the
> node will provision correctly.
>
> What do you think? By the way, the deadline for identifying bugs for
> Release Notes is close of business tomorrow.

The rest looks good to me.
This has been published in the 4.7 Release Notes here:
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.7/html/release_notes/ocp-4-7-release-notes
*** Bug 1928265 has been marked as a duplicate of this bug. ***
Here's the upstream PR https://github.com/k8snetworkplumbingwg/sriov-network-operator/pull/93
Hi,

We just tried the fix and it seems that sriov is waiting for the wrong MCP: it is waiting for the 'worker' MCP while it should be waiting for the 'worker-duprofile' MCP.

[root@cnfdd3-installer cnf-internal-deploy]# oc get node
NAME                                             STATUS     ROLES                     AGE     VERSION
cnfdd3.clus2.t5g.lab.eng.bos.redhat.com          NotReady   worker,worker-duprofile   160m    v1.20.0+5f82cdb
dhcp19-17-102.clus2.t5g.lab.eng.bos.redhat.com   Ready      master,virtual            3h47m   v1.20.0+5f82cdb
dhcp19-17-118.clus2.t5g.lab.eng.bos.redhat.com   Ready      master,virtual            3h46m   v1.20.0+5f82cdb
dhcp19-17-128.clus2.t5g.lab.eng.bos.redhat.com   Ready      master,virtual            3h46m   v1.20.0+5f82cdb
dhcp19-17-5.clus2.t5g.lab.eng.bos.redhat.com     Ready      worker                    3h17m   v1.20.0+5f82cdb

[root@cnfdd3-installer cnf-internal-deploy]# oc get mcp -A
NAME               CONFIG                                                       UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master             rendered-master-35194b0693787f3b1f3134ea0a3488ec             True      False      False      3              3                   3                     0                      3h46m
worker             rendered-worker-2d38b8340641ab4c1b1af1479c7386d6             True      False      False      1              1                   1                     0                      3h46m
worker-duprofile   rendered-worker-duprofile-73fff1411b4fe061d3f875cbdfc5816c   False     True       False      1              0                   1                     0                      149m

[root@cnfdd3-installer ~]# oc logs -n openshift-sriov-network-operator sriov-network-config-daemon-pmrsl | grep -in mcp
246:I0413 11:01:42.802272   51194 daemon.go:786] getNodeMachinePool(): find node in MCP worker
251:I0413 11:01:49.166330   51194 daemon.go:839] drainNode(): MCP worker is ready
252:I0413 11:01:49.166343   51194 daemon.go:849] drainNode(): pause MCP worker
253:I0413 11:01:49.175907   51194 daemon.go:731] annotateNode(): Annotate node cnfdd3.clus2.t5g.lab.eng.bos.redhat.com with: Draining_MCP_Paused
254:I0413 11:01:49.205714   51194 daemon.go:839] drainNode(): MCP worker is ready
255:I0413 11:01:49.205728   51194 daemon.go:841] drainNode(): stop MCP informerworker
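In case it helps debugging, one possible way to double-check which pool's rendered config MCO actually targets for that node (node name taken from the output above) is to look at the MCO annotations on the node:

oc get node cnfdd3.clus2.t5g.lab.eng.bos.redhat.com -o yaml \
  | grep -E 'machineconfiguration.openshift.io/(currentConfig|desiredConfig)'
# should point at a rendered-worker-duprofile-* config, not a rendered-worker-* one,
# since the node carries both the worker and worker-duprofile roles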
@saledort Could you help try again with the new build?
The new build looks good. We got the RT kernel on the node and the sriov policy was set successfully.

[root@cnfdd3-installer cnf-internal-deploy]# oc get node -o wide
NAME                                             STATUS   ROLES                     AGE     VERSION                INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                        KERNEL-VERSION                 CONTAINER-RUNTIME
cnfdd3.clus2.t5g.lab.eng.bos.redhat.com          Ready    worker,worker-duprofile   162m    v1.21.0-rc.0+2993be8   10.19.16.100   <none>        Red Hat Enterprise Linux CoreOS 48.84.202104171300-0 (Ootpa)    4.18.0-293.rt7.59.el8.x86_64   cri-o://1.21.0-74.rhaos4.8.gitbc1ef35.el8
dhcp19-17-102.clus2.t5g.lab.eng.bos.redhat.com   Ready    master,virtual            3h25m   v1.21.0-rc.0+2993be8   10.19.17.102   <none>        Red Hat Enterprise Linux CoreOS 48.84.202104171300-0 (Ootpa)    4.18.0-293.el8.x86_64          cri-o://1.21.0-74.rhaos4.8.gitbc1ef35.el8
dhcp19-17-118.clus2.t5g.lab.eng.bos.redhat.com   Ready    master,virtual            3h25m   v1.21.0-rc.0+2993be8   10.19.17.118   <none>        Red Hat Enterprise Linux CoreOS 48.84.202104171300-0 (Ootpa)    4.18.0-293.el8.x86_64          cri-o://1.21.0-74.rhaos4.8.gitbc1ef35.el8
dhcp19-17-128.clus2.t5g.lab.eng.bos.redhat.com   Ready    master,virtual            3h25m   v1.21.0-rc.0+2993be8   10.19.17.128   <none>        Red Hat Enterprise Linux CoreOS 48.84.202104171300-0 (Ootpa)    4.18.0-293.el8.x86_64          cri-o://1.21.0-74.rhaos4.8.gitbc1ef35.el8
dhcp19-17-56.clus2.t5g.lab.eng.bos.redhat.com    Ready    worker                    175m    v1.21.0-rc.0+2993be8   10.19.17.56    <none>        Red Hat Enterprise Linux CoreOS 48.84.202104171300-0 (Ootpa)    4.18.0-293.el8.x86_64          cri-o://1.21.0-74.rhaos4.8.gitbc1ef35.el8

Allocatable:
  cpu:                                    47
  ephemeral-storage:                      431049040797
  hugepages-1Gi:                          16Gi
  hugepages-2Mi:                          0
  memory:                                 79643004Ki
  openshift.io/mh_u_site_1_fqdn_worker1:  4
  pods:                                   250

[root@cnfdd3-installer cnf-internal-deploy]# oc get mcp -A
NAME               CONFIG                                                        UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master             rendered-master-9e5d7db0cf72ef9267abfa7a0c038380              True      False      False      3              3                   3                     0                      3h21m
worker             rendered-worker-f5439a2461aae76736ae85ef75f8a3d2              True      False      False      1              1                   1                     0                      3h21m
worker-duprofile   rendered-worker-duprofile-df0ea7c177cad5fc144601884ba055b8    True      False      False      1              1                   1                     0                      84m

[root@cnfdd3-installer cnf-internal-deploy]# oc logs -n openshift-sriov-network-operator sriov-network-config-daemon-4xrzs | grep MCP
I0419 11:37:11.303454    9252 daemon.go:880] drainNode():MCP worker-duprofile is not ready: [{RenderDegraded False 2021-04-19 11:28:57 +0000 UTC  } {NodeDegraded False 2021-04-19 11:29:02 +0000 UTC  } {Degraded False 2021-04-19 11:29:02 +0000 UTC  } {Updated False 2021-04-19 11:29:24 +0000 UTC  } {Updating True 2021-04-19 11:29:24 +0000 UTC All nodes are updating to rendered-worker-duprofile-2e3f4a90e05613fa4e39dfb14226921a}], wait...
I0419 11:37:41.302725    9252 daemon.go:880] drainNode():MCP worker-duprofile is not ready: [{RenderDegraded False 2021-04-19 11:28:57 +0000 UTC  } {NodeDegraded False 2021-04-19 11:29:02 +0000 UTC  } {Degraded False 2021-04-19 11:29:02 +0000 UTC  } {Updated False 2021-04-19 11:29:24 +0000 UTC  } {Updating True 2021-04-19 11:29:24 +0000 UTC All nodes are updating to rendered-worker-duprofile-2e3f4a90e05613fa4e39dfb14226921a}], wait...
I0419 11:38:11.303793    9252 daemon.go:880] drainNode():MCP worker-duprofile is not ready: [{RenderDegraded False 2021-04-19 11:28:57 +0000 UTC  } {NodeDegraded False 2021-04-19 11:29:02 +0000 UTC  } {Degraded False 2021-04-19 11:29:02 +0000 UTC  } {Updated False 2021-04-19 11:29:24 +0000 UTC  } {Updating True 2021-04-19 11:29:24 +0000 UTC All nodes are updating to rendered-worker-duprofile-2e3f4a90e05613fa4e39dfb14226921a}], wait...
I0419 11:38:41.304807    9252 daemon.go:880] drainNode():MCP worker-duprofile is not ready: [{RenderDegraded False 2021-04-19 11:28:57 +0000 UTC  } {NodeDegraded False 2021-04-19 11:29:02 +0000 UTC  } {Degraded False 2021-04-19 11:29:02 +0000 UTC  } {Updated False 2021-04-19 11:29:24 +0000 UTC  } {Updating True 2021-04-19 11:29:24 +0000 UTC All nodes are updating to rendered-worker-duprofile-2e3f4a90e05613fa4e39dfb14226921a}], wait...
I0419 11:39:11.305213    9252 daemon.go:880] drainNode():MCP worker-duprofile is not ready: [{RenderDegraded False 2021-04-19 11:28:57 +0000 UTC  } {NodeDegraded False 2021-04-19 11:29:02 +0000 UTC  } {Degraded False 2021-04-19 11:29:02 +0000 UTC  } {Updated False 2021-04-19 11:29:24 +0000 UTC  } {Updating True 2021-04-19 11:29:24 +0000 UTC All nodes are updating to rendered-worker-duprofile-2e3f4a90e05613fa4e39dfb14226921a}], wait...
I0419 11:39:41.306091    9252 daemon.go:880] drainNode():MCP worker-duprofile is not ready: [{RenderDegraded False 2021-04-19 11:28:57 +0000 UTC  } {NodeDegraded False 2021-04-19 11:29:02 +0000 UTC  } {Degraded False 2021-04-19 11:29:02 +0000 UTC  } {Updated False 2021-04-19 11:29:24 +0000 UTC  } {Updating True 2021-04-19 11:29:24 +0000 UTC All nodes are updating to rendered-worker-duprofile-2e3f4a90e05613fa4e39dfb14226921a}], wait...
I0419 11:40:11.306728    9252 daemon.go:880] drainNode():MCP worker-duprofile is not ready: [{RenderDegraded False 2021-04-19 11:28:57 +0000 UTC  } {NodeDegraded False 2021-04-19 11:29:02 +0000 UTC  } {Degraded False 2021-04-19 11:29:02 +0000 UTC  } {Updated False 2021-04-19 11:29:24 +0000 UTC  } {Updating True 2021-04-19 11:29:24 +0000 UTC All nodes are updating to rendered-worker-duprofile-2e3f4a90e05613fa4e39dfb14226921a}], wait...
I0419 11:40:41.307084    9252 daemon.go:880] drainNode():MCP worker-duprofile is not ready: [{RenderDegraded False 2021-04-19 11:28:57 +0000 UTC  } {NodeDegraded False 2021-04-19 11:29:02 +0000 UTC  } {Degraded False 2021-04-19 11:29:02 +0000 UTC  } {Updated False 2021-04-19 11:29:24 +0000 UTC  } {Updating True 2021-04-19 11:29:24 +0000 UTC All nodes are updating to rendered-worker-duprofile-2e3f4a90e05613fa4e39dfb14226921a}], wait...
I0419 11:47:45.238599   21005 daemon.go:598] completeDrain(): resume MCP worker-duprofile
Thanks Sabina. Moving this bug to verified according to comment 22.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438