Description of problem:
Due to bz 1885605, a linux-bridge cannot be configured over the default NIC of a node in a cluster with OVN Kubernetes networking. When this configuration is applied, the node goes into the "NotReady" state and does not recover until it is rebooted.

Version-Release number of selected component (if applicable):
OCP 4.6
CNV 2.5

How reproducible:
Always

Steps to Reproduce:
Follow the steps in the description of bz 1885605 (https://bugzilla.redhat.com/show_bug.cgi?id=1885605#c0).

Actual results:
Each node on which this configuration is applied goes into the "NotReady" state:

# oc get nodes
NAME       STATUS     ROLES    AGE    VERSION
worker-1   NotReady   worker   101m   v1.19.0+d59ce34

Expected results:
The node recovers and returns to the "Ready" state.

Additional info:
Workaround: reboot the node (I only managed to do it via virsh):

# virsh list
 Id   Name              State
---------------------------------
 18   ostest_master_1   running
 19   ostest_master_2   running
 20   ostest_master_0   running
 23   ostest_worker_0   running
 24   ostest_worker_1   running

# virsh reboot ostest_worker_1
Created attachment 1738972
nmstate handler pod logs
Deferring this to 2.7. OVN is still a tech preview and we document that it is not allowed to reconfigure the default iface using knmstate when OVN is used.
@yboaron can we retest this with the latest CNV?
Created attachment 1746999
nmstate crictl logs cnv 2.6

These are the logs taken directly from the node. Because we lose TCP connectivity, they were collected using OpenStack noVNC.
It looks like nmstate is not able to roll back this kind of configuration, since it involves both a linux-bridge and OVS. The ping we do after the rollback also fails (because nmstate was not able to do the rollback). The handler then ends up trying to mark the NNCE as successful, which is wrong, but it cannot do so because connectivity to the API server is broken.

I also suspect that nmstate 1.0 will fix this, since it does not allow the same slave to be attached to multiple devices in the first place. If so, it should be fixed in CNV 2.8.
Also note that restarting the node makes it accessible again.
@ellorent, I think you tagged the wrong Yossi.
Created attachment 1747053
NetworkManager at debug level
Created attachment 1747055
NodeNetworkState before applying the policy
Created attachment 1747056
policy applied
Bug opened for the nmstate team: https://bugzilla.redhat.com/show_bug.cgi?id=1915850
Just as a side note, restarting the worker restores connectivity.
Moving this tracker to NEW. Keeping it until the linked nmstate bug gets resolved.
Rollback is working fine on CNV 4.8 with nmstate 1.0.2. Now we have to see whether veth works fine too.
(In reply to Quique Llorente from comment #14)
> Rollback is working fine at CNV 4.8 with nmstate 1.0.2, now we have to see
> if veth works fine too.

The cluster was using openshift-sdn, not OVNKubernetes.
This should now be addressed in the latest rebuild of 4.9.
Verified on a cluster with OVN Kubernetes networking.

Version verified:
kubernetes-nmstate-handler-container version: v4.9.0-18

Steps verified:
1. Created and applied a Linux bridge over the default NIC (taken from https://bugzilla.redhat.com/show_bug.cgi?id=1885605).
2. The nodes the NNCP was applied to recovered and are in the Ready state:

[cnv-qe-jenkins@onash-490-ovn-9nbdm-executor extract-cnv-image-versions]$ oc get nodes
NAME                                 STATUS   ROLES    AGE    VERSION
onash-490-ovn-9nbdm-master-0         Ready    master   134m   v1.21.1+8268f88
onash-490-ovn-9nbdm-master-1         Ready    master   134m   v1.21.1+8268f88
onash-490-ovn-9nbdm-master-2         Ready    master   133m   v1.21.1+8268f88
onash-490-ovn-9nbdm-worker-0-8btgf   Ready    worker   117m   v1.21.1+8268f88
onash-490-ovn-9nbdm-worker-0-fvqv8   Ready    worker   117m   v1.21.1+8268f88
onash-490-ovn-9nbdm-worker-0-vzgqb   Ready    worker   113m   v1.21.1+8268f88
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.9.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:4104