Description of problem:
The latest OVN-Kubernetes image ships with the openvswitch2.17-2.17.0-8.el8fdp.x86_64 rpm. Starting with https://github.com/openshift/cluster-network-operator/commit/d38a80372198130a1a43603d036f1b202ae8fbcb, ovnkube-node pods periodically scrape OVS metrics by running commands like:
- ovs-dpctl dump-dps
- ovs-dpctl show ovs-system

When running on OKD with openvswitch-2.15.0-8.fc35.x86_64 installed on the host, OVS stops handling any pod traffic once `ovs-dpctl(2.17-2)` is called from the ovnkube-node pod. There is a version mismatch, but the outcome seems really severe.

Version-Release number of selected component (if applicable):
- openvswitch2.17-2
- openvswitch-2.15.0

How reproducible:
Always

Steps to Reproduce:
1. Set up the latest OpenShift OKD cluster without https://github.com/openshift/cluster-network-operator/commit/d38a80372198130a1a43603d036f1b202ae8fbcb to allow successful installation.
2. Run "ovs-dpctl show" or "ovs-dpctl dump-dps" from any ovnkube-node pod.
3. Pods running on the node that hosts the ovnkube-node pod from the previous step lose all connectivity.

I think the same behavior should be observed just by using ovs-dpctl 2.17-2 with OVS 2.15, but I have not tried that.
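To confirm the version mismatch described above, the version banners of the two sides can be compared. The sketch below is only an illustration: the helper names are hypothetical, and it assumes the banners look like "ovs-dpctl (Open vSwitch) 2.17.0" (as printed by `ovs-dpctl --version`) and "ovs-vswitchd (Open vSwitch) 2.15.0" (as reported by the running daemon). It only parses strings and does not touch the datapath.

```python
import re


def parse_ovs_version(banner):
    """Extract (major, minor) from a banner such as
    'ovs-dpctl (Open vSwitch) 2.17.0'. The banner format is an assumption."""
    match = re.search(r"\(Open vSwitch\)\s+(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError(f"unrecognized version banner: {banner!r}")
    return int(match.group(1)), int(match.group(2))


def versions_differ(tool_banner, daemon_banner):
    """Return True when the tool and the daemon report different
    major.minor versions (hypothetical helper)."""
    return parse_ovs_version(tool_banner) != parse_ovs_version(daemon_banner)
```

For the versions in this report, `versions_differ("ovs-dpctl (Open vSwitch) 2.17.0", "ovs-vswitchd (Open vSwitch) 2.15.0")` would flag the mismatch between the container and the host.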
The root cause is that ovs-dpctl uses the same interface as ovs-vswitchd to talk to the kernel, and that includes setting updated capabilities and the like, just as vswitchd does when it starts. So running a newer dpctl basically overwrites what the vswitchd on the host has configured. Fixing this would require changes pretty deep down the call stack, so it is quite unlikely to be fixed in OVS. Instead, ovn-kubernetes can use `ovs-appctl dpctl/*` with the same commands. These talk to vswitchd on the host (instead of directly to the kernel) and return the same information. The response formatting should also be the same as with a direct dpctl call. This will prevent the issue, since vswitchd will be the only thing opening the netlink channel to the kernel.
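A minimal sketch of the suggested switch is shown below, in Python for illustration only (ovn-kubernetes itself is written in Go, and this is not its actual code). The helper names are hypothetical. It assumes `ovs-appctl` in the pod can reach the host's ovs-vswitchd control socket, and that datapath names such as `ovs-system` are accepted by `dpctl/show` in the same way as by `ovs-dpctl show`, which has not been verified here.

```python
import subprocess


def to_appctl(dpctl_args):
    """Map ovs-dpctl arguments to the equivalent ovs-appctl call,
    e.g. ['dump-dps'] -> ['ovs-appctl', 'dpctl/dump-dps'].
    Hypothetical helper, not part of OVS or ovn-kubernetes."""
    if not dpctl_args:
        raise ValueError("expected a dpctl subcommand")
    command, *rest = dpctl_args
    return ["ovs-appctl", f"dpctl/{command}", *rest]


def run_dpctl_via_vswitchd(dpctl_args, timeout=5):
    """Ask the running ovs-vswitchd for datapath information instead of
    opening a separate netlink channel to the kernel."""
    result = subprocess.run(
        to_appctl(dpctl_args),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

With this, the two metrics commands from the description would become `run_dpctl_via_vswitchd(["dump-dps"])` and `run_dpctl_via_vswitchd(["show", "ovs-system"])`, so only vswitchd talks to the kernel datapath.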
*** Bug 2089148 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069