Bug 2091634 - OVS 2.15 stops handling traffic once ovs-dpctl(2.17.2) is used against it
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.11
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Patryk Diak
Duplicates: 2089148
Reported: 2022-05-30 14:34 UTC by Patryk Diak
Modified: 2022-08-10 11:15 UTC (History)

Last Closed: 2022-08-10 11:15:13 UTC

Links
System ID Private Priority Status Summary Last Updated
Github openshift ovn-kubernetes pull 1118 0 None Merged Bug 2091634: Use ovs-appctl dpctl/* instead of ovs-dpctl 2022-07-11 20:52:29 UTC
Github ovn-org ovn-kubernetes pull 3007 0 None Merged Use ovs-appctl dpctl/* instead of ovs-dpctl 2022-06-02 13:53:21 UTC
Red Hat Issue Tracker FD-1999 0 None None None 2022-05-30 14:39:17 UTC
Red Hat Product Errata RHSA-2022:5069 0 None None None 2022-08-10 11:15:34 UTC

Description Patryk Diak 2022-05-30 14:34:15 UTC
Description of problem:

The latest OVN-Kubernetes image ships with the openvswitch2.17-2.17.0-8.el8fdp.x86_64 RPM.
Starting with commit https://github.com/openshift/cluster-network-operator/commit/d38a80372198130a1a43603d036f1b202ae8fbcb, ovnkube-node pods periodically scrape OVS metrics by running commands such as:
  - ovs-dpctl dump-dps
  - ovs-dpctl show ovs-system

When running on OKD with openvswitch-2.15.0-8.fc35.x86_64 installed on the host, OVS stops handling any pod traffic once `ovs-dpctl(2.17-2)` is called from the ovnkube-node pod.

There is a version mismatch, but the outcome seems disproportionately severe.

Version-Release number of selected component (if applicable):
- openvswitch2.17-2
- openvswitch-2.15.0

How reproducible:
Always

Steps to Reproduce:
1. Set up the latest OpenShift OKD cluster without https://github.com/openshift/cluster-network-operator/commit/d38a80372198130a1a43603d036f1b202ae8fbcb so that installation can succeed
2. Run "ovs-dpctl show" or "ovs-dpctl dump-dps" from any ovnkube-node pod
3. Pods running on the node that hosts the ovnkube-node pod from the previous step lose all connectivity.

I think the same behavior should be observed just by using ovs-dpctl 2.17-2 with OVS 2.15, but I have not tried that.

Comment 1 Dan Williams 2022-05-31 16:43:31 UTC
The root cause is that ovs-dpctl uses the same interface as ovs-vswitchd to talk to the kernel, including setting updated capabilities and similar state, as vswitchd does when it starts. Running a newer dpctl therefore effectively overwrites what the vswitchd on the host has set up. Fixing this would require changes quite deep in the call stack, so it is unlikely to be fixed in OVS.

Instead, ovn-kubernetes can use `ovs-appctl dpctl/*` with the same commands, which talks to vswitchd on the host (instead of directly to the kernel) and gets the same information. The response formatting should also be the same as with a direct dpctl.

This will prevent the issue since vswitchd will be the only thing opening the netlink channel to the kernel.
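
A minimal sketch of the suggested substitution follows. It is not the code from the merged pull requests (ovn-kubernetes itself is written in Go); the helper names `to_appctl` and `run_dpctl_via_vswitchd` are hypothetical. It assumes `ovs-appctl` inside the pod can reach the host's ovs-vswitchd control socket, and that arguments such as `ovs-system` are accepted unchanged by the `dpctl/*` commands.

```python
import subprocess
from typing import List


def to_appctl(dpctl_args: List[str]) -> List[str]:
    """Map ovs-dpctl arguments to the equivalent ovs-appctl invocation.

    Hypothetical helper: ["dump-dps"] becomes
    ["ovs-appctl", "dpctl/dump-dps"], so the request is served by the
    running ovs-vswitchd instead of opening a second channel to the kernel.
    """
    if not dpctl_args:
        raise ValueError("expected a dpctl subcommand")
    subcommand, *rest = dpctl_args
    return ["ovs-appctl", f"dpctl/{subcommand}", *rest]


def run_dpctl_via_vswitchd(dpctl_args: List[str], timeout: float = 5.0) -> str:
    """Run the translated command and return its stdout.

    Assumes ovs-appctl is on PATH and the vswitchd control socket is
    reachable from this process (an assumption, not stated in the bug).
    """
    result = subprocess.run(
        to_appctl(dpctl_args),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

With this mapping, the two scrape commands from the description become `ovs-appctl dpctl/dump-dps` and `ovs-appctl dpctl/show ovs-system`.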

Comment 5 Vadim Rutkovsky 2022-06-04 14:11:09 UTC
*** Bug 2089148 has been marked as a duplicate of this bug. ***

Comment 8 errata-xmlrpc 2022-08-10 11:15:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

