Bug 1775838 - OVS-CNI does not work with OVN
Summary: OVS-CNI does not work with OVN
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.3.0
Hardware: x86_64
OS: Linux
Target Milestone: ---
Target Release: 4.3.z
Assignee: Alexander Constantinescu
QA Contact: Meni Yakove
Depends On: 1806591
Blocks: 1771572
Reported: 2019-11-22 22:32 UTC by Jenifer Abrams
Modified: 2020-04-30 01:28 UTC
CC List: 14 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1806591 (view as bug list)
Last Closed: 2020-04-30 01:28:07 UTC
Target Upstream Version:


System ID Private Priority Status Summary Last Updated
Github openshift cluster-network-operator pull 492 0 None closed Bug 1775838: Back-port, OVNKubernetes: introduce OVS anti-selector 2021-02-16 07:56:29 UTC
Red Hat Product Errata RHBA-2020:1529 0 None None None 2020-04-30 01:28:21 UTC

Description Jenifer Abrams 2019-11-22 22:32:25 UTC
Description of problem:
Running upstream tests using OVN in the OCP install-config:
  networkType: OVNKubernetes

Started upstream ovs-cni:
oc apply -f https://raw.githubusercontent.com/kubevirt/ovs-cni/master/examples/ovs-cni.yml

and I see that the ovs-cni pods are all failing with CrashLoopBackOff.

Only the marker reports logs:
# oc logs -f -n kube-system ovs-cni-amd64-n4l29 -c ovs-cni-marker
F1122 20:00:25.946505       1 main.go:42] failed to create a new marker object: Error creating the ovsdb connection: failed to connect to ovsdb error: Invalid socket file

Comparing OVNKubernetes to default OpenShiftSDN, there appear to be ovs path changes.

[root@worker0 ~]# ls /etc/openvswitch/
conf.db  system-id.conf
[root@worker0 ~]# ls /run/openvswitch/
br-int.mgmt  br-int.snoop  br-local.mgmt  br-local.snoop  br1ovs.mgmt  br1ovs.snoop  db.sock  ovn-controller.4227.ctl  ovn-controller.pid  ovnkube-node.pid  ovs-vswitchd.4242.ctl  ovs-vswitchd.pid  ovsdb-server.4189.ctl  ovsdb-server.pid

SDN(default ovs -- different cluster):
[root@worker-perf39 ~]# ls /var/run/openvswitch/
br0.mgmt  br0.snoop  br1.mgmt  br1ovs.mgmt  br1ovs.snoop  br1.snoop  db.sock  ovsdb-server.17232.ctl  ovsdb-server.pid  ovs-vswitchd.17443.ctl  ovs-vswitchd.pid
[root@worker-perf39 ~]# ls /var/lib/openvswitch/
conf.db  system-id.conf
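The marker expects the ovsdb socket at a fixed path, so the mismatch can be confirmed by probing the candidate locations on a node. A minimal sketch (the helper name `find_ovs_socket` is made up for illustration; the directory list reflects the paths observed above):

```shell
# Print the first candidate directory that actually contains db.sock.
# Returns 1 if none of the given directories has the socket.
find_ovs_socket() {
  for dir in "$@"; do
    if [ -S "$dir/db.sock" ]; then
      echo "$dir/db.sock"
      return 0
    fi
  done
  echo "no ovsdb socket found" >&2
  return 1
}

# Usage on a node (paths from the OVN and SDN listings above):
#   find_ovs_socket /run/openvswitch /var/run/openvswitch
```

On the OVN node above this finds `/run/openvswitch/db.sock`, while the marker container is evidently looking elsewhere, hence the "Invalid socket file" failure.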

I captured the pod description outputs for both clusters.

Version-Release number of selected component (if applicable):
OCP 4.3.0-0.nightly-2019-11-11-182924
RHCOS 43.81.201911111553.0
Kubevirt v0.23.0

REPOSITORY                                       TAG                                                                       IMAGE ID       CREATED        SIZE
quay.io/kubevirt/ovs-cni-plugin                  latest                                                                    9e05c4d27e5b   2 weeks ago    114 MB
quay.io/kubevirt/ovs-cni-marker                  latest                                                                    20b56e0e6a31   2 weeks ago    142 MB

How reproducible:
Failure every time. 

Steps to Reproduce:
Start OVS-CNI on cluster using OVN.

I haven't tried the network operator yet, but I suspect it will have the same error if the paths have not been changed.

Comment 1 Jenifer Abrams 2019-11-22 22:47:15 UTC
Also in case it is interesting, here is the ovs-vsctl output from an OVN node: http://perf1.perf.lab.eng.bos.redhat.com/pub/jhopper/OCP4/debug/OVN/ovs-vsctl_OVN.txt

Comment 2 Petr Horáček 2019-11-28 11:22:05 UTC
Jenifer, thanks for testing OVN with CNV. We aim to tackle its support in future releases.

Comment 3 Petr Horáček 2019-12-05 11:39:13 UTC
Resolution for this issue is being tracked on Jira.

Comment 4 Nelly Credi 2020-01-01 14:10:47 UTC
We are tracking it in Jira. Closing.

Comment 5 Petr Horáček 2020-02-15 14:31:58 UTC
I'm reopening this issue. We need to run CNV on OCP 4.3 and this bug blocks us from successful deployment.

CNV's OVS bridge marker fails on OCP 4.3 with:

F0214 13:19:12.260672       1 main.go:42] failed to create a new marker object: Error creating the ovsdb connection: failed to connect to ovsdb error: Invalid socket file

The solution would be to backport https://github.com/openshift/cluster-network-operator/pull/357. In that PR, we split the OVS and OVN pods and make them share the OVS socket on the host. With the socket available on the host, CNV's OVS bridge marker should be able to start successfully and the deployment can complete.
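The essence of that change, from the consumer's side, is that the host's ovsdb socket directory must be visible inside the marker container. A hypothetical fragment of the marker DaemonSet spec (names and paths are illustrative, not taken from the PR):

```yaml
# Illustrative only: expose the host's ovsdb socket directory so the
# marker can reach /run/openvswitch/db.sock.
containers:
- name: ovs-cni-marker
  volumeMounts:
  - name: ovs-run
    mountPath: /run/openvswitch
volumes:
- name: ovs-run
  hostPath:
    path: /run/openvswitch
```

This only works once OVS runs with its socket on the host path, which is what the split into separate OVS and OVN pods provides.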

Comment 18 Geetika Kapoor 2020-04-21 09:39:00 UTC
Test Environment:

$ oc version
Client Version: 4.4.0-0.nightly-2020-04-20-224655
Server Version: 4.4.0-rc.8
Kubernetes Version: v1.17.1

$ oc get csv -n openshift-cnv | awk ' { print $4 } ' | tail -n1

Test Steps:

1. Check status of all the pods under namespace openshift-cnv with app=ovs-cni.

$ oc get pod -n openshift-cnv | grep ovs-cni
ovs-cni-amd64-4jmxb                                   2/2     Running   3          67m
ovs-cni-amd64-5lzpz                                   2/2     Running   0          67m
ovs-cni-amd64-7k2xk                                   2/2     Running   0          67m
ovs-cni-amd64-c679d                                   2/2     Running   0          67m
ovs-cni-amd64-dxgck                                   2/2     Running   3          67m
ovs-cni-amd64-w28td                                   2/2     Running   3          67m

2. Check the logs and see if there are any socket exceptions as mentioned above.

$ for pod in $(oc get pod -n openshift-cnv -l app=ovs-cni --no-headers | awk '{print $1}'); do echo ===$pod=== && oc logs -n openshift-cnv $pod --all-containers=true; done

I0421 08:49:16.974668       1 main.go:44] Found the OVS socket
I0421 08:30:59.423366       1 main.go:44] Found the OVS socket
I0421 08:30:56.322587       1 main.go:44] Found the OVS socket
I0421 08:30:55.216315       1 main.go:44] Found the OVS socket
I0421 08:56:24.900515       1 main.go:44] Found the OVS socket
I0421 08:42:15.621076       1 main.go:44] Found the OVS socket

Comment 21 errata-xmlrpc 2020-04-30 01:28:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

