Bug 1952846 - [ovn-controller] OVS.Interface.external-ids:ovn-installed is not set if original OVS TXN failed.
Summary: [ovn-controller] OVS.Interface.external-ids:ovn-installed is not set if original OVS TXN failed.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: ovn2.13
Version: FDP 20.H
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Dumitru Ceara
QA Contact: ying xu
URL:
Whiteboard: perfscale-ovn
Depends On:
Blocks: 1959200
 
Reported: 2021-04-23 11:20 UTC by Dumitru Ceara
Modified: 2021-08-24 20:28 UTC
CC List: 5 users

Fixed In Version: ovn2.13-20.12.0-135
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-06-21 14:44:39 UTC
Target Upstream Version:
Embargoed:




Links:
Red Hat Issue Tracker FD-1275 (last updated 2021-08-22 04:50:09 UTC)
Red Hat Product Errata RHBA-2021:2507 (last updated 2021-06-21 14:46:02 UTC)

Description Dumitru Ceara 2021-04-23 11:20:35 UTC
Description of problem:

OVN uses the OVS.Interface.external-ids:ovn-installed attribute to notify the CMS that an OVS port has been bound to an OVN port and that all required OVS flows have been installed.

However, if the OVSDB transaction that sets this attribute in the local conf.db fails, ovn-controller doesn't retry it.

The transaction can fail, especially at scale, and ovn-controller should be resilient enough to handle it.
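
For reference, a CMS (or an operator) can watch for this attribute with plain ovs-vsctl. A minimal sketch of such a wait loop; the interface name is an illustrative placeholder, not taken from this bug:

# Poll until ovn-controller marks the interface as fully installed.
iface="veth-pod1"    # hypothetical interface name
until ovs-vsctl get Interface "$iface" external_ids:ovn-installed 2>/dev/null | grep -q true; do
    sleep 0.5
done
echo "ovn-installed is set on $iface"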

Comment 3 Dumitru Ceara 2021-05-06 15:36:51 UTC
V2 patch:
http://patchwork.ozlabs.org/project/ovn/list/?series=242485&state=*

Comment 5 Tim Rozet 2021-05-12 16:03:23 UTC
It looks like this fix doesn't entirely solve the problem of ovn-installed being reported before the flows are installed. When testing with this fix, I run a script that checks every 0.5 seconds whether ovn-installed has been added, as well as the flows in table 8, during pod creation. I see this:


Wed May 12 14:39:56 UTC 2021 external_ids        : {attached_mac="0a:58:0a:97:0d:3d", iface-id=openshift-authentication_trozet1, ip_addresses="10.151.13.61/22", ovn-installed="true", sandbox="87a49511bcad42f70c952f6a67e386a58b270b60250b546d0cdd1e40e44ece75"}


Wed May 12 14:40:22 UTC 2021 cookie=0xfb844538, duration=0.135s, table=8, n_packets=0, n_bytes=0, idle_age=0, priority=50,reg14=0x13c,metadata=0x264,dl_src=0a:58:0a:97:0d:3d actions=resubmit(,9)

We can see the flow was installed much later (around 26 seconds) than when ovn-installed was added to the pod.
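
A rough sketch of the kind of polling loop described above (not the exact script used); the interface name and bridge name are illustrative assumptions, and the MAC is the one from the output above:

# Dump ovn-installed state and the table 8 flow for the pod every 0.5 seconds.
iface="veth-pod1"               # hypothetical interface name
mac="0a:58:0a:97:0d:3d"
while true; do
    date -u
    ovs-vsctl --if-exists get Interface "$iface" external_ids
    ovs-ofctl dump-flows br-int "table=8,dl_src=$mac"
    sleep 0.5
done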

Comment 6 Dumitru Ceara 2021-05-17 07:56:25 UTC
(In reply to Tim Rozet from comment #5)
> It looks like this fix doesn't entirely solve the problem of ovn-installed
> being reported before the flows are installed. [...]
> We can see the flow was installed much later (around 26 seconds) than when
> ovn-installed was added to the pod.

Based on https://bugzilla.redhat.com/show_bug.cgi?id=1959200#c4, this is a
different issue, which I don't think we can fix in OVN itself.  AFAICT,
the only option is to ensure that the CMS doesn't reuse logical port names.

Comment 10 ying xu 2021-06-04 10:34:59 UTC
Dumitru Ceara said this bug is very hard to reproduce; he suggested doing a sanity test instead.

So I only ran the regression tests.

Setting verified as sanity-only.

Comment 12 errata-xmlrpc 2021-06-21 14:44:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (ovn2.13 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2507

Comment 13 Joe Talerico 2021-06-23 10:13:53 UTC
We are still seeing this with the latest 4.9 nightly compose.

kube-apiserver            4.9.0-0.nightly-2021-06-21-191858   True        True          True       13h     InstallerPodContainerWaitingDegraded: Pod "installer-9-ip-10-0-161-94.us-west-2.compute.internal" on node "ip-10-0-161-94.us-west-2.compute.internal" container "installer" is waiting since 2021-06-23 08:11:54 +0000 UTC because ContainerCreating
InstallerPodNetworkingDegraded: Pod "installer-9-ip-10-0-161-94.us-west-2.compute.internal" on node "ip-10-0-161-94.us-west-2.compute.internal" observed degraded networking: Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_installer-9-ip-10-0-161-94.us-west-2.compute.internal_openshift-kube-apiserver_39a7beab-7f9b-4f21-b2a9-9d2e302f7998_0(77e08343ec87696849117f1313ae37f8902f86c8bcc9080945c78c9feed02172): [openshift-kube-apiserver/installer-9-ip-10-0-161-94.us-west-2.compute.internal:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[openshift-kube-apiserver/installer-9-ip-10-0-161-94.us-west-2.compute.internal 77e08343ec87696849117f1313ae37f8902f86c8bcc9080945c78c9feed02172] [openshift-kube-apiserver/installer-9-ip-10-0-161-94.us-west-2.compute.internal 77e08343ec87696849117f1313ae37f8902f86c8bcc9080945c78c9feed02172] failed to configure pod interface: error while waiting on OVS.Interface.external-ids:ovn-installed for pod: timed out while waiting for OVS port binding
InstallerPodNetworkingDegraded: '

OCP Version 4.9.0-0.nightly-2021-06-21-191858

OVS bits:
openvswitch2.15-2.15.0-9.el8fdp.x86_64
openvswitch2.15-devel-2.15.0-9.el8fdp.x86_64
ovn2.13-20.12.0-140.el8fdp.x86_64
ovn2.13-host-20.12.0-140.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-28.el8fdp.noarch
python3-openvswitch2.15-2.15.0-9.el8fdp.x86_64
openvswitch2.15-ipsec-2.15.0-9.el8fdp.x86_64
ovn2.13-central-20.12.0-140.el8fdp.x86_64
ovn2.13-vtep-20.12.0-140.el8fdp.x86_64
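
For anyone hitting the timeout above, a hedged sketch of checks that can be run on the affected node to see whether the OVN port binding and ovn-installed ever show up; the logical port name below is a placeholder, not the actual pod's port:

# Illustrative only: substitute the pod's iface-id / logical port name.
lport="<logical-port-name>"
# Is the port bound in the southbound DB, and to which chassis?
ovn-sbctl find Port_Binding logical_port="$lport"
# Does the local OVS interface carry iface-id and ovn-installed?
ovs-vsctl --columns=name,external_ids find Interface external_ids:iface-id="$lport"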

Comment 14 Dumitru Ceara 2021-06-23 11:43:37 UTC
(In reply to Joe Talerico from comment #13)
> We are still seeing this with the latest 4.9 nightly compose.
> [...]

Per our discussion on Slack, we have bug 1959200 tracking the ovn-kubernetes issue.

