Bug 1952846
| Summary: | [ovn-controller] OVS.Interface.external-ids:ovn-installed is not set if original OVS TXN failed. | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | Dumitru Ceara <dceara> |
| Component: | ovn2.13 | Assignee: | Dumitru Ceara <dceara> |
| Status: | CLOSED ERRATA | QA Contact: | ying xu <yinxu> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | FDP 20.H | CC: | ctrautma, jishi, jtaleric, ralongi, trozet |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | perfscale-ovn | | |
| Fixed In Version: | ovn2.13-20.12.0-135 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-06-21 14:44:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1959200 | | |
Description
Dumitru Ceara
2021-04-23 11:20:35 UTC
Fix sent for review: http://patchwork.ozlabs.org/project/ovn/patch/20210423141752.15080.58931.stgit@dceara.remote.csb/

Comment 5
Tim Rozet

It looks like this fix doesn't entirely fix the problem of ovn-installed being reported before the flows are installed. When I test with this fix, I run a script that checks every 0.5 seconds whether ovn-installed has been added, as well as the flows in table 8, during a pod create. I see this:
Wed May 12 14:39:56 UTC 2021 external_ids : {attached_mac="0a:58:0a:97:0d:3d", iface-id=openshift-authentication_trozet1, ip_addresses="10.151.13.61/22", ovn-installed="true", sandbox="87a49511bcad42f70c952f6a67e386a58b270b60250b546d0cdd1e40e44ece75"}
Wed May 12 14:40:22 UTC 2021 cookie=0xfb844538, duration=0.135s, table=8, n_packets=0, n_bytes=0, idle_age=0, priority=50,reg14=0x13c,metadata=0x264,dl_src=0a:58:0a:97:0d:3d actions=resubmit(,9)
We can see the flow was installed much later (26 seconds or so) than when ovn-installed was added to the pod.
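For reference, a minimal sketch of that kind of polling check, assuming the pod's iface-id and attached_mac from the log above and the integration bridge br-int (the exact script used for the test above is not part of this report):

```bash
#!/bin/bash
# Hypothetical polling loop: every 0.5 seconds, report whether ovn-installed is set
# on the pod's OVS interface and whether its table=8 flow exists on br-int.
IFACE_ID="openshift-authentication_trozet1"   # pod's iface-id (from the log above)
MAC="0a:58:0a:97:0d:3d"                       # pod's attached_mac (from the log above)

while true; do
    date -u
    # external_ids of the OVS interface carrying the pod traffic
    ovs-vsctl --columns=external_ids find Interface external_ids:iface-id="$IFACE_ID"
    # table=8 flows matching the pod's source MAC
    # (may need -O OpenFlow13/15 depending on br-int's configured protocols)
    ovs-ofctl dump-flows br-int table=8 | grep "dl_src=$MAC"
    sleep 0.5
done
```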
(In reply to Tim Rozet from comment #5)

Based on https://bugzilla.redhat.com/show_bug.cgi?id=1959200#c4, this is a different issue, which I don't think we can fix in OVN itself. AFAICT, the only option is to ensure that the CMS doesn't reuse logical port names.

Dumitru Ceara said this bug is very hard to reproduce; he suggested doing a sanity test, so I just ran the regression tests. Setting verified as sanity-only.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn2.13 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2507

Comment 13
Joe Talerico

We are still seeing this with the latest 4.9 nightly compose.

kube-apiserver   4.9.0-0.nightly-2021-06-21-191858   True   True   True   13h
InstallerPodContainerWaitingDegraded: Pod "installer-9-ip-10-0-161-94.us-west-2.compute.internal" on node "ip-10-0-161-94.us-west-2.compute.internal" container "installer" is waiting since 2021-06-23 08:11:54 +0000 UTC because ContainerCreating
InstallerPodNetworkingDegraded: Pod "installer-9-ip-10-0-161-94.us-west-2.compute.internal" on node "ip-10-0-161-94.us-west-2.compute.internal" observed degraded networking: Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_installer-9-ip-10-0-161-94.us-west-2.compute.internal_openshift-kube-apiserver_39a7beab-7f9b-4f21-b2a9-9d2e302f7998_0(77e08343ec87696849117f1313ae37f8902f86c8bcc9080945c78c9feed02172): [openshift-kube-apiserver/installer-9-ip-10-0-161-94.us-west-2.compute.internal:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[openshift-kube-apiserver/installer-9-ip-10-0-161-94.us-west-2.compute.internal 77e08343ec87696849117f1313ae37f8902f86c8bcc9080945c78c9feed02172] [openshift-kube-apiserver/installer-9-ip-10-0-161-94.us-west-2.compute.internal 77e08343ec87696849117f1313ae37f8902f86c8bcc9080945c78c9feed02172] failed to configure pod interface: error while waiting on OVS.Interface.external-ids:ovn-installed for pod: timed out while waiting for OVS port binding
InstallerPodNetworkingDegraded: '

OCP Version: 4.9.0-0.nightly-2021-06-21-191858

OVS bits:
openvswitch2.15-2.15.0-9.el8fdp.x86_64
openvswitch2.15-devel-2.15.0-9.el8fdp.x86_64
ovn2.13-20.12.0-140.el8fdp.x86_64
ovn2.13-host-20.12.0-140.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-28.el8fdp.noarch
python3-openvswitch2.15-2.15.0-9.el8fdp.x86_64
openvswitch2.15-ipsec-2.15.0-9.el8fdp.x86_64
ovn2.13-central-20.12.0-140.el8fdp.x86_64
ovn2.13-vtep-20.12.0-140.el8fdp.x86_64

(In reply to Joe Talerico from comment #13)

Per our discussion on Slack, we have bug 1959200 tracking the ovn-kubernetes issue.
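For anyone triaging the "timed out while waiting for OVS port binding" error above, a minimal sketch of the state worth inspecting; <logical-port-name> is a placeholder for the pod's iface-id, and the commands assume access to ovs-vsctl on the worker node and to ovn-sbctl against the Southbound database:

```bash
# On the worker node: has ovn-controller set ovn-installed on the pod's OVS interface?
ovs-vsctl --columns=name,external_ids find Interface \
    'external_ids:iface-id=<logical-port-name>'

# Against the SB DB: is the logical port bound to the expected chassis?
ovn-sbctl --columns=logical_port,chassis find Port_Binding \
    'logical_port=<logical-port-name>'
```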