Bug 1812009

Summary: ovn_controller crashes after setup is up for a day
Product: Red Hat OpenStack Reporter: Itzik Brown <itbrown>
Component: python-networking-ovn Assignee: Terry Wilson <twilson>
Status: CLOSED EOL QA Contact: Roman Safronov <rsafrono>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 16.0 (Train) CC: apevec, atragler, bhaley, choag, dalvarez, ekuris, ffernand, jlibosva, jschluet, juriarte, lhh, majopela, nusiddiq, rheinzma, rsafrono, scohen, stchen
Target Milestone: --- Keywords: TestBlockerForLayeredProduct, TestOnly, Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1818844 (view as bug list) Environment:
Last Closed: 2020-11-04 20:38:22 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1818844    
Bug Blocks: 1819604, 1822542    
Attachments:
Description                          Flags
last lines from ovn controller log   none
openvswitch logs                     none
ovn db                               none

Description Itzik Brown 2020-03-10 11:29:31 UTC
Created attachment 1668935 [details]
last lines from ovn controller log

Description of problem:
After the setup has been up for a day, ovn_controller on a compute node is stopped.

Version-Release number of selected component (if applicable):
RHOS_TRUNK-16.0-RHEL-8-20200226.n.1

Comment 3 Jakub Libosvar 2020-03-10 14:45:53 UTC
bridge not found for localnet port 'provnet-2804478a-b3af-4885-8470-5abc6ea5294a' with network name 'datacentre'

This sounds like somebody removed the provider bridge. The logs lack ovs-vswitchd.log and the output of openvswitch commands needed to confirm that the provider bridge was present and the bridge mappings were configured correctly. Can you please provide such logs?
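
A minimal sketch of the check described above, assuming the bridge mappings string and the list of existing bridges have already been collected from the node (for example from `ovs-vsctl get Open_vSwitch . external_ids:ovn-bridge-mappings` and `ovs-vsctl list-br`). The function names and the sample values are illustrative only and are not taken from the attached logs; `br-ex` is assumed to be the provider bridge.

```python
def parse_bridge_mappings(raw):
    """Parse a 'physnet:bridge,physnet2:bridge2' string into a dict.

    Surrounding double quotes are stripped, since ovs-vsctl prints
    string values quoted.
    """
    mappings = {}
    for pair in raw.strip().strip('"').split(","):
        pair = pair.strip()
        if not pair:
            continue
        network, sep, bridge = pair.partition(":")
        if not sep or not network.strip() or not bridge.strip():
            raise ValueError(f"malformed mapping entry: {pair!r}")
        mappings[network.strip()] = bridge.strip()
    return mappings


def check_localnet(network_name, raw_mappings, existing_bridges):
    """Return None if the network maps to an existing bridge,
    otherwise a short description of what is missing."""
    bridge = parse_bridge_mappings(raw_mappings).get(network_name)
    if bridge is None:
        return f"no bridge mapping for network {network_name!r}"
    if bridge not in existing_bridges:
        return f"bridge {bridge!r} mapped to {network_name!r} does not exist"
    return None


if __name__ == "__main__":
    # Hypothetical inputs: mapping present, but the bridge is gone.
    print(check_localnet("datacentre", '"datacentre:br-ex"', ["br-int"]))
```

A non-None result for the network name 'datacentre' would be consistent with the "bridge not found for localnet port" message, but it does not by itself say who removed the bridge or when.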

Comment 4 Roman Safronov 2020-03-10 15:12:26 UTC
Created attachment 1668973 [details]
openvswitch logs

Comment 8 Roman Safronov 2020-03-10 16:29:29 UTC
Created attachment 1668999 [details]
ovn db

Comment 12 Itzik Brown 2020-03-17 03:46:04 UTC
First, we don't use DVR, so I don't think the mapping of br-ex on the compute node is relevant here.
Second, are you sure it's a duplicate of the other bug? We are not consuming FDP, as far as I can tell.

Comment 13 Jakub Libosvar 2020-03-17 08:03:12 UTC
(In reply to Itzik Brown from comment #12)
> First we don't use DVR so I don't think mapping of br-ex in the compute is
> relevant here.
Yes, that was a red herring.

> Second, Are you sure it's a duplicate of the other bug? We are not consuming
> FDP as far as I can tell

To the best of my knowledge, OVN is not cross-tagged for OSP16 but is shipped via FDP channels. If you don't consume FDP, you can try updating ovn-controller to the version from comment 11 to see if it fixes the crashes for you. The OVN version containing the fix was shipped with FDP 20.B.

Comment 16 Daniel Alvarez Sanchez 2020-03-17 19:16:03 UTC
The duplicated bug is not correct, as it's just a BZ to track the move into OSP13. This BZ is against OSP16.
@Numan, @Kuba, do we have the BZ against ovn2.11 that fixes the crash so that we can mark it as dup of this? Otherwise, let's reopen this one and mark it as TestOnly.

Comment 17 Jakub Libosvar 2020-03-18 08:40:26 UTC
(In reply to Daniel Alvarez Sanchez from comment #16)
> The duplicated bug is not correct as it's just a BZ to track the move into
> OSP13. This BZ is against OSP16.
> @Numan, @Kuba do we have the BZ against ovn2.11 that fixes the crash so that
> we can mark it as dup of this? Otherwise let's reopen this one and mark it
> as TestOnly.

I don't have the original BZ against OVN. Re-opening this one as suggested.

Comment 25 Eran Kuris 2020-07-13 07:58:15 UTC
Fix verified.

+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks               |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| 08198253-8787-48a8-bda5-7619461959dd | compute-0    | ACTIVE | -          | Running     | ctlplane=192.168.24.34 |
| 49503a19-0af9-42fc-9fb3-0da31786285b | controller-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.46 |
| 139a41fc-7439-4d2f-b3ed-ac8715e12d1c | controller-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.50 |
| f385129c-6681-4bd8-a746-95be8577520f | controller-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.43 |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
(undercloud) [stack@undercloud-0 ~]$ ssh heat-admin@192.168.24.34
Warning: Permanently added '192.168.24.34' (ECDSA) to the list of known hosts.
This system is not registered to Red Hat Insights. See https://cloud.redhat.com/
To register this system, run: insights-client --register

Last login: Mon Jul 13 06:22:20 2020 from 192.168.24.254
[heat-admin@compute-0 ~]$ sudo -s
[root@compute-0 heat-admin]# docker ps |grep ovn_con
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
0015a2a95da9  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-ovn-controller:20200702.1              kolla_start           3 days ago   Up 3 days ago          ovn_controller


[root@controller-1 ~]# podman exec -it ovn-dbs-bundle-podman-1 /bin/bash
()[root@controller-1 /]# rpm -qa | grep ovn 
puppet-ovn-15.4.1-0.20200229002436.192ac4e.el8ost.noarch
rhosp-openvswitch-ovn-common-2.11-0.6.el8ost.noarch
rhosp-openvswitch-ovn-central-2.11-0.6.el8ost.noarch
ovn2.11-2.11.1-44.el8fdp.x86_64
ovn2.11-central-2.11.1-44.el8fdp.x86_64

[root@compute-0 ~]# podman exec -it ovn_controller /bin/bash
()[root@compute-0 /]# rpm -qa | grep ovn 
puppet-ovn-15.4.1-0.20200229002436.192ac4e.el8ost.noarch
rhosp-openvswitch-ovn-common-2.11-0.6.el8ost.noarch
ovn2.11-host-2.11.1-44.el8fdp.x86_64
ovn2.11-2.11.1-44.el8fdp.x86_64
rhosp-openvswitch-ovn-host-2.11-0.6.el8ost.noarch
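
The package listings above can be compared mechanically. The following is a rough sketch, assuming each line follows the usual name-version-release.arch layout of `rpm -qa` output; the helper names `split_nvra` and `version_of` are hypothetical and only illustrate confirming that both containers carry the same ovn2.11 build.

```python
def split_nvra(package):
    """Split 'name-version-release.arch' into its four parts."""
    rest, _, arch = package.rpartition(".")
    # The name itself may contain dashes, so split from the right.
    name, version, release = rest.rsplit("-", 2)
    return name, version, release, arch


def version_of(packages, wanted_name):
    """Return 'version-release' of the named package, or None if absent."""
    for package in packages:
        name, version, release, _arch = split_nvra(package)
        if name == wanted_name:
            return f"{version}-{release}"
    return None


# Subsets of the listings from the two containers shown above.
controller = [
    "puppet-ovn-15.4.1-0.20200229002436.192ac4e.el8ost.noarch",
    "ovn2.11-2.11.1-44.el8fdp.x86_64",
    "ovn2.11-central-2.11.1-44.el8fdp.x86_64",
]
compute = [
    "ovn2.11-host-2.11.1-44.el8fdp.x86_64",
    "ovn2.11-2.11.1-44.el8fdp.x86_64",
]

if __name__ == "__main__":
    print(version_of(controller, "ovn2.11") == version_of(compute, "ovn2.11"))
```

Matching versions only show that the two nodes run the same build; whether that build contains the fix still rests on the verification in this comment.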

Comment 26 Eran Kuris 2020-07-13 07:59:04 UTC
The puddle that was verified is RHOS_TRUNK-16.0-RHEL-8-20200706.n.0.

Comment 27 Brian Haley 2020-11-03 17:41:09 UTC
Since this fix has been verified, this bug can be closed.

Comment 28 stchen 2020-11-04 20:38:22 UTC
Closing as EOL; OSP 16.0 has been retired as of Oct 27, 2020.