Bug 1812009 - ovn_controller crashes after setup is up for a day
Summary: ovn_controller crashes after setup is up for a day
Keywords:
Status: CLOSED EOL
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-networking-ovn
Version: 16.0 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Terry Wilson
QA Contact: Roman Safronov
URL:
Whiteboard:
Depends On: 1818844
Blocks: 1819604 1822542
Reported: 2020-03-10 11:29 UTC by Itzik Brown
Modified: 2020-11-04 20:38 UTC
CC List: 17 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1818844
Environment:
Last Closed: 2020-11-04 20:38:22 UTC
Target Upstream Version:


Attachments
last lines from ovn controller log (144.57 KB, text/plain)
2020-03-10 11:29 UTC, Itzik Brown
openvswitch logs (2.56 MB, application/octet-stream)
2020-03-10 15:12 UTC, Roman Safronov
ovn db (11.80 KB, application/gzip)
2020-03-10 16:29 UTC, Roman Safronov

Description Itzik Brown 2020-03-10 11:29:31 UTC
Created attachment 1668935
last lines from ovn controller log

Description of problem:
After about a day of uptime, the ovn_controller container on a compute node stops.

Version-Release number of selected component (if applicable):
RHOS_TRUNK-16.0-RHEL-8-20200226.n.1

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 3 Jakub Libosvar 2020-03-10 14:45:53 UTC
bridge not found for localnet port 'provnet-2804478a-b3af-4885-8470-5abc6ea5294a' with network name 'datacentre'

This sounds like somebody removed the provider bridge. The logs lack ovs-vswitchd.log and the output of openvswitch commands that would confirm the provider bridge was present and the bridge mappings were configured correctly. Can you please provide them?
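A minimal sketch of the check requested above, assuming the raw value of `ovs-vsctl get Open_vSwitch . external_ids:ovn-bridge-mappings` (comma-separated `network:bridge` pairs) and the bridge names from `ovs-vsctl list-br` have already been captured from the compute node. The helpers `parse_bridge_mappings` and `check_localnet_network` are hypothetical, not part of OVN or networking-ovn; they only approximate the condition behind the "bridge not found for localnet port" message, not ovn-controller's actual code. (Per comment 13, the br-ex mapping later turned out to be a red herring for this crash.)

```python
def parse_bridge_mappings(raw):
    """Parse an ovn-bridge-mappings value such as "datacentre:br-ex,tenant:br-isolated"."""
    # Assumption: ovs-vsctl may print the value wrapped in double quotes.
    raw = raw.strip().strip('"')
    mappings = {}
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty input or stray commas
        network, sep, bridge = entry.partition(":")
        if not sep or not network.strip() or not bridge.strip():
            raise ValueError(f"malformed mapping entry: {entry!r}")
        mappings[network.strip()] = bridge.strip()
    return mappings


def check_localnet_network(raw_mappings, existing_bridges, network):
    """Return a problem description, or None if `network` maps to an existing bridge."""
    mappings = parse_bridge_mappings(raw_mappings)
    bridge = mappings.get(network)
    if bridge is None:
        return f"network {network!r} has no entry in ovn-bridge-mappings"
    if bridge not in existing_bridges:
        return f"bridge {bridge!r} for network {network!r} does not exist"
    return None


if __name__ == "__main__":
    # Hypothetical sample values standing in for the real command output.
    raw = '"datacentre:br-ex,tenant:br-isolated"'
    bridges = {"br-int", "br-isolated"}
    print(check_localnet_network(raw, bridges, "datacentre"))
```

Keeping the parsing separate from the OVS commands makes it easy to run the check against values pasted from an attached log rather than on the live node.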

Comment 4 Roman Safronov 2020-03-10 15:12:26 UTC
Created attachment 1668973
openvswitch logs

Comment 8 Roman Safronov 2020-03-10 16:29:29 UTC
Created attachment 1668999
ovn db

Comment 12 Itzik Brown 2020-03-17 03:46:04 UTC
First, we don't use DVR, so I don't think the br-ex mapping on the compute node is relevant here.
Second, are you sure it's a duplicate of the other bug? As far as I can tell, we are not consuming FDP.

Comment 13 Jakub Libosvar 2020-03-17 08:03:12 UTC
(In reply to Itzik Brown from comment #12)
> First we don't use DVR so I don't think mapping of br-ex in the compute is
> relevant here.
Yes, that was a red herring.

> Second, Are you sure it's a duplicate of the other bug? We are not consuming
> FDP as far as I can tell

To the best of my knowledge, OVN is not cross-tagged for OSP16 but is shipped via FDP channels. If you don't consume FDP, you can try updating ovn-controller to the version from comment 11 to see whether it fixes the crashes for you. The OVN version containing the fix was shipped with FDP 20.B.

Comment 16 Daniel Alvarez Sanchez 2020-03-17 19:16:03 UTC
The duplicate is not correct, as it's just a BZ to track the move into OSP13. This BZ is against OSP16.
@Numan, @Kuba do we have the BZ against ovn2.11 that fixes the crash so that we can mark it as a dup of this one? Otherwise let's reopen this one and mark it as TestOnly.

Comment 17 Jakub Libosvar 2020-03-18 08:40:26 UTC
(In reply to Daniel Alvarez Sanchez from comment #16)
> The duplicated bug is not correct as it's just a BZ to track the move into
> OSP13. This BZ is against OSP16.
> @Numan, @Kuba do we have the BZ against ovn2.11 that fixes the crash so that
> we can mark it as dup of this? Otherwise let's repoen this one and mark it
> as TestOnly.

I don't have the original BZ against OVN. Re-opening this one as suggested.

Comment 25 Eran Kuris 2020-07-13 07:58:15 UTC
Fix verified.

+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks               |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| 08198253-8787-48a8-bda5-7619461959dd | compute-0    | ACTIVE | -          | Running     | ctlplane=192.168.24.34 |
| 49503a19-0af9-42fc-9fb3-0da31786285b | controller-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.46 |
| 139a41fc-7439-4d2f-b3ed-ac8715e12d1c | controller-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.50 |
| f385129c-6681-4bd8-a746-95be8577520f | controller-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.43 |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
(undercloud) [stack@undercloud-0 ~]$ ssh heat-admin@192.168.24.34
Warning: Permanently added '192.168.24.34' (ECDSA) to the list of known hosts.
This system is not registered to Red Hat Insights. See https://cloud.redhat.com/
To register this system, run: insights-client --register

Last login: Mon Jul 13 06:22:20 2020 from 192.168.24.254
[heat-admin@compute-0 ~]$ sudo -s
[root@compute-0 heat-admin]# docker ps |grep ovn_con
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
0015a2a95da9  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-ovn-controller:20200702.1              kolla_start           3 days ago   Up 3 days ago          ovn_controller


[root@controller-1 ~]# podman exec -it ovn-dbs-bundle-podman-1 /bin/bash
()[root@controller-1 /]# rpm -qa | grep ovn 
puppet-ovn-15.4.1-0.20200229002436.192ac4e.el8ost.noarch
rhosp-openvswitch-ovn-common-2.11-0.6.el8ost.noarch
rhosp-openvswitch-ovn-central-2.11-0.6.el8ost.noarch
ovn2.11-2.11.1-44.el8fdp.x86_64
ovn2.11-central-2.11.1-44.el8fdp.x86_64

[root@compute-0 ~]# podman exec -it ovn_controller /bin/bash
()[root@compute-0 /]# rpm -qa | grep ovn 
puppet-ovn-15.4.1-0.20200229002436.192ac4e.el8ost.noarch
rhosp-openvswitch-ovn-common-2.11-0.6.el8ost.noarch
ovn2.11-host-2.11.1-44.el8fdp.x86_64
ovn2.11-2.11.1-44.el8fdp.x86_64
rhosp-openvswitch-ovn-host-2.11-0.6.el8ost.noarch

Comment 26 Eran Kuris 2020-07-13 07:59:04 UTC
Verified on puddle RHOS_TRUNK-16.0-RHEL-8-20200706.n.0.

Comment 27 Brian Haley 2020-11-03 17:41:09 UTC
Since the fix has been verified, this bug can be closed.

Comment 28 stchen 2020-11-04 20:38:22 UTC
Closing as EOL: OSP 16.0 was retired on Oct 27, 2020.

