Bug 1655177
| Summary: | OSP 13, neutron ml2-ovs, DPDK - stop/wait/start of nova instance sometimes deletes tbr and yields Error executing command: RowNotFound: Cannot find Bridge with name=tbr-f4554113-b | |||
|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Andreas Karis <akaris> | |
| Component: | openstack-neutron | Assignee: | Nate Johnston <njohnston> | |
| Status: | CLOSED ERRATA | QA Contact: | Yariv <yrachman> | |
| Severity: | urgent | Docs Contact: | ||
| Priority: | urgent | |||
| Version: | 13.0 (Queens) | CC: | akaris, amuller, atelang, bhaley, bshephar, chrisw, dalvarez, jappleii, jjoyce, jraju, jschluet, mariel, msufiyan, njohnston, sandyada, skaplons, supadhya, yrachman | |
| Target Milestone: | z4 | Keywords: | TestOnly, Triaged, ZStream | |
| Target Release: | 13.0 (Queens) | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | openstack-neutron-12.0.5-3.el7ost | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1810451 (view as bug list) | Environment: | ||
| Last Closed: | 2019-03-14 13:51:31 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
|
Description
Andreas Karis
2018-11-30 21:08:53 UTC
[akaris@collab-shell compute-1]$ grep neutron installed-rpms openstack-neutron-12.0.4-2.el7ost.noarch Wed Nov 7 22:04:28 2018 openstack-neutron-common-12.0.4-2.el7ost.noarch Wed Nov 7 22:04:20 2018 openstack-neutron-l2gw-agent-12.0.2-0.20180412115803.a9f8009.el7ost.noarch Wed Nov 7 22:04:31 2018 openstack-neutron-lbaas-12.0.1-0.20180825022802.ed663f3.el7ost.noarch Wed Nov 7 22:04:29 2018 openstack-neutron-lbaas-ui-4.0.1-0.20180914200335.dc7d9e7.el7ost.noarch Wed Nov 7 22:04:50 2018 openstack-neutron-linuxbridge-12.0.4-2.el7ost.noarch Wed Nov 7 22:04:33 2018 openstack-neutron-metering-agent-12.0.4-2.el7ost.noarch Wed Nov 7 22:04:32 2018 openstack-neutron-ml2-12.0.4-2.el7ost.noarch Wed Nov 7 22:04:21 2018 openstack-neutron-openvswitch-12.0.4-2.el7ost.noarch Wed Nov 7 22:04:33 2018 openstack-neutron-sriov-nic-agent-12.0.4-2.el7ost.noarch Wed Nov 7 22:04:32 2018 puppet-neutron-12.4.1-1.3aa3109git.el7ost.noarch Wed Nov 7 22:07:52 2018 python2-neutronclient-6.7.0-1.el7ost.noarch Wed Nov 7 22:03:25 2018 python2-neutron-lib-1.13.0-1.el7ost.noarch Wed Nov 7 22:03:28 2018 python-neutron-12.0.4-2.el7ost.noarch Wed Nov 7 22:04:19 2018 python-neutron-lbaas-12.0.1-0.20180825022802.ed663f3.el7ost.noarch Wed Nov 7 22:04:22 2018 [akaris@collab-shell compute-1]$ Test 1 - reproduce issue:
~~~
nova stop vsbg_1_PL-1
~~~
~~~
[root@compute-1 ~]# date
Fri Nov 30 21:02:23 CET 2018
[root@compute-1 ~]# ovs-vsctl show | grep tbr
Bridge "tbr-c7d83e51-b"
Port "tbr-c7d83e51-b"
Interface "tbr-c7d83e51-b"
Bridge "tbr-f4554113-b"
Port "tbr-f4554113-b"
Interface "tbr-f4554113-b"
[root@compute-1 ~]# ovs-vsctl show | grep tbr
Bridge "tbr-c7d83e51-b"
Port "tbr-c7d83e51-b"
Interface "tbr-c7d83e51-b"
Bridge "tbr-f4554113-b"
Port "tbr-f4554113-b"
Interface "tbr-f4554113-b"
[root@compute-1 ~]# grep "tbr-f4554113-b" /var/log/openvswitch/ovs* | tail -n 5
/var/log/openvswitch/ovs-vswitchd.log:2018-11-30T20:01:51.679Z|02319|bridge|INFO|bridge tbr-f4554113-b: added interface vhu44d242e1-30 on port 1
/var/log/openvswitch/ovs-vswitchd.log:2018-11-30T20:01:53.028Z|02320|bridge|INFO|bridge tbr-f4554113-b: added interface tpt-44d242e1-30 on port 2
/var/log/openvswitch/ovs-vswitchd.log:2018-11-30T20:01:54.324Z|02322|bridge|INFO|bridge tbr-f4554113-b: added interface spt-7df8776f-4e on port 3
/var/log/openvswitch/ovs-vswitchd.log:2018-11-30T20:01:54.342Z|02323|bridge|INFO|bridge tbr-f4554113-b: added interface spt-e7476312-fc on port 4
/var/log/openvswitch/ovs-vswitchd.log:2018-11-30T20:01:54.350Z|02324|bridge|INFO|bridge tbr-f4554113-b: added interface spt-605ebd34-0e on port 5
~~~
~~~
nova stop vsbg_1_PL-1
~~~
This deletion here is not normal:
~~~
[root@compute-1 ~]# date
Fri Nov 30 21:02:54 CET 2018
[root@compute-1 ~]# grep "tbr-f4554113-b" /var/log/openvswitch/ovs* | tail -n 5
/var/log/openvswitch/ovs-vswitchd.log:2018-11-30T20:03:18.986Z|02529|bridge|INFO|bridge tbr-f4554113-b: deleted interface spt-7df8776f-4e on port 3
/var/log/openvswitch/ovs-vswitchd.log:2018-11-30T20:03:19.021Z|02531|bridge|INFO|bridge tbr-f4554113-b: deleted interface spt-e7476312-fc on port 4
/var/log/openvswitch/ovs-vswitchd.log:2018-11-30T20:03:19.113Z|02533|bridge|INFO|bridge tbr-f4554113-b: deleted interface tpt-44d242e1-30 on port 2
/var/log/openvswitch/ovs-vswitchd.log:2018-11-30T20:03:19.114Z|02534|bridge|INFO|bridge tbr-f4554113-b: deleted interface tbr-f4554113-b on port 65534
/var/log/openvswitch/ovs-vswitchd.log:2018-11-30T20:03:19.114Z|02535|bridge|INFO|bridge tbr-f4554113-b: deleted interface vhu44d242e1-30 on port 6
[root@compute-1 ~]# date
Fri Nov 30 21:03:31 CET 2018
[root@compute-1 ~]# grep "tbr-f4554113-b" /var/log/openvswitch/ovs* | tail -n 5
/var/log/openvswitch/ovs-vswitchd.log:2018-11-30T20:03:18.986Z|02529|bridge|INFO|bridge tbr-f4554113-b: deleted interface spt-7df8776f-4e on port 3
/var/log/openvswitch/ovs-vswitchd.log:2018-11-30T20:03:19.021Z|02531|bridge|INFO|bridge tbr-f4554113-b: deleted interface spt-e7476312-fc on port 4
/var/log/openvswitch/ovs-vswitchd.log:2018-11-30T20:03:19.113Z|02533|bridge|INFO|bridge tbr-f4554113-b: deleted interface tpt-44d242e1-30 on port 2
/var/log/openvswitch/ovs-vswitchd.log:2018-11-30T20:03:19.114Z|02534|bridge|INFO|bridge tbr-f4554113-b: deleted interface tbr-f4554113-b on port 65534
/var/log/openvswitch/ovs-vswitchd.log:2018-11-30T20:03:19.114Z|02535|bridge|INFO|bridge tbr-f4554113-b: deleted interface vhu44d242e1-30 on port 6
[root@compute-1 ~]# virsh list
Id Name State
----------------------------------------------------
4 instance-00000035 running
9 instance-00000038 paused
[root@compute-1 ~]# grep "tbr-f4554113-b" /var/log/openvswitch/ovs* | tail -n 5
/var/log/openvswitch/ovs-vswitchd.log:2018-11-30T20:03:18.986Z|02529|bridge|INFO|bridge tbr-f4554113-b: deleted interface spt-7df8776f-4e on port 3
/var/log/openvswitch/ovs-vswitchd.log:2018-11-30T20:03:19.021Z|02531|bridge|INFO|bridge tbr-f4554113-b: deleted interface spt-e7476312-fc on port 4
/var/log/openvswitch/ovs-vswitchd.log:2018-11-30T20:03:19.113Z|02533|bridge|INFO|bridge tbr-f4554113-b: deleted interface tpt-44d242e1-30 on port 2
/var/log/openvswitch/ovs-vswitchd.log:2018-11-30T20:03:19.114Z|02534|bridge|INFO|bridge tbr-f4554113-b: deleted interface tbr-f4554113-b on port 65534
/var/log/openvswitch/ovs-vswitchd.log:2018-11-30T20:03:19.114Z|02535|bridge|INFO|bridge tbr-f4554113-b: deleted interface vhu44d242e1-30 on port 6
[root@compute-1 ~]# grep "tbr-f4554113-b" /var/log/openvswitch/ovs* | tail -n 5
/var/log/openvswitch/ovs-vswitchd.log:2018-11-30T20:03:18.986Z|02529|bridge|INFO|bridge tbr-f4554113-b: deleted interface spt-7df8776f-4e on port 3
/var/log/openvswitch/ovs-vswitchd.log:2018-11-30T20:03:19.021Z|02531|bridge|INFO|bridge tbr-f4554113-b: deleted interface spt-e7476312-fc on port 4
/var/log/openvswitch/ovs-vswitchd.log:2018-11-30T20:03:19.113Z|02533|bridge|INFO|bridge tbr-f4554113-b: deleted interface tpt-44d242e1-30 on port 2
/var/log/openvswitch/ovs-vswitchd.log:2018-11-30T20:03:19.114Z|02534|bridge|INFO|bridge tbr-f4554113-b: deleted interface tbr-f4554113-b on port 65534
/var/log/openvswitch/ovs-vswitchd.log:2018-11-30T20:03:19.114Z|02535|bridge|INFO|bridge tbr-f4554113-b: deleted interface vhu44d242e1-30 on port
[root@compute-1 ~]# grep ERROR /var/log/containers/neutron/openvswitch-agent.log | tail -n10
2018-11-30 21:03:20.466 61270 ERROR ovsdbapp.backend.ovs_idl.command self.gen.next()
2018-11-30 21:03:20.466 61270 ERROR ovsdbapp.backend.ovs_idl.command File "/usr/lib/python2.7/site-packages/ovsdbapp/api.py", line 94, in transaction
2018-11-30 21:03:20.466 61270 ERROR ovsdbapp.backend.ovs_idl.command self._nested_txn = None
2018-11-30 21:03:20.466 61270 ERROR ovsdbapp.backend.ovs_idl.command File "/usr/lib/python2.7/site-packages/ovsdbapp/api.py", line 54, in __exit__
2018-11-30 21:03:20.466 61270 ERROR ovsdbapp.backend.ovs_idl.command self.result = self.commit()
2018-11-30 21:03:20.466 61270 ERROR ovsdbapp.backend.ovs_idl.command File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 62, in commit
2018-11-30 21:03:20.466 61270 ERROR ovsdbapp.backend.ovs_idl.command raise result.ex
2018-11-30 21:03:20.466 61270 ERROR ovsdbapp.backend.ovs_idl.command RowNotFound: Cannot find Bridge with name=tbr-f4554113-b
2018-11-30 21:03:20.466 61270 ERROR ovsdbapp.backend.ovs_idl.command
2018-11-30 21:03:20.467 61270 ERROR neutron.services.trunk.drivers.openvswitch.agent.ovsdb_handler [-] Failed to store metadata for trunk f4554113-b695-4e7d-b550-6927765c6679: Cannot find Bridge with name=tbr-f4554113-b: RowNotFound: Cannot find Bridge with name=tbr-f4554113-b
[root@compute-1 ~]# date
Fri Nov 30 21:06:21 CET 2018
[root@compute-1 ~]#
~~~
Test 2 - do not reproduce issue, expected behavior: [root@controller-0 EnvironmentFiles]# date Fri Nov 30 21:14:58 CET 2018 (overcloud-Queens) [root@controller-0 EnvironmentFiles]# nova stop vsbg_1_PL-1 Request to stop server vsbg_1_PL-1 has been accepted. (overcloud-Queens) [root@controller-0 EnvironmentFiles]# nova list | grep PL-1 | 1f23fafb-e370-4e53-9bf1-8d5b1d8f7387 | vsbg_1_PL-1 | ACTIVE | powering-off | Running | vsbg_conf_sp=10.2.0.26; vsbg_sig_sp=10.1.0.12; vsbg_om_sp=192.168.158.92; vsbg_1_vsbg_int_sp=192.168.0.29 | (overcloud-Queens) [root@controller-0 EnvironmentFiles]# nova list | grep PL-1 | 1f23fafb-e370-4e53-9bf1-8d5b1d8f7387 | vsbg_1_PL-1 | SHUTOFF | - | Shutdown | vsbg_conf_sp=10.2.0.26; vsbg_sig_sp=10.1.0.12; vsbg_om_sp=192.168.158.92; vsbg_1_vsbg_int_sp=192.168.0.29 | (overcloud-Queens) [root@controller-0 EnvironmentFiles]# date Fri Nov 30 21:15:19 CET 2018 (overcloud-Queens) [root@controller-0 EnvironmentFiles]# nova start vsbg_1_PL-1 Request to start server vsbg_1_PL-1 has been accepted. (overcloud-Queens) [root@controller-0 EnvironmentFiles]# nova list | grep PL-1 | 1f23fafb-e370-4e53-9bf1-8d5b1d8f7387 | vsbg_1_PL-1 | SHUTOFF | powering-on | Shutdown | vsbg_conf_sp=10.2.0.26; vsbg_sig_sp=10.1.0.12; vsbg_om_sp=192.168.158.92; vsbg_1_vsbg_int_sp=192.168.0.29 | (overcloud-Queens) [root@controller-0 EnvironmentFiles]# nova list | grep PL-1 | 1f23fafb-e370-4e53-9bf1-8d5b1d8f7387 | vsbg_1_PL-1 | ACTIVE | - | Running | vsbg_conf_sp=10.2.0.26; vsbg_sig_sp=10.1.0.12; vsbg_om_sp=192.168.158.92; vsbg_1_vsbg_int_sp=192.168.0.29 | (overcloud-Queens) [root@controller-0 EnvironmentFiles]# date Fri Nov 30 21:16:29 CET 2018 (overcloud-Queens) [root@controller-0 EnvironmentFiles]# A correction to this issue description. The customer made all VMs up and running and with trunk port attached, then stopped all VMs, checked tbrs: all tbr bridges were there; he waited for 15 mins, checked again, all tbr bridges were still there then he started all VMs with nova and checked again: 1 tbr disappeared, it caused 1 PL VM to be stuck in "powering-on" status. Based on the above, we think that tbr disappearing only happens when nova starts the VMs. Determined the right codepath to mitigate this issue. Filed upstream bug #1807239 for working this issue upstream. Filed change 623275 against neutron upstream master. Can you provide the steps that are taken to create the trunk bridge and then create VMs on it? I want to make sure that when I verify this I am testing the scenario te customer is facing as closely as possible. Hi, What's the status wrt this bug? Thanks, Andreas At this point should this bug go into RELEASE_PENDING or CLOSED? Apologies if I betray my ignorance of the post-hotfix part of the process. Hi, I messed up the hotfix part, anyway. This is a containerized environment, and hotfixes should be delivered with a special method. However, the official release is soon and the customer agreed to wait for the official release. So please feel free to go forward as if the hotfix had not been issued and as if this was a normal ticket. - Andreas Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0555 |