Bug 1541496

Summary: neutron-ovs-cleanup failing when there are too many ports
Product: Red Hat OpenStack
Component: openstack-neutron
Reporter: Terry Wilson <twilson>
Assignee: Terry Wilson <twilson>
QA Contact: Toni Freger <tfreger>
Docs Contact:
Status: CLOSED ERRATA
Severity: high
Priority: high
Version: 13.0 (Queens)
CC: amuller, bhaley, chrisw, dalvarez, fressi, jlibosva, jschluet, nyechiel, pmorey, ragiman, sclewis, srevivo, tfreger, twilson
Target Milestone: rc
Keywords: Regression, Triaged
Target Release: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-neutron-12.0.2-0.20180421011358.0ec54fd.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1541494
Environment:
Last Closed: 2018-06-27 13:43:27 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On: 1528325, 1541494
Bug Blocks:

Description Terry Wilson 2018-02-02 17:36:50 UTC
+++ This bug was initially created as a clone of Bug #1541494 +++

+++ This bug was initially created as a clone of Bug #1528325 +++

Description of problem:

When the OVS database is large, the neutron-ovs-cleanup script times out and fails to clean up the ports. This is a problem in large environments because any leftover ports are never cleaned up.


Version-Release number of selected component (if applicable):
openstack-neutron-10.0.3-1.el7ost.noarch 


Actual results:

2017-12-12 16:14:18.114 104948 INFO neutron.common.config [-] Logging enabled!
2017-12-12 16:14:18.114 104948 INFO neutron.common.config [-] /usr/bin/neutron-ovs-cleanup version 10.0.3
2017-12-12 16:14:18.314 104948 INFO neutron.agent.ovsdb.native.vlog [-] tcp:127.0.0.1:6640: connecting...
2017-12-12 16:14:18.314 104948 INFO neutron.agent.ovsdb.native.vlog [-] tcp:127.0.0.1:6640: connected
2017-12-12 16:14:37.369 104948 ERROR neutron.agent.ovsdb.native.commands [-] Error executing command
2017-12-12 16:14:37.369 104948 ERROR neutron.agent.ovsdb.native.commands Traceback (most recent call last):
2017-12-12 16:14:37.369 104948 ERROR neutron.agent.ovsdb.native.commands   File "/usr/lib/python2.7/site-packages/neutron/agent/ovsdb/native/commands.py", line 36, in execute
2017-12-12 16:14:37.369 104948 ERROR neutron.agent.ovsdb.native.commands     txn.add(self)
2017-12-12 16:14:37.369 104948 ERROR neutron.agent.ovsdb.native.commands   File "/usr/lib/python2.7/site-packages/neutron/agent/ovsdb/api.py", line 79, in __exit__
2017-12-12 16:14:37.369 104948 ERROR neutron.agent.ovsdb.native.commands     self.result = self.commit()
2017-12-12 16:14:37.369 104948 ERROR neutron.agent.ovsdb.native.commands   File "/usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py", line 73, in commit
2017-12-12 16:14:37.369 104948 ERROR neutron.agent.ovsdb.native.commands     'timeout': self.timeout})
2017-12-12 16:14:37.369 104948 ERROR neutron.agent.ovsdb.native.commands TimeoutException: Commands [DbListCommand(if_exists=True, records=[u'ha-f9af3f28-c8', u'tap6f51366b-7f', u'tap9d6b4ac9-ea', u'tapc9bccd94-00', u'tap29ea8fb2-9d', 
.....
88aacf8d-2a', u'tap54e8b635-fc', u'tapa46a4411-41', u'ha-abc0de78-ae', u'tap48d691ea-f1', u'tap52afe9d6-62'], table=Interface, columns=['name', 'external_ids', 'ofport'])] exceeded timeout 10 seconds
2017-12-12 16:14:38.696 104948 ERROR neutron 


Additional info:

* Total ports are 4965:
$ cat ovs-vsctl_-t_5_show  | grep Port | wc -l
4965

* qr ports are 260:
$ cat ovs-vsctl_-t_5_show  | grep Port | grep "qr-" | wc -l
260

* qg ports are 263:
$ cat ovs-vsctl_-t_5_show  | grep Port | grep "qg-" | wc -l
263

* tap ports (DHCP) are 3312:
$ cat ovs-vsctl_-t_5_show  | grep Port | grep "tap" | wc -l
3312

* ha ports are 1107:
$ cat ovs-vsctl_-t_5_show  | grep Port | grep "ha-" | wc -l
1107

* vxlan ports are 15:
$ cat ovs-vsctl_-t_5_show  | grep Port | grep "vxlan" | wc -l
15
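
The same tally can be done in one pass over the saved `ovs-vsctl show` output. The following is a minimal Python sketch, not part of the original report: it mirrors the grep pipelines above (keep lines containing "Port", then count substring matches), and the function name `tally_ports` and the sample text are made up for illustration. In the sample, the `qr-` port name is invented; the `tap` and `ha-` names are taken from the log above.

```python
from collections import Counter

# Substrings taken from the grep commands above.
PATTERNS = ("qr-", "qg-", "tap", "ha-", "vxlan")


def tally_ports(show_output, patterns=PATTERNS):
    """Count 'Port' lines overall and per substring, like grep | wc -l."""
    counts = Counter()
    for line in show_output.splitlines():
        # Equivalent of `grep Port` (case-sensitive substring match).
        if "Port" not in line:
            continue
        counts["total"] += 1
        for pattern in patterns:
            # Equivalent of the second grep in each pipeline.
            if pattern in line:
                counts[pattern] += 1
    return counts


# Illustrative excerpt in the layout that `ovs-vsctl show` prints.
SAMPLE = """\
    Bridge br-int
        Port "qr-0a1b2c3d-4e"
            Interface "qr-0a1b2c3d-4e"
        Port "tap6f51366b-7f"
            Interface "tap6f51366b-7f"
        Port "ha-f9af3f28-c8"
            Interface "ha-f9af3f28-c8"
        Port br-int
            Interface br-int
"""

if __name__ == "__main__":
    for name, count in sorted(tally_ports(SAMPLE).items()):
        print(name, count)
```

Because the matching is by substring, a port is counted under every pattern its line contains, exactly as with the chained greps; ports that match no pattern (such as `br-int`) only contribute to the total.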

--- Additional comment from Jakub Libosvar on 2018-02-01 10:08:07 EST ---

Terry already has a patch up for review upstream.

Comment 2 Jon Schlueter 2018-03-29 17:40:24 UTC
neutron patch is included in openstack-neutron-12.0.1-0.20180327195360.68b8980.el7ost

What else are we waiting on to get this moved to MODIFIED?

Comment 3 Brian Haley 2018-03-29 19:37:56 UTC
Yes, I believe this should be in MODIFIED, updating.

Comment 6 Toni Freger 2018-04-09 09:07:36 UTC
Terry/ Brian 

I am moving this bug back to the assignee since I have already tested this fix, see https://bugzilla.redhat.com/show_bug.cgi?id=1541494
The issue still reproduces.

Comment 8 Jakub Libosvar 2018-04-12 15:17:08 UTC
(In reply to Toni Freger from comment #6)
> Terry/ Brian 
> 
> I am moving this bug back to the assignee since I have already tested this
> fix, see https://bugzilla.redhat.com/show_bug.cgi?id=1541494
> The issue still reproduces.

We'll need to override the default timeout value for the cleanup tool so it can handle 5000 ports.
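
As a rough illustration of what overriding the default for one tool can look like, here is a minimal Python sketch. It is not the actual neutron change: the names `GENERAL_DEFAULT_TIMEOUT`, `CLEANUP_DEFAULT_TIMEOUT`, and `resolve_timeout` are hypothetical, and the 600-second value is an arbitrary placeholder. Only the 10-second figure comes from the log in the description.

```python
# 10 seconds is the timeout reported in the TimeoutException above.
GENERAL_DEFAULT_TIMEOUT = 10

# Hypothetical larger default used only by the cleanup tool; the real
# value chosen for the fix is not stated in this report.
CLEANUP_DEFAULT_TIMEOUT = 600


def resolve_timeout(configured=None, tool_default=GENERAL_DEFAULT_TIMEOUT):
    """Pick the timeout in seconds for a run.

    An explicitly configured value always wins; otherwise the
    tool-specific default is used.
    """
    if configured is not None:
        return configured
    return tool_default


if __name__ == "__main__":
    # Regular callers keep the general default.
    print(resolve_timeout())
    # The cleanup tool passes its own, larger default.
    print(resolve_timeout(tool_default=CLEANUP_DEFAULT_TIMEOUT))
    # An operator-supplied value overrides both.
    print(resolve_timeout(configured=30, tool_default=CLEANUP_DEFAULT_TIMEOUT))
```

Keeping the larger default local to the cleanup tool leaves the timeout for other callers unchanged while still letting an operator set an explicit value.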

Comment 18 errata-xmlrpc 2018-06-27 13:43:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086