Bug 1541494

Summary: neutron-ovs-cleanup failing when there are too many ports
Product: Red Hat OpenStack Reporter: Terry Wilson <twilson>
Component: openstack-neutronAssignee: Slawek Kaplonski <skaplons>
Status: CLOSED ERRATA QA Contact: Roee Agiman <ragiman>
Severity: high Docs Contact:
Priority: high    
Version: 12.0 (Pike)CC: amuller, chrisw, dalvarez, dmacpher, jlibosva, nyechiel, pmorey, ragiman, skaplons, srevivo, tfreger, twilson
Target Milestone: z3Keywords: Triaged, ZStream
Target Release: 12.0 (Pike)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-neutron-11.0.2-6.el7ost Doc Type: If docs needed, set a value
Doc Text:
neutron-ovs-cleanup exceeded the three hour timeout on large OVS databases. This left the OVS database only partially cleaned. This fix changes how the cleanup function looks up ports on an OVS bridge. This significantly reduces the time needed to clean the database.
Story Points: ---
Clone Of: 1528325
: 1541496 (view as bug list) Environment:
Last Closed: 2018-08-20 12:51:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1528325, 1548460    
Bug Blocks: 1541496    

Description Terry Wilson 2018-02-02 17:31:26 UTC
+++ This bug was initially created as a clone of Bug #1528325 +++

Description of problem:

When OVS database is large, ovs-cleanup script times out and fails to clean the ports. In large environments this can be an issue because when there's leftovers they won't get cleaned up.


Version-Release number of selected component (if applicable):
openstack-neutron-10.0.3-1.el7ost.noarch 


Actual results:

2017-12-12 16:14:18.114 104948 INFO neutron.common.config [-] Logging enabled!
2017-12-12 16:14:18.114 104948 INFO neutron.common.config [-] /usr/bin/neutron-ovs-cleanup version 10.0.3
2017-12-12 16:14:18.314 104948 INFO neutron.agent.ovsdb.native.vlog [-] tcp:127.0.0.1:6640: connecting...
2017-12-12 16:14:18.314 104948 INFO neutron.agent.ovsdb.native.vlog [-] tcp:127.0.0.1:6640: connected
2017-12-12 16:14:37.369 104948 ERROR neutron.agent.ovsdb.native.commands [-] Error executing command
2017-12-12 16:14:37.369 104948 ERROR neutron.agent.ovsdb.native.commands Traceback (most recent call last):
2017-12-12 16:14:37.369 104948 ERROR neutron.agent.ovsdb.native.commands   File "/usr/lib/python2.7/site-packages/neutron/agent/ovsdb/native/commands.py", line 36, in execute
2017-12-12 16:14:37.369 104948 ERROR neutron.agent.ovsdb.native.commands     txn.add(self)
2017-12-12 16:14:37.369 104948 ERROR neutron.agent.ovsdb.native.commands   File "/usr/lib/python2.7/site-packages/neutron/agent/ovsdb/api.py", line 79, in __exit__
2017-12-12 16:14:37.369 104948 ERROR neutron.agent.ovsdb.native.commands     self.result = self.commit()
2017-12-12 16:14:37.369 104948 ERROR neutron.agent.ovsdb.native.commands   File "/usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py", line 73, in commit
2017-12-12 16:14:37.369 104948 ERROR neutron.agent.ovsdb.native.commands     'timeout': self.timeout})
2017-12-12 16:14:37.369 104948 ERROR neutron.agent.ovsdb.native.commands TimeoutException: Commands [DbListCommand(if_exists=True, records=[u'ha-f9af3f28-c8', u'tap6f51366b-7f', u'tap9d6b4ac9-ea', u'tapc9bccd94-00', u'tap29ea8fb2-9d', 
.....
88aacf8d-2a', u'tap54e8b635-fc', u'tapa46a4411-41', u'ha-abc0de78-ae', u'tap48d691ea-f1', u'tap52afe9d6-62'], table=Interface, columns=['name', 'external_ids', 'ofport'])] exceeded timeout 10 seconds
2017-12-12 16:14:38.696 104948 ERROR neutron 


Additional info:

* Total ports are 4965:
$ cat ovs-vsctl_-t_5_show  | grep Port | wc -l
4965

* qr ports are 260:
$ cat ovs-vsctl_-t_5_show  | grep Port | grep "qr-" | wc -l
260

* qg ports are 260:
$ cat ovs-vsctl_-t_5_show  | grep Port | grep "qg-" | wc -l
263

* tap ports (DHCP) are 3312:
$ cat ovs-vsctl_-t_5_show  | grep Port | grep "tap" | wc -l
3312

* ha ports are 1107:
$ cat ovs-vsctl_-t_5_show  | grep Port | grep "ha-" | wc -l
1107

* vxlan ports are 15:
$ cat ovs-vsctl_-t_5_show  | grep Port | grep "vxlan" | wc -l
15

--- Additional comment from Jakub Libosvar on 2018-02-01 10:08:07 EST ---

Terry has already patch up for review in upstream

Comment 7 Toni Freger 2018-03-20 12:28:33 UTC
Tested on compute with latest puddle, openstack-neutron-11.0.2-1.el7ost.noarch

1. 5000 ports created successfully -  
<for t in {0..5000} ;do ovs-vsctl add-port br-int port$t -- set Interface port$t type=internal -- set Interface port$t external_ids:attached-mac="ff:ff:ff:ff:ff:ff" -- set Interface port$t external_ids:iface-id="3ff9912f-8491-4a15-af9c-46af19df3dd6" ;echo port$t; done>

2. Run: neutron-ovs-cleanup --debug

Ports not deleted.

Guru meditation now registers SIGUSR1 and SIGUSR2 by default for backward compatibility. SIGUSR1 will no longer be registered in a future release, so please use SIGUSR2 to generate reports.
2018-03-20 12:23:30.941 750385 INFO neutron.common.config [-] Logging enabled!
2018-03-20 12:23:30.941 750385 INFO neutron.common.config [-] /bin/neutron-ovs-cleanup version 11.0.2
2018-03-20 12:23:30.941 750385 DEBUG neutron.common.config [-] command line: /bin/neutron-ovs-cleanup --debug setup_logging /usr/lib/python2.7/site-packages/neutron/common/config.py:104
2018-03-20 12:23:46.493 750385 WARNING ovsdbapp.backend.ovs_idl.vlog [-] tcp:127.0.0.1:6640: send error: Broken pipe
2018-03-20 12:23:46.493 750385 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn command(idx=0): ListBridgesCommand() do_commit /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
2018-03-20 12:23:46.493 750385 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Transaction caused no change do_commit /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:107
2018-03-20 12:23:46.494 750385 WARNING ovsdbapp.backend.ovs_idl.vlog [-] tcp:127.0.0.1:6640: connection dropped (Broken pipe)
2018-03-20 12:23:46.494 750385 INFO neutron.cmd.ovs_cleanup [-] Cleaning bridge: br-int
2018-03-20 12:23:46.494 750385 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn command(idx=0): OvsCleanup(bridge=br-int, all_ports=False) do_commit /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
2018-03-20 12:23:47.596 750385 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] OVSDB transaction returned TRY_AGAIN, retrying do_commit /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:93
2018-03-20 12:23:47.597 750385 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn command(idx=0): OvsCleanup(bridge=br-int, all_ports=False) do_commit /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
2018-03-20 12:23:48.730 750385 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] OVSDB transaction returned TRY_AGAIN, retrying do_commit /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:93
2018-03-20 12:23:48.731 750385 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn command(idx=0): OvsCleanup(bridge=br-int, all_ports=False) do_commit /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
2018-03-20 12:23:49.934 750385 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] OVSDB transaction returned TRY_AGAIN, retrying do_commit /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:93
2018-03-20 12:23:49.935 750385 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn command(idx=0): OvsCleanup(bridge=br-int, all_ports=False) do_commit /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
2018-03-20 12:23:51.069 750385 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] OVSDB transaction returned TRY_AGAIN, retrying do_commit /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:93
2018-03-20 12:23:51.069 750385 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn command(idx=0): OvsCleanup(bridge=br-int, all_ports=False) do_commit /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
2018-03-20 12:23:52.199 750385 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] OVSDB transaction returned TRY_AGAIN, retrying do_commit /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:93
2018-03-20 12:23:52.200 750385 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn command(idx=0): OvsCleanup(bridge=br-int, all_ports=False) do_commit /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
2018-03-20 12:23:53.312 750385 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] OVSDB transaction returned TRY_AGAIN, retrying do_commit /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:93
2018-03-20 12:23:53.313 750385 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn command(idx=0): OvsCleanup(bridge=br-int, all_ports=False) do_commit /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
2018-03-20 12:23:54.412 750385 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] OVSDB transaction returned TRY_AGAIN, retrying do_commit /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:93
2018-03-20 12:23:54.413 750385 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn command(idx=0): OvsCleanup(bridge=br-int, all_ports=False) do_commit /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
2018-03-20 12:23:55.501 750385 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] OVSDB transaction returned TRY_AGAIN, retrying do_commit /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:93
2018-03-20 12:23:55.501 750385 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn command(idx=0): OvsCleanup(bridge=br-int, all_ports=False) do_commit /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
2018-03-20 12:23:56.544 750385 ERROR ovsdbapp.backend.ovs_idl.command [-] Error executing command: TimeoutException: Commands [<neutron.agent.ovsdb.impl_idl.OvsCleanup object at 0xe68e050>] exceeded timeout 10 seconds
2018-03-20 12:23:56.544 750385 ERROR ovsdbapp.backend.ovs_idl.command Traceback (most recent call last):
2018-03-20 12:23:56.544 750385 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/command.py", line 30, in execute
2018-03-20 12:23:56.544 750385 ERROR ovsdbapp.backend.ovs_idl.command     txn.add(self)
2018-03-20 12:23:56.544 750385 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
2018-03-20 12:23:56.544 750385 ERROR ovsdbapp.backend.ovs_idl.command     self.gen.next()
2018-03-20 12:23:56.544 750385 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python2.7/site-packages/ovsdbapp/api.py", line 94, in transaction
2018-03-20 12:23:56.544 750385 ERROR ovsdbapp.backend.ovs_idl.command     self._nested_txn = None
2018-03-20 12:23:56.544 750385 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python2.7/site-packages/ovsdbapp/api.py", line 54, in __exit__
2018-03-20 12:23:56.544 750385 ERROR ovsdbapp.backend.ovs_idl.command     self.result = self.commit()
2018-03-20 12:23:56.544 750385 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 57, in commit
2018-03-20 12:23:56.544 750385 ERROR ovsdbapp.backend.ovs_idl.command     timeout=self.timeout)
2018-03-20 12:23:56.544 750385 ERROR ovsdbapp.backend.ovs_idl.command TimeoutException: Commands [<neutron.agent.ovsdb.impl_idl.OvsCleanup object at 0xe68e050>] exceeded timeout 10 seconds
2018-03-20 12:23:56.544 750385 ERROR ovsdbapp.backend.ovs_idl.command 
2018-03-20 12:23:56.687 750385 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] OVSDB transaction returned TRY_AGAIN, retrying do_commit /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:93
2018-03-20 12:23:56.688 750385 CRITICAL neutron [-] Unhandled error: TimeoutException: Commands [<neutron.agent.ovsdb.impl_idl.OvsCleanup object at 0xe68e050>] exceeded timeout 10 seconds
2018-03-20 12:23:56.688 750385 ERROR neutron Traceback (most recent call last):
2018-03-20 12:23:56.688 750385 ERROR neutron   File "/bin/neutron-ovs-cleanup", line 10, in <module>
2018-03-20 12:23:56.688 750385 ERROR neutron     sys.exit(main())
2018-03-20 12:23:56.688 750385 ERROR neutron   File "/usr/lib/python2.7/site-packages/neutron/cmd/ovs_cleanup.py", line 87, in main
2018-03-20 12:23:56.688 750385 ERROR neutron     do_main(conf)
2018-03-20 12:23:56.688 750385 ERROR neutron   File "/usr/lib/python2.7/site-packages/neutron/cmd/ovs_cleanup.py", line 107, in do_main
2018-03-20 12:23:56.688 750385 ERROR neutron     conf.ovs_all_ports).execute(check_error=True)
2018-03-20 12:23:56.688 750385 ERROR neutron   File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/command.py", line 30, in execute
2018-03-20 12:23:56.688 750385 ERROR neutron     txn.add(self)
2018-03-20 12:23:56.688 750385 ERROR neutron   File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
2018-03-20 12:23:56.688 750385 ERROR neutron     self.gen.next()
2018-03-20 12:23:56.688 750385 ERROR neutron   File "/usr/lib/python2.7/site-packages/ovsdbapp/api.py", line 94, in transaction
2018-03-20 12:23:56.688 750385 ERROR neutron     self._nested_txn = None
2018-03-20 12:23:56.688 750385 ERROR neutron   File "/usr/lib/python2.7/site-packages/ovsdbapp/api.py", line 54, in __exit__
2018-03-20 12:23:56.688 750385 ERROR neutron     self.result = self.commit()
2018-03-20 12:23:56.688 750385 ERROR neutron   File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 57, in commit
2018-03-20 12:23:56.688 750385 ERROR neutron     timeout=self.timeout)
2018-03-20 12:23:56.688 750385 ERROR neutron TimeoutException: Commands [<neutron.agent.ovsdb.impl_idl.OvsCleanup object at 0xe68e050>] exceeded timeout 10 seconds

Comment 9 Terry Wilson 2018-04-12 13:44:27 UTC
For 5000 ports you will still probably need to bump the timeout up past 10 seconds. Maybe we could just override the timeout pretty high in neutron-ovs-cleanup? Anyway, can you try with a 1 minute timeout?

Comment 17 errata-xmlrpc 2018-08-20 12:51:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2514