Bug 1289995 - VXLAN tunnel to the controller is not created on a compute node after performing multiple VM auto-evacuations
Summary: VXLAN tunnel to the controller is not created on a compute node after performing multiple VM auto-evacuations
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 6.0 (Juno)
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: async
Target Release: 6.0 (Juno)
Assignee: anil venkata
QA Contact: Alexander Stafeyev
URL:
Whiteboard:
Duplicates: 1289988
Depends On:
Blocks: 1328504 1328506
 
Reported: 2015-12-09 13:26 UTC by Anand Nande
Modified: 2023-02-22 23:02 UTC
CC: 17 users

Fixed In Version: openstack-neutron-2014.2.3-32.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, OpenStack Networking ports could become stuck in the BUILD state during instance migration: when an OVS agent retrieved information about a port at a particular point in the migration, the port would transition to BUILD. Because Layer 2 population (l2pop) uses port status to determine whether a node has ports in the ACTIVE state, and uses that to decide whether a tunnel can be formed to that node, a port stuck in BUILD meant the tunnel could not be formed and network connectivity between nodes was broken. With this update, the port transitions to BUILD only if it is unbound or is bound to the host requesting the information, so the port no longer enters and stays in the BUILD state.
Clone Of:
Clones: 1328504, 1328506
Environment:
Last Closed: 2016-05-24 14:53:59 UTC
Target Upstream Version:
Embargoed:




Links
- OpenStack gerrit 163178 (MERGED): ML2: Change port status only when it's bound to the host (last updated 2020-07-02 11:44:04 UTC)
- OpenStack gerrit 272566 (MERGED): Ensure that tunnels are fully reset on ovs restart (last updated 2020-07-02 11:44:04 UTC)
- Red Hat Product Errata RHBA-2016:1104 (SHIPPED_LIVE): openstack-neutron bug fix advisory (last updated 2016-05-24 18:53:36 UTC)
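For context on the first gerrit change above: as the Doc Text describes, the fix makes ML2 change a port's status only when the port is unbound or the request comes from the host the port is bound to. A minimal Python sketch of that guard, assuming simplified, illustrative names (only 'binding:host_id' is the real port attribute; the function is hypothetical, not Neutron's actual code):

# Illustrative sketch of the behavior described in the Doc Text.
def maybe_mark_port_build(port, requesting_host):
    bound_host = port.get('binding:host_id')
    if not bound_host or bound_host == requesting_host:
        # Safe: either the port is unbound, or the request comes from
        # the host that owns the binding.
        port['status'] = 'BUILD'
    # Otherwise leave the status untouched, so l2pop still sees an
    # ACTIVE port on the real host and keeps the VXLAN tunnel up.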

Comment 2 Anand Nande 2015-12-09 13:30:48 UTC
*** Bug 1289988 has been marked as a duplicate of this bug. ***

Comment 4 Assaf Muller 2015-12-18 04:11:39 UTC
A workaround would be to disable l2pop entirely in the cluster. Was this brought up with the customer?
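For reference, a sketch of what disabling l2pop looks like on a Juno-era ML2/OVS deployment (file paths and the existing driver list may differ per deployment):

# /etc/neutron/plugins/ml2/ml2_conf.ini on the controller:
[ml2]
# remove 'l2population' from the existing list, e.g.:
mechanism_drivers = openvswitch

# OVS agent configuration on every network/compute node:
[agent]
l2_population = False

# then restart neutron-server and the neutron-openvswitch-agent services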

Comment 5 Assaf Muller 2015-12-18 04:24:37 UTC
@Anil - Do you think this patch could explain the issue? It's the only l2pop fix I could find that seems relevant and is not already available in OSP 5.

Comment 6 GE Scott Knauss 2015-12-18 08:36:38 UTC
Assaf, Anil,


Nokia would gladly test this if we can make it available to them. Please let me know as soon as possible, as this issue is now blocking for Nokia.

Thank you,
Scott

Comment 7 GE Scott Knauss 2015-12-18 09:04:35 UTC
Assaf, Anil, Anand,
     It seems this bug was misfiled. The customer hitting this issue is on OSP 6, not OSP 5. I've modified the BZ appropriately.

-Scott

Comment 8 Assaf Muller 2015-12-18 16:14:44 UTC
The patch I linked may be relevant. It's available from 2014.2.4, and the customer is on 2014.2.3-9. Can we try applying that patch, or doing a minor upgrade, and see if it helps?
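For the record, the minor-upgrade check would look roughly like this on the affected nodes (a sketch only; repo setup is omitted and the exact steps depend on the subscription channels):

rpm -q openstack-neutron          # confirm the current 2014.2.3-9 build
yum update 'openstack-neutron*'   # pull the 2014.2.4-based build once published
systemctl restart neutron-server  # and the Neutron agents on each node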

Comment 9 anil venkata 2015-12-18 19:57:35 UTC
Agree with Assaf.
"l2pop", which creates tunnels, in this case is called when either get_bound_port_context(through _commit_port_binding) or update_port_status called.
Can you please try Assaf's suggestion?
Can we get access to the setup to debug further?
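To make those two trigger paths concrete, a rough sketch of the flow (simplified; the real fanout happens inside Neutron's l2pop mechanism driver):

# Path 1: port binding is committed
#   _commit_port_binding() -> get_bound_port_context()
#       -> l2pop driver sees the bound, ACTIVE port
#       -> fanout RPC tells agents to add a VXLAN tunnel to that host
#
# Path 2: port status changes
#   update_port_status('ACTIVE')
#       -> l2pop driver notices the first ACTIVE port on the host
#       -> same fanout, tunnels get created
#
# If the port is stuck in BUILD, neither path fires, so the tunnel to
# the controller never appears -- the symptom in this bug.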

Comment 10 GE Scott Knauss 2015-12-21 05:31:58 UTC
Anil, 
   By "access to the system, do you mean "Bomgar" ? If so, can you coordinate with customer through the SFDC/portal case 01514466? I know they will be available for this today.

Comment 15 Assaf Muller 2015-12-24 16:22:40 UTC
The patch is already available in 2014.2.4; just have them perform a minor upgrade.

Comment 19 Assaf Muller 2016-01-08 17:13:47 UTC
My mistake: we haven't released or provided a build based on 2014.2.4. We'll work on it.

Comment 38 anil venkata 2016-02-02 10:09:44 UTC
Is the customer using multiple RPC and API workers?
If so, there is no fix for now and the only solution is to disable l2pop.
A bug has already been raised for this: https://review.openstack.org/#/c/269212/

Even with a single API worker and a single RPC worker, the same issue can be seen during bulk migrations, so it is better to disable l2pop for now to resolve the migration issues.
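While the workaround is in place, a couple of quick checks can confirm the symptom on a compute node (illustrative; the port-list filter uses the neutron client's generic '--' filter passthrough):

neutron port-list -- --status BUILD   # any ports stuck in BUILD?
ovs-vsctl show | grep -A 2 vxlan      # are the expected VXLAN tunnel ports present?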

Comment 39 anil venkata 2016-02-02 10:13:55 UTC
Link to the bug: https://bugzilla.redhat.com/show_bug.cgi?id=1289995

Comment 45 anil venkata 2016-02-04 12:50:09 UTC
Martin Schuppert,
Can you please provide the customer with a build that includes the patches below
https://review.openstack.org/#/c/272566
https://review.openstack.org/#/c/163178/
and let us know if it solves this issue?

Comment 47 anil venkata 2016-02-05 05:24:56 UTC
Thanks, Martin Schuppert.
No objections. If the patch solves this issue, we will take it into OSP 6. Can you please share your build with the customer and ask them to test?

Thanks
Anil

Comment 51 anil venkata 2016-02-29 14:58:24 UTC
Anand and Martin Schuppert,

Any update on this bug?

Thanks
Anil

Comment 71 Alexander Stafeyev 2016-05-15 05:54:54 UTC
Verified that the test is in. 

rpm -qa | grep neutr
openstack-neutron-openvswitch-2014.2.3-37.el7ost.noarch
python-neutronclient-2.3.9-2.el7ost.noarch
openstack-neutron-common-2014.2.3-37.el7ost.noarch
openstack-neutron-2014.2.3-37.el7ost.noarch
openstack-neutron-ml2-2014.2.3-37.el7ost.noarch
python-neutron-2014.2.3-37.el7ost.noarch

Comment 72 Alexander Stafeyev 2016-05-15 05:55:42 UTC
(In reply to Alexander Stafeyev from comment #71)
> Verified that the test is in.

Correction: verified that the CODE is in.

Sorry :)

Comment 74 errata-xmlrpc 2016-05-24 14:53:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1104.html

