Bug 1289995 - VXLAN tunnel to the controller is not created on a compute node after performing multiple VM auto-evacuations
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 6.0 (Juno)
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: async
Target Release: 6.0 (Juno)
Assigned To: anil venkata
QA Contact: Alexander Stafeyev
Keywords: Reopened, Triaged, ZStream
Duplicates: 1289988
Blocks: 1328504 1328506
 
Reported: 2015-12-09 08:26 EST by Anand Nande
Modified: 2016-05-24 11:01 EDT
CC List: 19 users
Fixed In Version: openstack-neutron-2014.2.3-32.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, when an OVS agent retrieved information about a port during an instance migration, a timing window could cause the port to enter and remain in the BUILD state. Because Layer 2 population uses port state to determine whether a node has ports in the ACTIVE state, and forms tunnels to a node only when it does, a port stuck in BUILD meant the tunnel could not be formed and network connectivity between nodes was broken. With this fix, the port transitions to BUILD only if it is unbound, or if it is bound to the host requesting the information, so it no longer enters and stays in the BUILD state during migration.
Cloned To: 1328504 1328506
Last Closed: 2016-05-24 10:53:59 EDT
Type: Bug


External Trackers
Tracker              ID       Last Updated
OpenStack gerrit     163178   2015-12-17 23:11 EST
OpenStack gerrit     272566   2016-02-03 10:37 EST

Comment 2 Anand Nande 2015-12-09 08:30:48 EST
*** Bug 1289988 has been marked as a duplicate of this bug. ***
Comment 4 Assaf Muller 2015-12-17 23:11:39 EST
A workaround would be to disable l2pop entirely in the cluster. Was this brought up with the customer?
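For reference, a minimal sketch of what disabling l2pop looks like on a Juno-era ML2/OVS deployment. The config file paths and the assumption that mechanism_drivers was "openvswitch,l2population" are mine, not from this report; adjust to the actual deployment (openstack-config ships in openstack-utils):

  # Controller: drop l2population from the ML2 mechanism drivers.
  openstack-config --set /etc/neutron/plugins/ml2/ml2_conf.ini ml2 mechanism_drivers openvswitch
  systemctl restart neutron-server

  # Each compute/network node: turn off l2_population in the OVS agent config.
  openstack-config --set /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini agent l2_population False
  systemctl restart neutron-openvswitch-agent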
Comment 5 Assaf Muller 2015-12-17 23:24:37 EST
@Anil - Do you think this patch could explain the issue? It's the only l2pop fix I could find that seems relevant and is not already available in OSP 5.
Comment 6 GE Scott Knauss 2015-12-18 03:36:38 EST
Assaf, Anil,

Nokia would gladly test this if we can make it available to them. Please let me know as soon as possible, as this issue is now blocking for Nokia.

Thank you,
Scott
Comment 7 GE Scott Knauss 2015-12-18 04:04:35 EST
Assaf, Anil, Anand,
     It seems this bug was mis-filed. The customer having this issue is on OSP 6, not OSP 5. I've modified the BZ appropriately.

-Scott
Comment 8 Assaf Muller 2015-12-18 11:14:44 EST
The patch I linked may be relevant. It's available from 2014.2.4, and the customer is on 2014.2.3-9. Can we try applying that patch, or doing a minor upgrade, and see if it helps?
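A rough sketch of that check from the CLI; the exact target build is whichever advisory ships the patch (2014.2.4 here is the upstream point release), and the package glob is an assumption:

  # Confirm the installed build, then pull the minor update.
  rpm -q openstack-neutron
  yum update 'openstack-neutron*' 'python-neutron*'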
Comment 9 anil venkata 2015-12-18 14:57:35 EST
Agree with Assaf.
In this case, l2pop, which creates the tunnels, is called when either get_bound_port_context (through _commit_port_binding) or update_port_status is called.
Can you please try Assaf's suggestion?
Can we get access to the setup to debug further?
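For anyone reproducing this, a short sketch of the checks that expose the symptom; the port fields and br-tun tunnel-port naming are the usual ML2/OVS defaults, not details taken from this report:

  # Ports that never left BUILD after the evacuation:
  neutron port-list -c id -c status -c binding:host_id | grep BUILD

  # On the affected compute node, the VXLAN tunnel port to the controller
  # should be on br-tun; if l2pop never fired, it is missing:
  ovs-vsctl show | grep -A 3 'Port "vxlan'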
Comment 10 GE Scott Knauss 2015-12-21 00:31:58 EST
Anil, 
   By "access to the setup", do you mean Bomgar? If so, can you coordinate with the customer through the SFDC/portal case 01514466? I know they will be available for this today.
Comment 15 Assaf Muller 2015-12-24 11:22:40 EST
The patch is already available in 2014.2.4, just have them perform a minor upgrade.
Comment 19 Assaf Muller 2016-01-08 12:13:47 EST
My mistake, we haven't released or provided a build based on 2014.2.4 yet. We'll work on it.
Comment 38 anil venkata 2016-02-02 05:09:44 EST
Is the customer using multiple RPC and API workers? (See the sketch below for checking the worker settings.)
If so, there is no fix for now, and the only solution is to disable l2pop.
An upstream review has already been raised for this: https://review.openstack.org/#/c/269212/

Even with single API and RPC workers, the same issue can be seen during bulk migrations, so it is better to disable l2pop for now to resolve the migration issues.
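For reference, the worker counts in question are set in neutron.conf on the controller; a minimal sketch of checking them, assuming the default config path:

  # Both settings are typically unset on Juno unless the installer overrode them.
  grep -E '^(api_workers|rpc_workers)' /etc/neutron/neutron.conf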
Comment 39 anil venkata 2016-02-02 05:13:55 EST
Link to the bug: https://bugzilla.redhat.com/show_bug.cgi?id=1289995
Comment 45 anil venkata 2016-02-04 07:50:09 EST
Martin Schuppert,
Can you please provide a build to the customer with the patches below,
https://review.openstack.org/#/c/272566
https://review.openstack.org/#/c/163178/
and let us know if it solves this issue?
Comment 47 anil venkata 2016-02-05 00:24:56 EST
Thanks, Martin Schuppert.
No objections. If the patch solves this issue, we will take it into OSP 6. Can you please share your build with the customer and ask them to test?

Thanks
Anil
Comment 51 anil venkata 2016-02-29 09:58:24 EST
Anand and Martin Schuppert,

Any update on this bug?

Thanks
Anil
Comment 71 Alexander Stafeyev 2016-05-15 01:54:54 EDT
Verified that the test is in. 

rpm -qa | grep neutr
openstack-neutron-openvswitch-2014.2.3-37.el7ost.noarch
python-neutronclient-2.3.9-2.el7ost.noarch
openstack-neutron-common-2014.2.3-37.el7ost.noarch
openstack-neutron-2014.2.3-37.el7ost.noarch
openstack-neutron-ml2-2014.2.3-37.el7ost.noarch
python-neutron-2014.2.3-37.el7ost.noarch
Comment 72 Alexander Stafeyev 2016-05-15 01:55:42 EDT
(In reply to Alexander Stafeyev from comment #71)
> Verified that the test is in. 
> 
> rpm -qa | grep neutr
> openstack-neutron-openvswitch-2014.2.3-37.el7ost.noarch
> python-neutronclient-2.3.9-2.el7ost.noarch
> openstack-neutron-common-2014.2.3-37.el7ost.noarch
> openstack-neutron-2014.2.3-37.el7ost.noarch
> openstack-neutron-ml2-2014.2.3-37.el7ost.noarch
> python-neutron-2014.2.3-37.el7ost.noarch

That the CODE is in.

Sorry :)
Comment 74 errata-xmlrpc 2016-05-24 10:53:59 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1104.html
