Bug 1466081

Summary: Update binding:host_id for network:router_gateway interfaces
Product: Red Hat OpenStack
Reporter: Ajay Kalambur <akalambu>
Component: openstack-neutron
Assignee: Daniel Alvarez Sanchez <dalvarez>
Status: CLOSED ERRATA
QA Contact: Alexander Stafeyev <astafeye>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 10.0 (Newton)
CC: akalambu, akaris, amuller, chrisw, jjoyce, mmethot, mschuppe, nyechiel, oblaut, samccann, srevivo, tfreger, wlehman
Target Milestone: z4
Keywords: Reopened, Triaged, ZStream
Target Release: 10.0 (Newton)   
Hardware: x86_64   
OS: Linux   
Fixed In Version: openstack-neutron-9.3.1-7.el7ost
Doc Type: Bug Fix
Doc Text:
The networking-vpp mechanism driver could not correctly set up the router interface when a neutron HA router failed over. This was because the host_id property of the ports owned by a router gateway was not updated to the new host. This fix updates the host_id property on failover.
Last Closed: 2017-09-06 17:17:18 UTC
Type: Bug

Description Ajay Kalambur 2017-06-29 00:22:02 UTC
Description of problem:

When a neutron HA router fails over, the L3 agent on which the router is now master informs the neutron server of this transition. The server then updates the router port bindings (each router port's binding:host_id) to point to the correct server.

The networking-vpp mechanism driver (https://github.com/openstack/networking-vpp) depends on this update to provide the L2 plumbing for the router interface on the new control server.

There is a bug in the _update_router_port_bindings(self, context, states, host) method in neutron/db/l3_hamode_db.py that prevents gateway ports from being identified and updated.
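To illustrate the behaviour described above, here is a minimal, self-contained sketch of the rebinding step. It is not the neutron code and not the upstream patch: the function name rebind_active_router_ports, the plain-dict port representation, and the "active" state string are assumptions made for illustration. It only shows the idea that ports are selected by device_owner and that a gateway port which is not selected keeps its stale binding:host_id.

```python
# Illustrative sketch only; NOT the code in neutron/db/l3_hamode_db.py.
# Ports are modelled as plain dicts and `states` maps router id -> HA state.

# Device-owner values for router-related ports (assumed set for this sketch).
ROUTER_PORT_OWNERS = (
    "network:ha_router_replicated_interface",
    "network:router_centralized_snat",
    "network:router_gateway",
)


def rebind_active_router_ports(ports, states, host, owners=ROUTER_PORT_OWNERS):
    """Point binding:host_id at `host` for ports of routers now active there.

    Returns the ids of the ports that were updated.
    """
    updated = []
    for port in ports:
        # Only router-owned ports are candidates for rebinding.
        if port["device_owner"] not in owners:
            continue
        # Skip routers that did not become active on this host.
        if states.get(port["device_id"]) != "active":
            continue
        port["binding:host_id"] = host
        updated.append(port["id"])
    return updated


def sample_ports():
    """Two HA routers, all ports still bound to the old host."""
    return [
        {"id": "gw1", "device_id": "r1",
         "device_owner": "network:router_gateway",
         "binding:host_id": "controller-1"},
        {"id": "if1", "device_id": "r1",
         "device_owner": "network:ha_router_replicated_interface",
         "binding:host_id": "controller-1"},
        {"id": "gw2", "device_id": "r2",
         "device_owner": "network:router_gateway",
         "binding:host_id": "controller-1"},
    ]


states = {"r1": "active", "r2": "standby"}

# Expected behaviour: the gateway port of the failed-over router is rebound.
ports_ok = sample_ports()
updated_ok = rebind_active_router_ports(ports_ok, states, "controller-2")

# Symptom from this report: if gateway ports are not selected, gw1 stays
# bound to the old host while the internal interface moves.
ports_stale = sample_ports()
owners_without_gw = tuple(
    o for o in ROUTER_PORT_OWNERS if o != "network:router_gateway")
updated_stale = rebind_active_router_ports(
    ports_stale, states, "controller-2", owners=owners_without_gw)
```

In the second call only if1 is rebound, so a driver that plumbs the interface based on binding:host_id would still see the gateway port on the old host.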

A bug fix has been applied to upstream neutron and is present in:
stable/newton: https://github.com/openstack/neutron/commit/f6b3d25c6ef335ff891030b8e34c1d27f45b896c
master: https://github.com/openstack/neutron/commit/d8334b41d2c5bcd4916347d20008b1538d48b0ef

The current version of neutron in OSP10 is openstack-neutron.noarch 1:9.2.0-6.el7ost, which does not include the fix.

This bug report requests that the fix be made available in OSP10.


Version-Release number of selected component (if applicable):
openstack-neutron.noarch 1:9.2.0-6.el7ost

Comment 1 Assaf Muller 2017-06-29 13:44:53 UTC
This will be automatically included in the next OSP 10 rebase and minor release.

Comment 2 Assaf Muller 2017-06-30 00:14:39 UTC
I see that you've attached a sev1 case. Can you help explain why it is a sev1? What is the impact of the issue? As far as I can tell the API will return an out-of-date host for the external router port binding, but there is no other effect. In any case, if it's indeed an urgent severity issue for the customer and they cannot wait for the next OSP 10 release (estimated to be around 1 month away), then you can always ask for a hotfix.

Comment 3 Ajay Kalambur 2017-06-30 15:35:01 UTC
Hi Assaf,

This is a sev1 bug because it breaks the HA implementation of neutron routers when using VPP as the mechanism driver (https://github.com/openstack/networking-vpp).

"I can tell the API will return an out of date host for the external router port binding, but there is no other effect"

This is not only an API update: an ML2 update_port_precommit() call is also made as part of it. When an HA failover event happens on the neutron router, VPP depends on ML2 making this call to rebind all the router ports to the new, correct host.

With this bug, router gateway interfaces are not rebound correctly and the new (master) neutron router is not able to forward packets to the external world.

Hence we would need this fix to be backported as soon as possible.
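As a rough illustration of why the precommit call matters to a mechanism driver, the sketch below shows a driver noticing a host change and recording that the port needs to be re-plumbed. This is not networking-vpp code: FakePortContext and SketchMechanismDriver are hypothetical stand-ins, and only the host and original_host attribute names are modelled on the ML2 port context.

```python
from dataclasses import dataclass


@dataclass
class FakePortContext:
    """Hypothetical stand-in for the ML2 port context (sketch only)."""
    current: dict   # port as it will be after the update
    original: dict  # port as it was before the update

    @property
    def host(self):
        return self.current.get("binding:host_id")

    @property
    def original_host(self):
        return self.original.get("binding:host_id")


class SketchMechanismDriver:
    """Hypothetical driver that records ports whose host binding changed."""

    def __init__(self):
        self.rebinds = []

    def update_port_precommit(self, context):
        # Without a port update carrying the new host, this hook never sees
        # a host change, so nothing is plumbed on the new server.
        if context.host != context.original_host:
            self.rebinds.append(
                (context.current["id"], context.original_host, context.host))


driver = SketchMechanismDriver()

# Gateway port rebound after failover: the driver sees the host change.
driver.update_port_precommit(FakePortContext(
    current={"id": "gw1", "binding:host_id": "controller-2"},
    original={"id": "gw1", "binding:host_id": "controller-1"},
))

# Port update with no host change: nothing is recorded.
driver.update_port_precommit(FakePortContext(
    current={"id": "if1", "binding:host_id": "controller-2"},
    original={"id": "if1", "binding:host_id": "controller-2"},
))
```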

Comment 20 errata-xmlrpc 2017-09-06 17:17:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2663