Bug 1311864 - Neutron L3 Agent shows duplicate ports
Summary: Neutron L3 Agent shows duplicate ports
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: async
Target Release: 7.0 (Kilo)
Assignee: John Schwarz
QA Contact: Alexander Stafeyev
URL:
Whiteboard:
Depends On:
Blocks: 1273812
 
Reported: 2016-02-25 08:57 UTC by Pablo Iranzo Gómez
Modified: 2019-11-14 07:29 UTC

Fixed In Version: openstack-neutron-2015.1.4-1.el7ost
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-07-20 23:53:55 UTC
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 143297 0 None None None 2016-03-07 23:50:07 UTC
OpenStack gerrit 238122 0 None None None 2016-03-02 02:25:50 UTC
OpenStack gerrit 296339 0 None None None 2016-03-23 09:40:07 UTC
Red Hat Product Errata RHSA-2016:1474 0 normal SHIPPED_LIVE Low: openstack-neutron security, bug fix, and enhancement update 2016-07-21 03:53:34 UTC

Description Pablo Iranzo Gómez 2016-02-25 08:57:29 UTC
Description of problem:

When executing `neutron l3-agent-list-hosting-router <router>`:


+--------------------------------------+-------------+----------------+-------+----------+
| id                                   | host        | admin_state_up | alive | ha_state |
+--------------------------------------+-------------+----------------+-------+----------+
| 04867f8c-5632-412a-8ce7-79bfccc2f620 | neutron-n-2 | True           | :-)   | active   |  ***
| 8d24b6e8-fd63-4f84-93cd-9361ee5b9e4a | neutron-n-1 | True           | :-)   | standby  |
| 04867f8c-5632-412a-8ce7-79bfccc2f620 | neutron-n-2 | True           | :-)   | active   |  ***
| fb833e11-ce79-4287-bd6b-6ef5aeed5814 | neutron-n-0 | True           | :-)   | standby  |
+--------------------------------------+-------------+----------------+-------+----------+

For a 3-controller deployment we see 3 different hosts but 4 entries: the active agent, with the same ID and the same host (marked *** above), is listed twice as alive and active.


Version-Release number of selected component (if applicable):
openstack-neutron-2015.1.2-9.el7ost.noarch
openstack-neutron-common-2015.1.2-9.el7ost.noarch
openstack-neutron-lbaas-2015.1.2-1.el7ost.noarch
openstack-neutron-ml2-2015.1.2-9.el7ost.noarch
openstack-neutron-openvswitch-2015.1.2-9.el7ost.noarch
openstack-neutron-vpnaas-2015.1.2-1.el7ost.noarch
python-neutron-2015.1.2-9.el7ost.noarch
python-neutron-lbaas-2015.1.2-1.el7ost.noarch
python-neutron-vpnaas-2015.1.2-1.el7ost.noarch
python-neutronclient-2.4.0-2.el7ost.noarch



This was reported to also be happening on the OSP6 deployment, before it was upgraded to OSP7.

Comment 3 Miguel Angel Ajo 2016-02-25 09:10:35 UTC
This could be related to the in-flight patches @assaf and @jschwarz are working on upstream (U/S) to fix a few race conditions in the l3_ha code.

Comment 4 John Schwarz 2016-02-25 09:51:05 UTC
This is a different issue from what @assaf and I are working on U/S - we're dealing with too few L3 HA ports, not too many.

I've looked at the attached logs but did not find anything about the UUID in question (04867f8c-5632-412a-8ce7-79bfccc2f620), so there's not a lot to go on. Pablo, can you perhaps give a rough outline of the scenario that was running on the servers, so that we might be able to reproduce this?

Comment 5 Pablo Iranzo Gómez 2016-02-25 09:57:57 UTC
Hi John,
I'm asking my customer about this. The background at the moment is that they had this in OSP6 and, after the upgrade, it's still there.

I'm not sure from their comments whether this was cleaned up before the upgrade, or whether it's an issue that appeared in OSP6 and was carried over to the OSP7 setup.

Their initial request was how to properly clean this up, and what the availability implications of the cleanup procedure are.

Thanks,
Pablo

Comment 6 Miguel Angel Ajo 2016-02-25 12:20:13 UTC
One possible option could be to delete the specific agents via the neutron client and wait for the heartbeat to come back, so they are re-registered. But that would probably also disassociate the routers from the agent.

@pablo, could we check this procedure/workaround on an OSP7 setup (a rough command sketch follows below):

1) Create a few routers in HA mode
2) List l3-agent-list-hosting-router for one of the routers
3) Delete the agent holding the ACTIVE instance of the router
4) Wait for the heartbeat to come back so the agent appears in neutron agent-list again
5) List l3-agent-list-hosting-router for the same router as in (2)
6) If the agent is not there, we could do:
      neutron l3-agent-router-add $agent-id $router
7) Repeat (5) and verify the list is OK (two agents standby, one active)

I believe such a procedure would be harmless even if (6) happened: one of the backup routers would take the traffic until we do (7).

But it's better if we could verify this first.

Comment 7 John Schwarz 2016-03-01 11:30:50 UTC
I've added to the tracker a patch that should be backported from upstream. Once the upstream patch has been merged, we can continue work on the backport.

Comment 16 Assaf Muller 2016-06-04 20:09:19 UTC
Please remember to flip this bug to MODIFIED with the appropriate 'Fixed in version' when you rebase OSP 7. Thank you.

Comment 17 Nir Magnezi 2016-06-30 13:04:45 UTC
The fix is incorporated in the rebase.
See bug 1350400

Comment 19 Alexander Stafeyev 2016-07-11 08:49:24 UTC
At this point there is no version to verify on. 

Tnx

Comment 21 Alexander Stafeyev 2016-07-11 12:44:49 UTC
openstack-neutron-2015.1.4-2.el7ost.noarch

neutron l3-agent-list-hosting-router Router_eNet
+--------------------------------------+------------------------------------+----------------+-------+----------+
| id                                   | host                               | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------------------+----------------+-------+----------+
| e0ad8091-ef57-4950-9e7d-7549cc529b1d | overcloud-controller-1.localdomain | True           | :-)   | standby  |
| aa48e625-19a4-4a38-96d2-34e85fe7cf6c | overcloud-controller-2.localdomain | True           | :-)   | active   |
| caef1d9c-d65b-4ea3-bd05-6814efc5c934 | overcloud-controller-0.localdomain | True           | :-)   | standby  |
+--------------------------------------+------------------------------------+----------------+-------+----------+


[root@overcloud-controller-2 ~]# neutron l3-agent-router-remove caef1d9c-d65b-4ea3-bd05-6814efc5c934 Router_eNet
Removed router Router_eNet from L3 agent
[root@overcloud-controller-2 ~]# neutron l3-agent-list-hosting-router Router_eNet
+--------------------------------------+------------------------------------+----------------+-------+----------+
| id                                   | host                               | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------------------+----------------+-------+----------+
| e0ad8091-ef57-4950-9e7d-7549cc529b1d | overcloud-controller-1.localdomain | True           | :-)   | standby  |
| aa48e625-19a4-4a38-96d2-34e85fe7cf6c | overcloud-controller-2.localdomain | True           | :-)   | active   |
+--------------------------------------+------------------------------------+----------------+-------+----------+

Comment 23 errata-xmlrpc 2016-07-20 23:53:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:1474

