Bug 1311864 - Neutron L3 Agent shows duplicate ports
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: async
Target Release: 7.0 (Kilo)
Assigned To: John Schwarz
QA Contact: Alexander Stafeyev
Keywords: ZStream
Blocks: 1273812
Reported: 2016-02-25 03:57 EST by Pablo Iranzo Gómez
Modified: 2016-09-19 06:48 EDT
CC: 11 users
Fixed In Version: openstack-neutron-2015.1.4-1.el7ost
Doc Type: Bug Fix
Last Closed: 2016-07-20 19:53:55 EDT
Type: Bug




External Trackers
Tracker ID Priority Status Summary Last Updated
OpenStack gerrit 143297 None None None 2016-03-07 18:50 EST
OpenStack gerrit 238122 None None None 2016-03-01 21:25 EST
OpenStack gerrit 296339 None None None 2016-03-23 05:40 EDT
Red Hat Product Errata RHSA-2016:1474 normal SHIPPED_LIVE Low: openstack-neutron security, bug fix, and enhancement update 2016-07-20 23:53:34 EDT

Description Pablo Iranzo Gómez 2016-02-25 03:57:29 EST
Description of problem:

When executing `neutron l3-agent-list-hosting-router`:


+--------------------------------------+-------------+----------------+-------+----------+
| id                                   | host        | admin_state_up | alive | ha_state |
+--------------------------------------+-------------+----------------+-------+----------+
| 04867f8c-5632-412a-8ce7-79bfccc2f620 | neutron-n-2 | True           | :-)   | active   |  ***
| 8d24b6e8-fd63-4f84-93cd-9361ee5b9e4a | neutron-n-1 | True           | :-)   | standby  |
| 04867f8c-5632-412a-8ce7-79bfccc2f620 | neutron-n-2 | True           | :-)   | active   |  ***
| fb833e11-ce79-4287-bd6b-6ef5aeed5814 | neutron-n-0 | True           | :-)   | standby  |
+--------------------------------------+-------------+----------------+-------+----------+

For a 3-controller deployment we expect 3 entries (one per host), but 4 are listed: the active entry, with the same ID and the same host, appears twice as alive/active (marked *** above).


Version-Release number of selected component (if applicable):
openstack-neutron-2015.1.2-9.el7ost.noarch
openstack-neutron-common-2015.1.2-9.el7ost.noarch
openstack-neutron-lbaas-2015.1.2-1.el7ost.noarch
openstack-neutron-ml2-2015.1.2-9.el7ost.noarch
openstack-neutron-openvswitch-2015.1.2-9.el7ost.noarch
openstack-neutron-vpnaas-2015.1.2-1.el7ost.noarch
python-neutron-2015.1.2-9.el7ost.noarch
python-neutron-lbaas-2015.1.2-1.el7ost.noarch
python-neutron-vpnaas-2015.1.2-1.el7ost.noarch
python-neutronclient-2.4.0-2.el7ost.noarch



This was also reported to be happening on OSP6 before the deployment was upgraded to OSP7.
Comment 3 Miguel Angel Ajo 2016-02-25 04:10:35 EST
This could be related to the in-flight patches @assaf and @jschwarz are working on upstream to fix a few race conditions in the l3_ha code.
Comment 4 John Schwarz 2016-02-25 04:51:05 EST
This is a different issue from the one @assaf and I are working on upstream - there we're dealing with too few L3 HA ports, not too many.

I've looked at the attached logs but did not find anything about the port's UUID in question (04867f8c-5632-412a-8ce7-79bfccc2f620), so not a lot to go on. Pablo, can you perhaps try giving a rough outline of what scenario was running on the servers, so that we might be able to reproduce this?
Comment 5 Pablo Iranzo Gómez 2016-02-25 04:57:57 EST
Hi John,
I'm asking my customer about this. The background so far is that they had this in OSP6, and after the upgrade it's still there.

From the comments it's not clear whether this was cleaned up before the upgrade, or whether the issue appeared in OSP6 and was carried over to the OSP7 setup.

Their initial request was how to clean this up properly, and what the availability implications of the cleanup procedure are.

Thanks,
Pablo
Comment 6 Miguel Angel Ajo 2016-02-25 07:20:13 EST
One possible option is to delete the affected agents via the neutron client and wait for the heartbeat to come back so they are re-registered. However, that would probably also disassociate the routers from the agent.

@pablo, could we check this procedure/workaround on an OSP7 setup:

1) Create a few routers, in HA
2) List l3-agent-list-hosting-router for one of the routers
3) Delete the agent holding the ACTIVE instance of the router
4) Wait for the heartbeat to come back so the agent appears in neutron agent-list again
5) List l3-agent-list-hosting-router for the same router as in (2)
6) If the agent is not there, we could do: 
      neutron l3-agent-router-add $agent-id $router
7) Repeat (5) and verify the list is OK (two agents standby, one active)

I believe this procedure would be harmless even if (6) is needed: one of the standby routers would take the traffic until we do (7).

But it's better if we could verify this first.
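The invariant that step (7) checks - no duplicate agent IDs, exactly one active instance, the rest standby - could be sketched as a small helper (a hypothetical check of my own, not part of neutron):

```python
def ha_listing_ok(rows, expected_agents=3):
    """rows: list of (agent_id, ha_state) tuples taken from
    `neutron l3-agent-list-hosting-router` output.

    Returns True when the listing is healthy: the expected number of
    distinct agents, exactly one 'active', and the rest 'standby'."""
    ids = [agent_id for agent_id, _ in rows]
    states = [state for _, state in rows]
    return (len(rows) == expected_agents
            and len(set(ids)) == expected_agents            # no duplicate rows
            and states.count("active") == 1                 # single active instance
            and states.count("standby") == expected_agents - 1)
```

The 4-row listing from the description (the same agent ID appearing twice as active) would fail this check, while the healthy 3-row listing in comment 21 would pass.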
Comment 7 John Schwarz 2016-03-01 06:30:50 EST
I've added to the tracker a patch that should be backported from upstream. Once the upstream patch has been merged we can continue work on this.
Comment 16 Assaf Muller 2016-06-04 16:09:19 EDT
Please remember to flip this bug to MODIFIED with the appropriate 'Fixed in version' when you rebase OSP 7. Thank you.
Comment 17 Nir Magnezi 2016-06-30 09:04:45 EDT
The fix is incorporated in the rebase.
See bug 1350400
Comment 19 Alexander Stafeyev 2016-07-11 04:49:24 EDT
At this point there is no version to verify against.

Tnx
Comment 21 Alexander Stafeyev 2016-07-11 08:44:49 EDT
openstack-neutron-2015.1.4-2.el7ost.noarch

neutron l3-agent-list-hosting-router Router_eNet
+--------------------------------------+------------------------------------+----------------+-------+----------+
| id                                   | host                               | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------------------+----------------+-------+----------+
| e0ad8091-ef57-4950-9e7d-7549cc529b1d | overcloud-controller-1.localdomain | True           | :-)   | standby  |
| aa48e625-19a4-4a38-96d2-34e85fe7cf6c | overcloud-controller-2.localdomain | True           | :-)   | active   |
| caef1d9c-d65b-4ea3-bd05-6814efc5c934 | overcloud-controller-0.localdomain | True           | :-)   | standby  |
+--------------------------------------+------------------------------------+----------------+-------+----------+


[root@overcloud-controller-2 ~]# neutron l3-agent-router-remove caef1d9c-d65b-4ea3-bd05-6814efc5c934 Router_eNet
Removed router Router_eNet from L3 agent
[root@overcloud-controller-2 ~]# neutron l3-agent-list-hosting-router Router_eNet
+--------------------------------------+------------------------------------+----------------+-------+----------+
| id                                   | host                               | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------------------+----------------+-------+----------+
| e0ad8091-ef57-4950-9e7d-7549cc529b1d | overcloud-controller-1.localdomain | True           | :-)   | standby  |
| aa48e625-19a4-4a38-96d2-34e85fe7cf6c | overcloud-controller-2.localdomain | True           | :-)   | active   |
+--------------------------------------+------------------------------------+----------------+-------+----------+
Comment 23 errata-xmlrpc 2016-07-20 19:53:55 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:1474
