Bug 1311864
| Summary: | Neutron L3 Agent shows duplicate ports | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Pablo Iranzo Gómez <pablo.iranzo> |
| Component: | openstack-neutron | Assignee: | John Schwarz <jschwarz> |
| Status: | CLOSED ERRATA | QA Contact: | Alexander Stafeyev <astafeye> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 7.0 (Kilo) | CC: | amuller, astafeye, chrisw, jschluet, jschwarz, majopela, nyechiel, oblaut, pablo.iranzo, srevivo |
| Target Milestone: | async | Keywords: | ZStream |
| Target Release: | 7.0 (Kilo) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | openstack-neutron-2015.1.4-1.el7ost | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-07-20 23:53:55 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1273812 | | |
Description (Pablo Iranzo Gómez, 2016-02-25 08:57:29 UTC)
This could be related to the in-flight patches @assaf and @jschwarz are working on upstream to fix a few race conditions in the l3_ha code.

This is a different issue from the one @assaf and I are working on upstream: we are dealing with too few L3 HA ports, not too many. I've looked at the attached logs but did not find anything about the port UUID in question (04867f8c-5632-412a-8ce7-79bfccc2f620), so there is not a lot to go on. Pablo, can you give a rough outline of the scenario that was running on the servers, so that we can try to reproduce this?

Hi John, I'm asking my customer about this. The background so far is that they hit this in OSP6 and it is still present after the upgrade. It is not clear from the comments whether it was cleaned up before the upgrade, or whether it is an issue that appeared in OSP6 and was carried over to the OSP7 setup. Their initial request was how to properly clean this up, and what the availability implications of the cleaning procedure are. Thanks, Pablo

One possible option could be to delete the affected agents via the neutron client and wait for their heartbeat to come back, so they are re-registered. That would probably also disassociate the routers from the agent, though. @pablo, could we check this procedure/workaround on an OSP7 setup:

1) Create a few routers in HA mode
2) Run l3-agent-list-hosting-router for one of the routers
3) Delete the agent holding the ACTIVE instance of the router
4) Wait for the heartbeat to come back so the agent appears in neutron agent-list again
5) Run l3-agent-list-hosting-router for the same router as in (2)
6) If the agent is not listed, run: neutron l3-agent-router-add $agent-id $router
7) Repeat (5) and verify the list is correct (two standby agents, one active)

I believe this procedure would be harmless even if (6) were needed: one of the backup routers would take the traffic until we do (7). But it would be better to verify this first.

I've added a patch that should be backported from upstream to the tracker.
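The workaround steps above can be sketched as a dry run. This is a hypothetical outline, not a tested procedure: the script below only composes and prints the neutron commands it would run, and the router name and all-zero agent id are placeholder values to be replaced with real ones from your deployment.

```python
# Dry-run sketch of the proposed agent re-registration workaround.
# ROUTER and AGENT_ID are placeholders; substitute real values and run
# the printed commands by hand (or via subprocess) once verified.
ROUTER = "demo-router"
AGENT_ID = "00000000-0000-0000-0000-000000000000"

workaround_steps = [
    # Steps 2 and 5: list the L3 agents hosting the router.
    "neutron l3-agent-list-hosting-router {router}",
    # Step 3: delete the agent holding the ACTIVE instance of the router.
    "neutron agent-delete {agent_id}",
    # Step 4: wait for the heartbeat, then confirm the agent re-registered.
    "neutron agent-list",
    # Step 6: if the re-registered agent no longer hosts the router, re-add it.
    "neutron l3-agent-router-add {agent_id} {router}",
    # Step 7: list again and verify one active and two standby instances.
    "neutron l3-agent-list-hosting-router {router}",
]

commands = [s.format(router=ROUTER, agent_id=AGENT_ID) for s in workaround_steps]
for cmd in commands:
    print(cmd)
```

Printing instead of executing keeps the sketch safe to run anywhere; the availability question raised above (a backup router carrying traffic between steps 6 and 7) still needs to be verified on a real setup.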
Once the upstream patch has been merged we can continue work on this patch.

Please remember to flip this bug to MODIFIED with the appropriate 'Fixed in version' when you rebase OSP 7. Thank you.

The fix is incorporated in the rebase. See bug 1350400.

At this point there is no version to verify against. Thanks.

Verified with openstack-neutron-2015.1.4-2.el7ost.noarch:

neutron l3-agent-list-hosting-router Router_eNet
+--------------------------------------+------------------------------------+----------------+-------+----------+
| id                                   | host                               | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------------------+----------------+-------+----------+
| e0ad8091-ef57-4950-9e7d-7549cc529b1d | overcloud-controller-1.localdomain | True           | :-)   | standby  |
| aa48e625-19a4-4a38-96d2-34e85fe7cf6c | overcloud-controller-2.localdomain | True           | :-)   | active   |
| caef1d9c-d65b-4ea3-bd05-6814efc5c934 | overcloud-controller-0.localdomain | True           | :-)   | standby  |
+--------------------------------------+------------------------------------+----------------+-------+----------+

[root@overcloud-controller-2 ~]# neutron l3-agent-router-remove caef1d9c-d65b-4ea3-bd05-6814efc5c934 Router_eNet
Removed router Router_eNet from L3 agent

[root@overcloud-controller-2 ~]# neutron l3-agent-list-hosting-router Router_eNet
+--------------------------------------+------------------------------------+----------------+-------+----------+
| id                                   | host                               | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------------------+----------------+-------+----------+
| e0ad8091-ef57-4950-9e7d-7549cc529b1d | overcloud-controller-1.localdomain | True           | :-)   | standby  |
| aa48e625-19a4-4a38-96d2-34e85fe7cf6c | overcloud-controller-2.localdomain | True           | :-)   | active   |
+--------------------------------------+------------------------------------+----------------+-------+----------+

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:1474
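The verification above amounts to checking the `neutron l3-agent-list-hosting-router` output for exactly one active instance, the expected number of standby instances, and no duplicate hosts. The helper below is purely illustrative (it is not part of neutron or any Red Hat tooling); it parses the ASCII table the CLI prints and returns those counts.

```python
def check_ha_states(table_text):
    """Parse the ASCII table printed by `neutron l3-agent-list-hosting-router`
    and return (active_count, standby_count, duplicate_host_found)."""
    active = standby = 0
    seen_hosts = set()
    duplicates = False
    for line in table_text.splitlines():
        # Skip the +---+ border lines and the header row.
        if not line.startswith("|") or "ha_state" in line:
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Columns: id, host, admin_state_up, alive, ha_state
        host, state = cells[1], cells[4]
        if host in seen_hosts:
            duplicates = True
        seen_hosts.add(host)
        if state == "active":
            active += 1
        elif state == "standby":
            standby += 1
    return active, standby, duplicates
```

For a healthy three-agent HA router one would expect `(1, 2, False)`; a duplicate-port situation like the one reported here would surface as the same host appearing twice in the listing.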