Description of problem:
The following test fails in an OSP 10 DVR environment:
neutron.tests.tempest.api.admin.test_l3_agent_scheduler.L3AgentSchedulerTestJSON.test_add_list_remove_router_on_l3_agent

Version-Release number of selected component (if applicable):
Latest OSP 10

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP 10 with DVR.
2. Run the neutron test: neutron.tests.tempest.api.admin.test_l3_agent_scheduler.L3AgentSchedulerTestJSON.test_add_list_remove_router_on_l3_agent

Actual results:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/neutron/tests/tempest/api/admin/test_l3_agent_scheduler.py", line 91, in test_add_list_remove_router_on_l3_agent
    self.router['id'])
  File "/usr/lib/python2.7/site-packages/neutron/tests/tempest/services/network/json/network_client.py", line 470, in add_router_to_l3_agent
    resp, body = self.post(uri, body)
  File "/home/stack/tempest-dir/tempest/lib/common/rest_client.py", line 276, in post
    return self.request('POST', url, extra_headers, headers, body, chunked)
  File "/home/stack/tempest-dir/tempest/lib/common/rest_client.py", line 664, in request
    self._error_checker(resp, resp_body)
  File "/home/stack/tempest-dir/tempest/lib/common/rest_client.py", line 776, in _error_checker
    raise exceptions.Conflict(resp_body, resp=resp)
tempest.lib.exceptions.Conflict: An object with that identifier already exists
Details: {u'message': u'The router 2a61126e-5548-4984-a71b-5a63d61d4709 has been already hosted by the L3 Agent 4b036b53-3039-472d-8e05-f24dc22a9f23.', u'type': u'RouterHostedByL3Agent', u'detail': u''}

Expected results:
The test passes.
Jenkins is unstable; we need to wait before triaging this further.
From what I remember, downstream Jenkins was unstable right before the shutdown, with failures unrelated to neutron. I will look at this again now.
It still fails in OSP 10; here is a newer link:
https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/network/view/neutron/job/DFG-network-neutron-10_director-rhel-virthost-3cont_2comp-ipv4-vxlan-dvr/lastCompletedBuild/testReport/tempest.api.network.admin.test_l3_agent_scheduler/L3AgentSchedulerTestJSON/test_add_list_remove_router_on_l3_agent_id_9464e5e7_8625_49c3_8fd1_89c52be59d66_/
This is probably just a missing backport.
There is a bug in the tempest test:
https://github.com/openstack/tempest/blob/master/tempest/api/network/admin/test_l3_agent_scheduler.py#L68
A router that is not an HA router should be scheduled to only one L3 agent at a time. In a multinode environment the router is sometimes scheduled automatically to, e.g., agent 1 while the test tries to add it to agent 2. That fails with the conflict raised in neutron:
https://github.com/openstack/neutron/blob/f6c6be78eeab6a4f621d3ecf95875c539cf4f0b2/neutron/db/l3_agentschedulers_db.py#L134
When the router is HA, this problem does not occur, because an HA router can be scheduled to more than one L3 agent.
It doesn't happen in our OSP 13 and OSP 14 CI because l3_ha=True is configured in the neutron config there, so every router is HA by default. In OSP 10 this option is set to false, which is why the test fails intermittently.
I think this should be fixed on the tempest side by always creating an HA router in this test; that would avoid similar issues regardless of the neutron config.
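The scheduling rule described above can be illustrated with a small, self-contained Python sketch. It is a simplified, hypothetical model, not neutron's actual code: the `Scheduler`, `Router` and `run_test_scenario` names are invented for illustration, and DVR-specific scheduling is ignored. It shows why adding an already-scheduled non-HA router to a second agent raises `RouterHostedByL3Agent`, while an HA router is accepted.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class RouterHostedByL3Agent(Exception):
    """Hypothetical stand-in for neutron's RouterHostedByL3Agent (HTTP 409)."""


@dataclass
class Router:
    id: str
    ha: bool = False  # mirrors the router's HA flag (l3_ha default in config)


@dataclass
class Scheduler:
    # Hypothetical in-memory bindings: router id -> ids of agents hosting it.
    bindings: Dict[str, List[str]] = field(default_factory=dict)

    def add_router_to_l3_agent(self, agent_id: str, router: Router) -> None:
        hosts = self.bindings.setdefault(router.id, [])
        if agent_id in hosts:
            return  # already bound to this agent: nothing to do
        if hosts and not router.ha:
            # A non-HA (legacy) router may be hosted by only one agent.
            raise RouterHostedByL3Agent(
                f"The router {router.id} has been already hosted by the "
                f"L3 Agent {hosts[0]}.")
        hosts.append(agent_id)  # HA routers may be bound to several agents


def run_test_scenario(ha: bool) -> str:
    """Model the failing test: auto-schedule to agent-1, then add to agent-2."""
    sched = Scheduler()
    router = Router(id="r1", ha=ha)
    sched.add_router_to_l3_agent("agent-1", router)  # automatic scheduling
    try:
        sched.add_router_to_l3_agent("agent-2", router)  # what the test does
        return "ok"
    except RouterHostedByL3Agent:
        return "conflict"
```

In this model, `run_test_scenario(ha=False)` returns `"conflict"` and `run_test_scenario(ha=True)` returns `"ok"`, which mirrors why the test passes with l3_ha=True (OSP 13/14 CI) and fails intermittently with l3_ha=False (OSP 10), and why always creating an HA router in the test would sidestep the conflict.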
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0922