Description of problem:

The failing test is "test_add_list_remove_router_on_l3_agent" in neutron/tests/tempest/api/admin/test_l3_agent_scheduler.py.

The test suite's setUp creates a router and, if it is a DVR router (which it is in this case), also uses 'router-gateway-set' to ensure it is scheduled. The test itself then tries to schedule the router to an agent. This fails if the agent the router was scheduled to during setUp is different from the one the test tries to schedule it to. In other words, the test is faulty: it should know that the router is already scheduled.

The failure happens in the Mitaka and Newton versions when running on a DVR setup.

Stacktrace:

Traceback (most recent call last):
testtools.testresult.real._StringException: Empty attachments:
  stderr
  stdout

pythonlogging:'': {{{
2017-03-24 21:34:26,554 7724 INFO [tempest.lib.common.rest_client] Request (L3AgentSchedulerTestJSON:test_add_list_remove_router_on_l3_agent): 409 POST http://10.0.0.107:9696/v2.0/agents/397fef47-1243-485e-9189-38842e0b2d77/l3-routers 0.353s
2017-03-24 21:34:26,555 7724 DEBUG [tempest.lib.common.rest_client] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>'}
        Body: {"router_id": "3c6003e9-d81e-4287-8a77-603f22b3b27b"}
    Response - Headers: {'status': '409', u'content-length': '205', 'content-location': 'http://10.0.0.107:9696/v2.0/agents/397fef47-1243-485e-9189-38842e0b2d77/l3-routers', u'date': 'Sat, 25 Mar 2017 01:34:27 GMT', u'content-type': 'application/json', u'connection': 'close', u'x-openstack-request-id': 'req-0f7b896c-7a3a-4c9e-9155-c2faa610d5ff'}
        Body: {"NeutronError": {"message": "The router 3c6003e9-d81e-4287-8a77-603f22b3b27b has been already hosted by the L3 Agent 9e974922-85c2-457c-a791-8fa4c97cf25e.", "type": "RouterHostedByL3Agent", "detail": ""}}
}}}

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/neutron/tests/tempest/api/admin/test_l3_agent_scheduler.py", line 91, in test_add_list_remove_router_on_l3_agent
    self.router['id'])
  File "/usr/lib/python2.7/site-packages/neutron/tests/tempest/services/network/json/network_client.py", line 471, in add_router_to_l3_agent
    resp, body = self.post(uri, body)
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 276, in post
    return self.request('POST', url, extra_headers, headers, body, chunked)
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 665, in request
    self._error_checker(resp, resp_body)
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 777, in _error_checker
    raise exceptions.Conflict(resp_body, resp=resp)
tempest.lib.exceptions.Conflict: An object with that identifier already exists
Details: {u'message': u'The router 3c6003e9-d81e-4287-8a77-603f22b3b27b has been already hosted by the L3 Agent 9e974922-85c2-457c-a791-8fa4c97cf25e.', u'type': u'RouterHostedByL3Agent', u'detail': u''}

Version-Release number of selected component (if applicable):
python-neutron-10.0.0-8.el7ost.noarch
openstack-neutron-10.0.0-8.el7ost.noarch
python-neutronclient-6.1.0-1.el7ost.noarch
puppet-neutron-10.3.0-1.el7ost.noarch
python-neutron-lib-1.1.0-1.el7ost.noarch

How reproducible:
Failed 4 times in the last 7 runs. Stability: 42 %

Steps to Reproduce:
1. Run the CI gate: https://rhos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/RHOS/view/RHOS11/job/qe-DFG-neutron-11_director-rhel-7.3-virthost-3cont_2comp-ipv4-vxlan-lvm-lbaas-dvr
2.
3.

Actual results:

Expected results:

Additional info:
I deployed a 2-node devstack environment with DVR on stable/ocata and I'm unable to reproduce the failure. I ran a script (see below) that executes those tests 20 times in a row, with a 100% success rate.

Also, logstash shows only 1 upstream hit for this error message in the past 6 months [1], with the same error [2], but it is on a non-voting job (gate-tempest-dsvm-neutron-dvr-ha-multinode-full-ubuntu-xenial-nv). It is odd that it happened only once in the past 6 months, and that was a few days ago (28 March 2017).

In addition, the last build ran successfully [0] and I have just triggered a new build (#14), which is currently executing. If it still succeeds, I'll re-run the build a couple of times to figure out whether this has really been fixed, possibly by recent changes in our CI. I will update this as I have more findings.

[0] https://rhos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/RHOS/view/RHOS11/job/qe-DFG-neutron-11_director-rhel-7.3-virthost-3cont_2comp-ipv4-vxlan-lvm-lbaas-dvr/13/testReport/junit/neutron.tests.tempest.api.admin.test_l3_agent_scheduler/L3AgentSchedulerTestJSON/test_add_list_remove_router_on_l3_agent_id_9464e5e7_8625_49c3_8fd1_89c52be59d66_/
[1] http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22has%20been%20already%20hosted%5C%22
[2] http://logs.openstack.org/07/410107/10/experimental/gate-tempest-dsvm-neutron-dvr-ha-multinode-full-ubuntu-xenial-nv/ecebbe4/console.html#_2017-03-28_11_36_18_608386

---------------------------------------------------------------------------

cd /opt/stack/tempest
git checkout 14.0.0
tox -reall-plugin --notest
source .tox/all-plugin/bin/activate

total=20
failed=0
for i in $(seq 1 $total)
do
    # Count the tests reported as "... ok"; the module is expected to report 2.
    success=$(ostestr neutron.tests.tempest.api.admin.test_l3_agent_scheduler | grep -cF "... ok")
    if [ "$success" -ne "2" ]; then
        ((failed+=1))
        echo Test $i FAILED
    else
        echo Test $i ok
    fi
done
echo Failed $failed tests out of $total
deactivate
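As a rough sanity check on the numbers above, the following sketch estimates how likely 20 consecutive passes would be if the local setup had the same per-run pass rate as the CI job (3 passes in 7 runs, as reported). It assumes runs are independent and equally likely to pass, which is a simplification and not something the CI data confirms.

```python
from fractions import Fraction

# Reported CI history: 4 failures in the last 7 runs, i.e. 3 passes out of 7.
pass_rate = Fraction(3, 7)

# Number of consecutive local runs that all passed.
consecutive_runs = 20

# Probability of seeing every run pass, assuming independent runs
# with the same pass rate as CI (an assumption, not a measured fact).
p_all_pass = float(pass_rate ** consecutive_runs)

print(f"P(all {consecutive_runs} runs pass) = {p_all_pass:.2e}")
```

Under those assumptions the probability is on the order of 1e-8, so 20 straight passes are more plausibly explained by a difference between the local environment and CI than by luck.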
Genadi, Daniel Mellado and Daniel Alvarez will try to get the Tempest patch merged: https://review.openstack.org/#/c/322118/
artifacts : https://rhos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/RHOS/view/RHOS11/job/qe-DFG-neutron-11_director-rhel-7.3-virthost-3cont_2comp-ipv4-vxlan-lvm-lbaas-dvr/lastCompletedBuild/artifact/
I don't mind pushing that patch, but I'd like to understand why we need it if it's not failing upstream (as per my logstash search) and I can't reproduce it on stable/ocata (2-node DVR setup, with and without l3_ha). What is different in our CI job compared to upstream that makes it fail for us?
Ok, I've been able to reproduce it in my setup. The L3 agent on the subnode wasn't registered on the primary neutron server, so it was as if I had only one L3 agent running, and the test didn't fail. Now it does:

[vagrant@primary tempest]$ ostestr --pdb tempest.api.network.admin.test_l3_agent_scheduler
{0} tempest.api.network.admin.test_l3_agent_scheduler.L3AgentSchedulerTestJSON.test_add_list_remove_router_on_l3_agent [0.265699s] ... FAILED

Captured pythonlogging:
~~~~~~~~~~~~~~~~~~~~~~~
2017-04-01 06:27:11,712 1457 INFO [tempest.lib.common.rest_client] Request (L3AgentSchedulerTestJSON:test_add_list_remove_router_on_l3_agent): 409 POST http://192.168.121.208:9696/v2.0/agents/3aec0b44-1f0d-4042-be6f-ea6f32ca4a20/l3-routers 0.262s
2017-04-01 06:27:11,713 1457 DEBUG [tempest.lib.common.rest_client] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>'}
        Body: {"router_id": "20ca7262-0353-4eea-a47b-549f689c4bbb"}
    Response - Headers: {'status': '409', 'content-length': '205', 'content-location': 'http://192.168.121.208:9696/v2.0/agents/3aec0b44-1f0d-4042-be6f-ea6f32ca4a20/l3-routers', 'date': 'Sat, 01 Apr 2017 06:27:11 GMT', 'content-type': 'application/json', 'connection': 'close', 'x-openstack-request-id': 'req-31626aa3-6358-47fd-b4ef-b9995488b4f4'}
        Body: {"NeutronError": {"message": "The router 20ca7262-0353-4eea-a47b-549f689c4bbb has been already hosted by the L3 Agent e8375a3d-023d-40b7-8d88-39fdec3448fb.", "type": "RouterHostedByL3Agent", "detail": ""}}

[vagrant@primary tempest]$ neutron agent-list
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+--------------------------------------+--------------------+----------+-------------------+-------+----------------+---------------------------+
| id                                   | agent_type         | host     | availability_zone | alive | admin_state_up | binary                    |
+--------------------------------------+--------------------+----------+-------------------+-------+----------------+---------------------------+
| 3aec0b44-1f0d-4042-be6f-ea6f32ca4a20 | L3 agent           | primary  | nova              | :-)   | True           | neutron-l3-agent          |
| 78944bb4-dddf-41e9-8833-0ca326084d50 | DHCP agent         | primary  | nova              | :-)   | True           | neutron-dhcp-agent        |
| 847495b1-2a77-4477-8331-9c7fbe97315b | Open vSwitch agent | primary  |                   | :-)   | True           | neutron-openvswitch-agent |
| a618e5bb-1c32-44b8-af41-6e2ac35b7e75 | Metadata agent     | primary  |                   | :-)   | True           | neutron-metadata-agent    |
| e8375a3d-023d-40b7-8d88-39fdec3448fb | L3 agent           | subnode1 | nova              | :-)   | True           | neutron-l3-agent          |
+--------------------------------------+--------------------+----------+-------------------+-------+----------------+---------------------------+

I have tried the tempest patch and the test now passes: we no longer add an interface to the router in resource_setup(), so it is not hosted by any L3 agent when the test tries to assign it to one. Right now, the author says that it can't be merged because tempest still supports Liberty and that code would break it (Liberty doesn't support assigning a router to an agent unless the router has at least one interface).
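To make the single-agent versus two-agent behaviour concrete, here is a minimal in-memory sketch. It is not Neutron's scheduler code: ToyScheduler, schedule() and outcomes() are hypothetical names, and the conflict rule (reject an add when the router is already hosted by a different agent) is taken from the description in this report. The two agent IDs are the L3 agents from the agent list above.

```python
class RouterHostedByL3Agent(Exception):
    """Illustrative stand-in for the 409 Conflict seen in the log."""


class ToyScheduler:
    """Hypothetical in-memory model of router-to-L3-agent bindings."""

    def __init__(self, l3_agent_ids):
        self.l3_agent_ids = list(l3_agent_ids)
        self.hosted_by = {}  # router_id -> agent_id

    def schedule(self, router_id, agent_id):
        """Model setup-time scheduling onto an agent picked by the server."""
        self.hosted_by[router_id] = agent_id

    def add_router_to_l3_agent(self, agent_id, router_id):
        """Reject the add if a different agent already hosts the router."""
        current = self.hosted_by.get(router_id)
        if current is not None and current != agent_id:
            raise RouterHostedByL3Agent(
                f"router {router_id} already hosted by L3 agent {current}")
        self.hosted_by[router_id] = agent_id


def outcomes(l3_agent_ids, router_id="router-1"):
    """Return the test result for every (setup agent, test agent) pairing."""
    results = {}
    for setup_agent in l3_agent_ids:
        for test_agent in l3_agent_ids:
            scheduler = ToyScheduler(l3_agent_ids)
            scheduler.schedule(router_id, setup_agent)
            try:
                scheduler.add_router_to_l3_agent(test_agent, router_id)
                results[(setup_agent, test_agent)] = "ok"
            except RouterHostedByL3Agent:
                results[(setup_agent, test_agent)] = "FAILED"
    return results


PRIMARY = "3aec0b44-1f0d-4042-be6f-ea6f32ca4a20"   # L3 agent on primary
SUBNODE = "e8375a3d-023d-40b7-8d88-39fdec3448fb"   # L3 agent on subnode1

one_agent = outcomes([PRIMARY])
two_agents = outcomes([PRIMARY, SUBNODE])
```

With a single registered L3 agent every pairing is "ok", which matches the earlier failure to reproduce. With two agents the pairing (SUBNODE, PRIMARY) fails, which is the combination shown in the captured log.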
(In reply to Assaf Muller from comment #3)
> Genadi, Daniel Mellado and Daniel Alvarez will try to get the Tempest patch
> merged:
>
> https://review.openstack.org/#/c/322118/

This patch has already been merged. The author changed the original implementation since there was already a fix for it; the fix consists of setting the "dvr_extra_resources" option to False by default in tempest.config. This avoids adding an interface when creating a distributed router, which would automatically assign an L3 agent to it. If tests are run against OSP8 (Liberty) or older, the interface still has to be added to the distributed router for the test to succeed.
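The effect of the option can be sketched as a small decision function. This is only an illustration of the behaviour described in this comment, not the tempest code: should_add_router_interface and its parameters are hypothetical names.

```python
def should_add_router_interface(is_distributed, dvr_extra_resources):
    """Decide whether test setup adds an interface to the router.

    Hypothetical helper: per the comment above, the interface (which causes
    an L3 agent to be assigned) is only added for distributed routers when
    the dvr_extra_resources option is enabled.
    """
    return is_distributed and dvr_extra_resources


# Option disabled: the DVR router stays unscheduled, so the test can
# assign it to any L3 agent.
newer_release = should_add_router_interface(True, dvr_extra_resources=False)

# Liberty (OSP8) or older: the option has to be enabled so the router
# has an interface before the test assigns it to an agent.
liberty = should_add_router_interface(True, dvr_extra_resources=True)

# Non-distributed routers are unaffected by the option in this sketch.
legacy_router = should_add_router_interface(False, dvr_extra_resources=True)
```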
The downstream patch got merged. I have already built the package in brew, so this should be fixed now. The tempest tree patch got merged too and, IIUC, no extra work is needed since the downstream tempest RPM package simply packages the upstream code.
fix verified : openstack-neutron-10.0.0-18.el7ost.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1245