Bug 1436576
Summary: | [Ocata] neutron.tests.tempest.api.admin.test_l3_agent_scheduler.L3AgentSchedulerTestJSON.test_add_list_remove_router_on_l3_agent | |||
---|---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Eran Kuris <ekuris> | |
Component: | openstack-neutron | Assignee: | Daniel Alvarez Sanchez <dalvarez> | |
Status: | CLOSED ERRATA | QA Contact: | Eran Kuris <ekuris> | |
Severity: | medium | Docs Contact: | ||
Priority: | medium | |||
Version: | 11.0 (Ocata) | CC: | amuller, chrisw, dalvarez, gcheresh, jschluet, lruzicka, nyechiel, oblaut, srevivo | |
Target Milestone: | rc | |||
Target Release: | 11.0 (Ocata) | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | openstack-neutron-10.0.0-15.el7ost | Doc Type: | Bug Fix | |
Doc Text: |
On DVR setups, the 'test_add_list_remove_router_on_l3_agent' test from 'test_l3_agent_scheduler.py' did not finish successfully. The test tried to assign a router to an L3 agent even though the router was already hosted by one, because an interface had been added to the router when it was created.
The problem has been fixed. An interface is no longer added to the router during setup, so the router is not hosted by any L3 agent until the test assigns it to one. As a result, the test finishes successfully.
|
Story Points: | --- | |
Clone Of: | ||||
: | 1436579 1436580 (view as bug list) | Environment: | ||
Last Closed: | 2017-05-17 20:14:57 UTC | Type: | Bug | |
Bug Blocks: | 1436579, 1436580 |
Description
Eran Kuris
2017-03-28 08:45:03 UTC
I deployed a 2-node devstack environment with DVR on stable/ocata and I'm unable to reproduce the failure. I've run a script (see code below) which executes those tests 20 times in a row, with a 100% success rate.

Also, logstash shows only 1 hit upstream for this error message in the past 6 months, as per [1], with the same error [2], and it's in a non-voting job (gate-tempest-dsvm-neutron-dvr-ha-multinode-full-ubuntu-xenial-nv). It's weird that it has happened only once in the past 6 months, and that it was a few days ago (28th March 2017).

Plus, I've seen that the last build ran successfully [0], and I just triggered a new build (#14) which is currently executing. If it still succeeds, I'll re-run the build a couple of times to figure out whether this has really been fixed, in which case it might have been due to recent changes in our CI... Will update this as I have more findings.

[0] https://rhos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/RHOS/view/RHOS11/job/qe-DFG-neutron-11_director-rhel-7.3-virthost-3cont_2comp-ipv4-vxlan-lvm-lbaas-dvr/13/testReport/junit/neutron.tests.tempest.api.admin.test_l3_agent_scheduler/L3AgentSchedulerTestJSON/test_add_list_remove_router_on_l3_agent_id_9464e5e7_8625_49c3_8fd1_89c52be59d66_/
[1] http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22has%20been%20already%20hosted%5C%22
[2] http://logs.openstack.org/07/410107/10/experimental/gate-tempest-dsvm-neutron-dvr-ha-multinode-full-ubuntu-xenial-nv/ecebbe4/console.html#_2017-03-28_11_36_18_608386

---------------------------------------------------------------------------
cd /opt/stack/tempest
git checkout 14.0.0
tox -reall-plugin --notest
source .tox/all-plugin/bin/activate

total=20
failed=0
for i in $(seq 1 $total)
do
    success=`ostestr neutron.tests.tempest.api.admin.test_l3_agent_scheduler | grep "... ok" | wc -l`
    if [ "$success" -ne "2" ]; then
        ((failed+=1))
        echo Test $i FAILED
    else
        echo Test $i ok
    fi
done
echo Failed $failed tests out of $total
deactivate

Genadi, Daniel Mellado and Daniel Alvarez will try to get the Tempest patch merged:

https://review.openstack.org/#/c/322118/

I don't mind pushing that patch, but I'd like to understand why we need it if it's not failing upstream (as per my logstash search) and I can't repro it on stable/ocata (2-node DVR setup, with and without l3_ha). What does our CI job do differently from upstream that makes it fail for us?

OK, I've been able to reproduce it in my setup. The L3 agent on the subnode wasn't registered on the primary neutron server, so it was as if I had only one L3 agent running, and that's why it didn't fail. Now it does:

[vagrant@primary tempest]$ ostestr --pdb tempest.api.network.admin.test_l3_agent_scheduler
{0} tempest.api.network.admin.test_l3_agent_scheduler.L3AgentSchedulerTestJSON.test_add_list_remove_router_on_l3_agent [0.265699s] ... FAILED

Captured pythonlogging:
~~~~~~~~~~~~~~~~~~~~~~~
    2017-04-01 06:27:11,712 1457 INFO [tempest.lib.common.rest_client] Request (L3AgentSchedulerTestJSON:test_add_list_remove_router_on_l3_agent): 409 POST http://192.168.121.208:9696/v2.0/agents/3aec0b44-1f0d-4042-be6f-ea6f32ca4a20/l3-routers 0.262s
    2017-04-01 06:27:11,713 1457 DEBUG [tempest.lib.common.rest_client] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>'}
        Body: {"router_id": "20ca7262-0353-4eea-a47b-549f689c4bbb"}
    Response - Headers: {'status': '409', 'content-length': '205', 'content-location': 'http://192.168.121.208:9696/v2.0/agents/3aec0b44-1f0d-4042-be6f-ea6f32ca4a20/l3-routers', 'date': 'Sat, 01 Apr 2017 06:27:11 GMT', 'content-type': 'application/json', 'connection': 'close', 'x-openstack-request-id': 'req-31626aa3-6358-47fd-b4ef-b9995488b4f4'}
        Body: {"NeutronError": {"message": "The router 20ca7262-0353-4eea-a47b-549f689c4bbb has been already hosted by the L3 Agent e8375a3d-023d-40b7-8d88-39fdec3448fb.", "type": "RouterHostedByL3Agent", "detail": ""}}

[vagrant@primary tempest]$ neutron agent-list
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+--------------------------------------+--------------------+----------+-------------------+-------+----------------+---------------------------+
| id                                   | agent_type         | host     | availability_zone | alive | admin_state_up | binary                    |
+--------------------------------------+--------------------+----------+-------------------+-------+----------------+---------------------------+
| 3aec0b44-1f0d-4042-be6f-ea6f32ca4a20 | L3 agent           | primary  | nova              | :-)   | True           | neutron-l3-agent          |
| 78944bb4-dddf-41e9-8833-0ca326084d50 | DHCP agent         | primary  | nova              | :-)   | True           | neutron-dhcp-agent        |
| 847495b1-2a77-4477-8331-9c7fbe97315b | Open vSwitch agent | primary  |                   | :-)   | True           | neutron-openvswitch-agent |
| a618e5bb-1c32-44b8-af41-6e2ac35b7e75 | Metadata agent     | primary  |                   | :-)   | True           | neutron-metadata-agent    |
| e8375a3d-023d-40b7-8d88-39fdec3448fb | L3 agent           | subnode1 | nova              | :-)   | True           | neutron-l3-agent          |
+--------------------------------------+--------------------+----------+-------------------+-------+----------------+---------------------------+

I have tried the tempest patch and now it works, since we're not adding an interface to a router in resource_setup(), so the router is not hosted by any L3 agent when the test tries to assign it to one. Right now, the author says that it can't be merged because tempest still supports Liberty and that code would break it (Liberty doesn't support assigning a router to an agent if the router doesn't have at least one interface).

(In reply to Assaf Muller from comment #3)
> Genadi, Daniel Mellado and Daniel Alvarez will try to get the Tempest patch
> merged:
>
> https://review.openstack.org/#/c/322118/

This patch has already been merged. The author changed the original implementation since there was already a fix for it, which consists of setting the "dvr_extra_resources" option to False by default in tempest.config. This avoids adding an interface when creating a distributed router, which would automatically assign an L3 agent to it.
If tests are run against OSP8 (Liberty) or older, the interface still has to be added to the distributed router in order for the test to succeed.

Downstream patch got merged. I've already built the package in brew, so this should be fixed now. The tempest tree patch got merged too and, IIUC, no extra work is needed since the downstream tempest RPM package simply packages the upstream code.

Fix verified: openstack-neutron-10.0.0-18.el7ost.noarch

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1245
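
For anyone reading this later, the conflict discussed in this bug can be sketched with a toy model. This is not neutron or tempest code: ToyScheduler, run_test, and the rule of auto-scheduling onto the last agent are hypothetical stand-ins, chosen only to mirror what the comments describe (adding an interface to a distributed router gets it hosted by an L3 agent, and a later explicit assignment to a different agent is rejected with RouterHostedByL3Agent). The dvr_extra_resources parameter plays the role of the tempest option of the same name.

```python
"""Toy model of the scheduling conflict; not neutron or tempest code."""


class RouterHostedByL3Agent(Exception):
    """Stands in for the 409 Conflict seen in the captured log."""


class ToyScheduler:
    """Hypothetical in-memory stand-in for the neutron L3 scheduler."""

    def __init__(self, agents):
        self.agents = list(agents)
        self.hosting = {}  # router_id -> agent_id

    def add_router_interface(self, router_id):
        # Assumption taken from the report: adding an interface to a
        # distributed router gets the router scheduled automatically.
        # Picking the last agent is arbitrary; it only mimics the case
        # where the router lands on an agent other than the one under test.
        self.hosting.setdefault(router_id, self.agents[-1])

    def add_router_to_agent(self, agent_id, router_id):
        hosted_by = self.hosting.get(router_id)
        if hosted_by is not None and hosted_by != agent_id:
            raise RouterHostedByL3Agent(
                f"The router {router_id} has been already hosted "
                f"by the L3 Agent {hosted_by}."
            )
        self.hosting[router_id] = agent_id


def run_test(agents, dvr_extra_resources):
    """Mimic resource_setup() followed by the add-router-to-agent step."""
    scheduler = ToyScheduler(agents)
    router_id = "router-1"
    if dvr_extra_resources:
        # Old behaviour: setup adds an interface to the router.
        scheduler.add_router_interface(router_id)
    try:
        scheduler.add_router_to_agent(agents[0], router_id)
    except RouterHostedByL3Agent:
        return "FAILED"
    return "ok"
```

In this model, run_test(["primary", "subnode1"], True) fails, while the same call with dvr_extra_resources=False passes. With a single agent the toy test passes either way, which is consistent with the failure not reproducing until the subnode's L3 agent was registered.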