Bug 1436576 - [Ocata] neutron.tests.tempest.api.admin.test_l3_agent_scheduler.L3AgentSchedulerTestJSON.test_add_list_remove_router_on_l3_agent
Summary: [Ocata] neutron.tests.tempest.api.admin.test_l3_agent_scheduler.L3AgentSchedu...
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
: 11.0 (Ocata)
Assignee: Daniel Alvarez Sanchez
QA Contact: Eran Kuris
Depends On:
Blocks: 1436579 1436580
Reported: 2017-03-28 08:45 UTC by Eran Kuris
Modified: 2017-05-17 20:14 UTC
CC List: 9 users

Fixed In Version: openstack-neutron-10.0.0-15.el7ost
Doc Type: Bug Fix
Doc Text:
On DVR setups, the 'test_add_list_remove_router_on_l3_agent' test in 'test_l3_agent_scheduler.py' did not finish successfully. During setup, an interface was added to the newly created distributed router, which automatically scheduled the router to an L3 agent. The test then tried to schedule the same router to an L3 agent explicitly, which failed if a different agent had been chosen. The problem has been fixed. Now no interface is added to the router during setup, so the router is not assigned to an L3 agent until the test does so. As a result, the test finishes successfully.
Clone Of:
: 1436579 1436580
Last Closed: 2017-05-17 20:14:57 UTC
Target Upstream Version:


| System | ID | Private | Priority | Status | Summary | Last Updated |
| Launchpad | 1590049 | 0 | None | None | None | 2017-04-03 13:41:16 UTC |
| OpenStack gerrit | 322118 | 0 | None | None | None | 2017-04-03 13:42:47 UTC |
| OpenStack gerrit | 454594 | 0 | None | None | None | 2017-04-07 17:32:09 UTC |
| Red Hat Product Errata | RHEA-2017:1245 | 0 | normal | SHIPPED_LIVE | Red Hat OpenStack Platform 11.0 Bug Fix and Enhancement Advisory | 2017-05-17 23:01:50 UTC |

Description Eran Kuris 2017-03-28 08:45:03 UTC
Description of problem:
The failing test is "test_add_list_remove_router_on_l3_agent" in neutron/tests/tempest/api/admin/test_l3_agent_scheduler.py

The test suite's setUp creates a router and, if it is a DVR router (which it is in this case), also uses 'router-gateway-set' to ensure it is scheduled. The test itself then tries to schedule the router to an agent. This fails if the agent chosen during setUp is different from the one the test tries to schedule the router to.

In other words, the test is faulty: it should take into account that the router is already scheduled.

The failure happens on the Mitaka and Newton versions when running on a DVR setup.

Traceback (most recent call last):
testtools.testresult.real._StringException: Empty attachments:

pythonlogging:'': {{{
2017-03-24 21:34:26,554 7724 INFO     [tempest.lib.common.rest_client] Request (L3AgentSchedulerTestJSON:test_add_list_remove_router_on_l3_agent): 409 POST 0.353s
2017-03-24 21:34:26,555 7724 DEBUG    [tempest.lib.common.rest_client] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>'}
        Body: {"router_id": "3c6003e9-d81e-4287-8a77-603f22b3b27b"}
    Response - Headers: {'status': '409', u'content-length': '205', 'content-location': '', u'date': 'Sat, 25 Mar 2017 01:34:27 GMT', u'content-type': 'application/json', u'connection': 'close', u'x-openstack-request-id': 'req-0f7b896c-7a3a-4c9e-9155-c2faa610d5ff'}
        Body: {"NeutronError": {"message": "The router 3c6003e9-d81e-4287-8a77-603f22b3b27b has been already hosted by the L3 Agent 9e974922-85c2-457c-a791-8fa4c97cf25e.", "type": "RouterHostedByL3Agent", "detail": ""}}

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/neutron/tests/tempest/api/admin/test_l3_agent_scheduler.py", line 91, in test_add_list_remove_router_on_l3_agent
  File "/usr/lib/python2.7/site-packages/neutron/tests/tempest/services/network/json/network_client.py", line 471, in add_router_to_l3_agent
    resp, body = self.post(uri, body)
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 276, in post
    return self.request('POST', url, extra_headers, headers, body, chunked)
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 665, in request
    self._error_checker(resp, resp_body)
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 777, in _error_checker
    raise exceptions.Conflict(resp_body, resp=resp)
tempest.lib.exceptions.Conflict: An object with that identifier already exists
Details: {u'message': u'The router 3c6003e9-d81e-4287-8a77-603f22b3b27b has been already hosted by the L3 Agent 9e974922-85c2-457c-a791-8fa4c97cf25e.', u'type': u'RouterHostedByL3Agent', u'detail': u''}
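The failure mode described above can be reproduced in miniature. This is a toy, in-memory model, not neutron code: ToyL3Scheduler, run_test and the agent IDs are hypothetical. It assumes, as comments 6 and 7 describe, that adding an interface to a distributed router during setup schedules it to an L3 agent. The toy always picks the last registered agent to force the mismatch; a real scheduler may pick either agent, so the real failure depends on which one it chooses. Raising a conflict only when a different agent already hosts the router, and treating a repeat add to the same agent as harmless, are assumptions of the model.

```python
"""Toy model of the DVR scheduling conflict (hypothetical, not neutron code)."""


class Conflict(Exception):
    """Stands in for the HTTP 409 RouterHostedByL3Agent error."""


class ToyL3Scheduler:
    def __init__(self, agent_ids):
        self.agent_ids = list(agent_ids)
        self.hosting = {}  # router_id -> id of the agent hosting it

    def add_interface(self, router_id):
        # Assumption: adding an interface auto-schedules a DVR router.
        # The toy always picks the last agent; a real scheduler may differ.
        self.hosting.setdefault(router_id, self.agent_ids[-1])

    def add_router_to_agent(self, router_id, agent_id):
        current = self.hosting.get(router_id)
        if current is not None and current != agent_id:
            raise Conflict(
                f"The router {router_id} has been already hosted by "
                f"the L3 Agent {current}."
            )
        # Unscheduled router, or already on this agent: bind it.
        self.hosting[router_id] = agent_id


def run_test(agent_ids, add_interface_in_setup):
    """Mimic setUp plus the test body; return True if the test passes."""
    scheduler = ToyL3Scheduler(agent_ids)
    router_id = "router-1"
    if add_interface_in_setup:
        scheduler.add_interface(router_id)  # setUp side effect for DVR
    try:
        # The test explicitly schedules the router to the first L3 agent.
        scheduler.add_router_to_agent(router_id, agent_ids[0])
    except Conflict:
        return False
    return True


agents = ["agent-primary", "agent-subnode1"]
print(run_test(agents[:1], add_interface_in_setup=True))  # True
print(run_test(agents, add_interface_in_setup=True))      # False (409)
print(run_test(agents, add_interface_in_setup=False))     # True
```

With a single L3 agent the conflict cannot occur, which is consistent with the failure not reproducing until the subnode's agent was registered (comment 6). Skipping the interface in setup, as the eventual fix does, also avoids it.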

Version-Release number of selected component (if applicable):
How reproducible:
 Failed 4 times in the last 7 runs. Stability: 42 %
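The stability figure is consistent with the pass rate truncated to a whole percent: 4 failures in 7 runs leaves 3 passes, and 3/7 is about 42.9%. A minimal check, assuming truncation rather than rounding (rounding would give 43%); stability_percent is a hypothetical helper, not part of the CI tooling:

```python
def stability_percent(failures: int, runs: int) -> int:
    """Pass rate as a whole percentage, truncated toward zero."""
    passes = runs - failures
    return passes * 100 // runs


print(stability_percent(4, 7))  # 3 passes out of 7: 300 // 7 = 42
```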

Steps to Reproduce:
1. Run the CI gate job: https://rhos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/RHOS/view/RHOS11/job/qe-DFG-neutron-11_director-rhel-7.3-virthost-3cont_2comp-ipv4-vxlan-lvm-lbaas-dvr

Actual results:

Expected results:

Additional info:

Comment 2 Daniel Alvarez Sanchez 2017-04-03 10:26:55 UTC
I deployed a 2-node devstack environment with DVR on stable/ocata and I'm unable to reproduce the failure. I've run a script (see code below) which executes those tests 20 times in a row, with a 100% success rate.

Also, logstash shows only one upstream hit for this error message in the past 6 months [1], with the same error [2], and it is in a non-voting job (gate-tempest-dsvm-neutron-dvr-ha-multinode-full-ubuntu-xenial-nv). It's odd that it happened only once in 6 months, and only a few days ago (28 March 2017).

I've also seen that the last build ran successfully [0], and I've just triggered a new build (#14), which is currently running. If it still succeeds, I'll re-run the build a couple of times to figure out whether this has really been fixed, possibly by a recent change in our CI.

Will update this as I have more findings.

[0] https://rhos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/RHOS/view/RHOS11/job/qe-DFG-neutron-11_director-rhel-7.3-virthost-3cont_2comp-ipv4-vxlan-lvm-lbaas-dvr/13/testReport/junit/neutron.tests.tempest.api.admin.test_l3_agent_scheduler/L3AgentSchedulerTestJSON/test_add_list_remove_router_on_l3_agent_id_9464e5e7_8625_49c3_8fd1_89c52be59d66_/

[1] http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22has%20been%20already%20hosted%5C%22
[2] http://logs.openstack.org/07/410107/10/experimental/gate-tempest-dsvm-neutron-dvr-ha-multinode-full-ubuntu-xenial-nv/ecebbe4/console.html#_2017-03-28_11_36_18_608386

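cd /opt/stack/tempest
git checkout 14.0.0
# Recreate the "all-plugin" tox environment without running any tests
tox -r -e all-plugin --notest
source .tox/all-plugin/bin/activate

total=20
failed=0
for i in $(seq 1 "$total"); do
    # The module contains two tests; count how many of them passed
    success=$(ostestr neutron.tests.tempest.api.admin.test_l3_agent_scheduler | grep -c '\.\.\. ok')
    if [ "$success" -ne 2 ]; then
        echo "Test $i FAILED"
        failed=$((failed + 1))
    else
        echo "Test $i ok"
    fi
done

echo "Failed $failed runs out of $total"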

Comment 3 Assaf Muller 2017-04-03 14:06:33 UTC
Genadi, Daniel Mellado and Daniel Alvarez will try to get the Tempest patch merged:
https://review.openstack.org/#/c/322118/

Comment 5 Daniel Alvarez Sanchez 2017-04-04 12:40:23 UTC
I don't mind pushing that patch, but I'd like to understand why we need it if it's not failing upstream (as per my logstash search) and I can't reproduce it on stable/ocata (2-node DVR setup, with and without l3_ha). What does our CI job do differently from upstream that makes it fail for us?

Comment 6 Daniel Alvarez Sanchez 2017-04-04 13:57:56 UTC
OK, I've been able to reproduce it in my setup. The L3 agent on the subnode hadn't been registered with the primary neutron server, so it was as if I had only one L3 agent running, and the test didn't fail. Now it does:

[vagrant@primary tempest]$ ostestr --pdb tempest.api.network.admin.test_l3_agent_scheduler
{0} tempest.api.network.admin.test_l3_agent_scheduler.L3AgentSchedulerTestJSON.test_add_list_remove_router_on_l3_agent [0.265699s] ... FAILED

Captured pythonlogging:
    2017-04-01 06:27:11,712 1457 INFO     [tempest.lib.common.rest_client] Request (L3AgentSchedulerTestJSON:test_add_list_remove_router_on_l3_agent): 409 POST 0.262s
    2017-04-01 06:27:11,713 1457 DEBUG    [tempest.lib.common.rest_client] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>'}
            Body: {"router_id": "20ca7262-0353-4eea-a47b-549f689c4bbb"}
        Response - Headers: {'status': '409', 'content-length': '205', 'content-location': '', 'date': 'Sat, 01 Apr 2017 06:27:11 GMT', 'content-type': 'application/json', 'connection': 'close', 'x-openstack-request-id': 'req-31626aa3-6358-47fd-b4ef-b9995488b4f4'}
            Body: {"NeutronError": {"message": "The router 20ca7262-0353-4eea-a47b-549f689c4bbb has been already hosted by the L3 Agent e8375a3d-023d-40b7-8d88-39fdec3448fb.", "type": "RouterHostedByL3Agent", "detail": ""}}

[vagrant@primary tempest]$ neutron agent-list
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
| id                                   | agent_type         | host     | availability_zone | alive | admin_state_up | binary                    |
| 3aec0b44-1f0d-4042-be6f-ea6f32ca4a20 | L3 agent           | primary  | nova              | :-)   | True           | neutron-l3-agent          |
| 78944bb4-dddf-41e9-8833-0ca326084d50 | DHCP agent         | primary  | nova              | :-)   | True           | neutron-dhcp-agent        |
| 847495b1-2a77-4477-8331-9c7fbe97315b | Open vSwitch agent | primary  |                   | :-)   | True           | neutron-openvswitch-agent |
| a618e5bb-1c32-44b8-af41-6e2ac35b7e75 | Metadata agent     | primary  |                   | :-)   | True           | neutron-metadata-agent    |
| e8375a3d-023d-40b7-8d88-39fdec3448fb | L3 agent           | subnode1 | nova              | :-)   | True           | neutron-l3-agent          |

I have tried the tempest patch and the test now passes: since no interface is added to the router in resource_setup(), the router is not hosted by any L3 agent when the test tries to assign it to one. However, the author says it can't be merged yet because tempest still supports Liberty and the change would break it (Liberty doesn't allow assigning a router to an agent unless the router has at least one interface).
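Comment 6 shows the failure only reproduced once the subnode's L3 agent was registered with the neutron server, so a quick sanity check before running the test is to confirm that neutron agent-list reports at least two L3 agents. A minimal sketch that parses the table pasted above; hosts_with_agent is a hypothetical helper and assumes the pipe-delimited layout shown there, with the header row first:

```python
# Output of "neutron agent-list" from comment 6.
AGENT_LIST = """\
| id                                   | agent_type         | host     | availability_zone | alive | admin_state_up | binary                    |
| 3aec0b44-1f0d-4042-be6f-ea6f32ca4a20 | L3 agent           | primary  | nova              | :-)   | True           | neutron-l3-agent          |
| 78944bb4-dddf-41e9-8833-0ca326084d50 | DHCP agent         | primary  | nova              | :-)   | True           | neutron-dhcp-agent        |
| 847495b1-2a77-4477-8331-9c7fbe97315b | Open vSwitch agent | primary  |                   | :-)   | True           | neutron-openvswitch-agent |
| a618e5bb-1c32-44b8-af41-6e2ac35b7e75 | Metadata agent     | primary  |                   | :-)   | True           | neutron-metadata-agent    |
| e8375a3d-023d-40b7-8d88-39fdec3448fb | L3 agent           | subnode1 | nova              | :-)   | True           | neutron-l3-agent          |
"""


def hosts_with_agent(table_text, agent_type="L3 agent"):
    """Return the hosts that run an agent of the given type.

    Hypothetical helper: expects a pipe-delimited table with the header
    row first; "+----+" border lines, if present, are skipped.
    """
    rows = [line.strip() for line in table_text.splitlines()
            if line.strip().startswith("|")]
    header = [cell.strip() for cell in rows[0].strip("|").split("|")]
    hosts = []
    for row in rows[1:]:
        cells = [cell.strip() for cell in row.strip("|").split("|")]
        record = dict(zip(header, cells))
        if record.get("agent_type") == agent_type:
            hosts.append(record["host"])
    return hosts


l3_hosts = hosts_with_agent(AGENT_LIST)
print(l3_hosts)  # ['primary', 'subnode1']
if len(l3_hosts) < 2:
    print("Only one L3 agent registered: the conflict cannot occur")
```

The same function could be fed the captured output of the CLI instead of the pasted sample.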

Comment 7 Daniel Alvarez Sanchez 2017-04-06 08:35:53 UTC
(In reply to Assaf Muller from comment #3)
> Genadi, Daniel Mellado and Daniel Alvarez will try to get the Tempest patch
> merged:
> https://review.openstack.org/#/c/322118/

This patch has already been merged. The author changed the original implementation because a fix already existed: it sets the "dvr_extra_resources" option to False by default in tempest.config. This avoids adding an interface when creating a distributed router, which would automatically assign an L3 agent to it.

If tests are run against OSP8 (Liberty) or older, the interface still has to be added to the distributed router for the test to succeed.
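Comments 6 and 7 describe a trade-off controlled by the "dvr_extra_resources" option: adding the interface in setup is needed on Liberty or older but can trigger the 409 when more than one L3 agent is registered. The sketch below encodes that reasoning as a small decision function. It is a hypothetical summary, not tempest code: expected_outcome and its parameters are invented, the outcome strings are descriptions rather than real error messages (apart from the RouterHostedByL3Agent type seen in the log), and it assumes the 409 risk on older releases is the same as on newer ones, which this report does not confirm.

```python
def expected_outcome(dvr_extra_resources, liberty_or_older, l3_agents):
    """Summarise the expected test result (hypothetical model).

    dvr_extra_resources: whether setup adds an interface to the DVR router
    liberty_or_older: whether the cloud is OSP8 (Liberty) or older
    l3_agents: number of L3 agents registered with the neutron server
    """
    if dvr_extra_resources:
        # The interface schedules the router during setup, so the explicit
        # add can be rejected if another L3 agent was picked.
        if l3_agents >= 2:
            return "may fail: 409 RouterHostedByL3Agent"
        return "pass"
    if liberty_or_older:
        # Liberty does not allow scheduling a router that has no interface.
        return "fail: router has no interface"
    return "pass"


for extra in (True, False):
    for liberty in (False, True):
        outcome = expected_outcome(extra, liberty, l3_agents=2)
        print(f"dvr_extra_resources={extra}, liberty={liberty}: {outcome}")
```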

Comment 8 Daniel Alvarez Sanchez 2017-04-11 08:59:50 UTC
The downstream patch has been merged. I've already built the package in brew, so this should be fixed now. The tempest tree patch has been merged too and, IIUC, no extra work is needed since the downstream tempest RPM package simply packages the upstream code.

Comment 11 Eran Kuris 2017-04-19 06:36:42 UTC
fix verified : 

Comment 14 errata-xmlrpc 2017-05-17 20:14:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

