Bug 1956758

Summary: periodic-tripleo-ci-rhel-8-standalone-full-tempest-api-rhos-16.2 failing on networking related tempest tests, Neutron server have ERROR ovsdbapp.event neutron.plugins.ml2.common.exceptions.MechanismDriverError
Product: Red Hat OpenStack
Reporter: Sandeep Yadav <sandyada>
Component: openstack-neutron
Assignee: Slawek Kaplonski <skaplons>
Status: CLOSED NOTABUG
QA Contact: Eran Kuris <ekuris>
Severity: urgent
Priority: urgent
Version: 16.2 (Train)
CC: amuller, chrisw, lmartins, ralonsoh, scohen
Keywords: Triaged
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: All
Last Closed: 2021-05-14 12:26:21 UTC
Type: Bug

Description Sandeep Yadav 2021-05-04 11:23:39 UTC
Description of problem:

In the 16.2 integration pipeline, periodic-tripleo-ci-rhel-8-standalone-full-tempest-api-rhos-16.2 is failing on tempest tests.

Version-Release number of selected component (if applicable):
16.2

How reproducible:
Every time


Multiple networking-related tempest tests are failing:

~~~
{2} tempest.api.compute.admin.test_auto_allocate_network.AutoAllocateNetworkTest.test_server_multi_create_auto_allocate [0.130464s] ... FAILED
{3} tempest.api.network.admin.test_dhcp_agent_scheduler.DHCPAgentSchedulersTestJSON.test_add_remove_network_from_dhcp_agent [2.486588s] ... FAILED
{3} tempest.api.network.admin.test_dhcp_agent_scheduler.DHCPAgentSchedulersTestJSON.test_list_networks_hosted_by_one_dhcp [0.023769s] ... FAILED
{3} neutron_tempest_plugin.api.admin.test_dhcp_agent_scheduler.DHCPAgentSchedulersTestJSON.test_add_remove_network_from_dhcp_agent [3.548264s] ... FAILED
{3} neutron_tempest_plugin.api.admin.test_dhcp_agent_scheduler.DHCPAgentSchedulersTestJSON.test_dhcp_port_status_active [60.180165s] ... FAILED
{3} neutron_tempest_plugin.api.admin.test_dhcp_agent_scheduler.DHCPAgentSchedulersTestJSON.test_list_networks_hosted_by_one_dhcp [0.045708s] ... FAILED
{1} neutron_tempest_plugin.api.test_ports.PortsTestJSON.test_create_update_port_with_dns_name [3.020825s] ... FAILED
~~~
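
For triage, it can help to group failures like these by test class. The snippet below is only a sketch: it assumes result lines follow the `{worker} test_id [duration] ... STATUS` layout seen in this log, and the helper name `failures_by_class` is hypothetical, not part of tempest or stestr.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   {3} pkg.module.Class.test_name [2.486588s] ... FAILED
# The braces around the worker number are optional so that partially
# garbled prefixes ("3}" or "{3") are still accepted.
LINE_RE = re.compile(
    r"^\{?(?P<worker>\d+)\}?\s+(?P<test_id>\S+)\s+"
    r"\[(?P<seconds>[\d.]+)s\]\s+\.\.\.\s+(?P<status>\w+)$"
)


def failures_by_class(lines):
    """Return {test class path: [failed method names]} for FAILED lines."""
    grouped = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match or match.group("status") != "FAILED":
            continue
        # Split "pkg.module.Class.test_name" at the last dot.
        class_path, _, method = match.group("test_id").rpartition(".")
        grouped[class_path].append(method)
    return dict(grouped)
```

Applied to the list above, this yields one entry per test class, which makes it easier to see that most failures come from the two `DHCPAgentSchedulersTestJSON` classes.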

The Neutron server logs the following errors:

~~~
2021-05-03 15:42:00.811 15 ERROR neutron.plugins.ml2.managers Traceback (most recent call last):
2021-05-03 15:42:00.811 15 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/managers.py", line 477, in _call_on_drivers
2021-05-03 15:42:00.811 15 ERROR neutron.plugins.ml2.managers     getattr(driver.obj, method_name)(context)
2021-05-03 15:42:00.811 15 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/networking_ovn/ml2/mech_driver.py", line 716, in update_port_postcommit
2021-05-03 15:42:00.811 15 ERROR neutron.plugins.ml2.managers     self._ovn_client.update_port(port, port_object=original_port)
2021-05-03 15:42:00.811 15 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/networking_ovn/common/ovn_client.py", line 527, in update_port
2021-05-03 15:42:00.811 15 ERROR neutron.plugins.ml2.managers     ovn_port = self._nb_idl.lookup('Logical_Switch_Port', port['id'])
2021-05-03 15:42:00.811 15 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/__init__.py", line 172, in lookup
2021-05-03 15:42:00.811 15 ERROR neutron.plugins.ml2.managers     return self._lookup(table, record)
2021-05-03 15:42:00.811 15 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/__init__.py", line 215, in _lookup
2021-05-03 15:42:00.811 15 ERROR neutron.plugins.ml2.managers     row = idlutils.row_by_value(self, rl.table, rl.column, record)
2021-05-03 15:42:00.811 15 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", line 130, in row_by_value
2021-05-03 15:42:00.811 15 ERROR neutron.plugins.ml2.managers     raise RowNotFound(table=table, col=column, match=match)
2021-05-03 15:42:00.811 15 ERROR neutron.plugins.ml2.managers ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot find Logical_Switch_Port with name=800e394f-cce4-4d14-82cf-df28be9a59d9
2021-05-03 15:42:00.811 15 ERROR neutron.plugins.ml2.managers 
2021-05-03 15:42:00.817 15 ERROR ovsdbapp.event [req-3d73696c-4ab0-4312-824f-4635bd9a0239 - - - - -] Unexpected exception in notify_loop: neutron.plugins.ml2.common.exceptions.MechanismDriverError
~~~
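
The traceback shows a row lookup failing inside the mechanism driver and surfacing as a `MechanismDriverError`. The sketch below illustrates only that propagation pattern. Every class and function in it is a simplified stand-in written for this example (including `FakeNbIdl` and `call_on_drivers`); it is not the actual neutron, networking-ovn, or ovsdbapp code.

```python
class RowNotFound(Exception):
    """Stand-in for ovsdbapp.backend.ovs_idl.idlutils.RowNotFound."""


class MechanismDriverError(Exception):
    """Stand-in for neutron's MechanismDriverError."""


class FakeNbIdl:
    """Hypothetical in-memory substitute for the OVN northbound IDL."""

    def __init__(self, ports):
        self._ports = dict(ports)

    def lookup(self, table, record):
        try:
            return self._ports[record]
        except KeyError:
            raise RowNotFound(
                f"Cannot find {table} with name={record}") from None


def update_port_postcommit(nb_idl, port):
    # The driver assumes the Logical_Switch_Port already exists.
    return nb_idl.lookup("Logical_Switch_Port", port["id"])


def call_on_drivers(method, *args):
    # Any driver exception is reported as a generic MechanismDriverError;
    # the original error is kept as the exception's cause.
    try:
        return method(*args)
    except Exception as exc:
        raise MechanismDriverError(method.__name__) from exc
```

In this simplified model, an update for a port with no matching row raises `MechanismDriverError` whose `__cause__` is the `RowNotFound`, mirroring the order of the two log entries above.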

Comment 2 Sandeep Yadav 2021-05-04 11:49:17 UTC
Build History of job:

https://sf.hosted.upshift.rdu2.redhat.com/zuul/t/tripleo-ci-internal/builds?job_name=periodic-tripleo-ci-rhel-8-standalone-full-tempest-api-rhos-16.2

One of the affected job results:

Failing on tempest tests:

https://sf.hosted.upshift.rdu2.redhat.com/logs/openstack-periodic-integration-rhos-16.2/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-standalone-full-tempest-api-rhos-16.2/751cc5c/job-output.txt


Multiple networking-related tempest tests are failing:

URL: https://sf.hosted.upshift.rdu2.redhat.com/logs/openstack-periodic-integration-rhos-16.2/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-standalone-full-tempest-api-rhos-16.2/751cc5c/logs/undercloud/var/log/tempest/stestr_results.html

https://sf.hosted.upshift.rdu2.redhat.com/logs/openstack-periodic-integration-rhos-16.2/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-standalone-full-tempest-api-rhos-16.2/751cc5c/logs/undercloud/var/log/tempest/tempest_run.log


The Neutron server logs the following errors:

https://sf.hosted.upshift.rdu2.redhat.com/logs/openstack-periodic-integration-rhos-16.2/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-standalone-full-tempest-api-rhos-16.2/751cc5c/logs/undercloud/var/log/extra/errors.txt.txt

Comment 3 Sandeep Yadav 2021-05-05 10:24:43 UTC
The failed tests were previously skipped; they are also skipped upstream.

This is caused by a tripleo-quickstart-extras patch: some tests that were previously skipped are now running. We are reverting that patch: https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/789450

Comment 4 Lucas Alvares Gomes 2021-05-06 12:53:40 UTC
(In reply to Sandeep Yadav from comment #3)
> The failed tests were skipped earlier, failed test is also skipped in
> upstream. 
> 
> This is caused by a tripleo-quickstart-extras patch, Some tests which were
> earlier skipped are running now, we are reverting that patch
> https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/789450

Hi there,

Since the job is running with OVN, the majority of these tests do not make sense in that context. For example, all the DHCP Agent and DHCP Agent Scheduler tests will always fail with OVN because we do not deploy DHCP agents in an OVN setup. OVN has its own built-in DHCP server and does not rely on the DHCP agent. So the tests below have to be skipped:

{3} tempest.api.network.admin.test_dhcp_agent_scheduler.DHCPAgentSchedulersTestJSON.test_add_remove_network_from_dhcp_agent [2.486588s] ... FAILED
{3} tempest.api.network.admin.test_dhcp_agent_scheduler.DHCPAgentSchedulersTestJSON.test_list_networks_hosted_by_one_dhcp [0.023769s] ... FAILED
{3} neutron_tempest_plugin.api.admin.test_dhcp_agent_scheduler.DHCPAgentSchedulersTestJSON.test_add_remove_network_from_dhcp_agent [3.548264s] ... FAILED
{3} neutron_tempest_plugin.api.admin.test_dhcp_agent_scheduler.DHCPAgentSchedulersTestJSON.test_dhcp_port_status_active [60.180165s] ... FAILED
{3} neutron_tempest_plugin.api.admin.test_dhcp_agent_scheduler.DHCPAgentSchedulersTestJSON.test_list_networks_hosted_by_one_dhcp [0.045708s] ... FAILED
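
As a rough illustration of what skipping means here, the sketch below splits a list of test IDs using regular-expression skip patterns. The two patterns and the helper name `split_by_skiplist` are assumptions made for this example; they are not copied from the actual skiplist used by the job.

```python
import re

# Hypothetical skip patterns covering the DHCP agent scheduler tests,
# which cannot pass when no DHCP agents are deployed (as with OVN).
SKIP_PATTERNS = [
    r"tempest\.api\.network\.admin\.test_dhcp_agent_scheduler",
    r"neutron_tempest_plugin\.api\.admin\.test_dhcp_agent_scheduler",
]


def split_by_skiplist(test_ids, patterns=SKIP_PATTERNS):
    """Return (to_run, skipped) after applying the skip patterns."""
    compiled = [re.compile(pattern) for pattern in patterns]
    to_run, skipped = [], []
    for test_id in test_ids:
        if any(regex.search(test_id) for regex in compiled):
            skipped.append(test_id)
        else:
            to_run.append(test_id)
    return to_run, skipped
```

With these patterns, all five tests listed above would land in `skipped`, while unrelated tests such as `test_create_update_port_with_dns_name` would still run.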

The following test "tempest.api.compute.admin.test_auto_allocate_network.AutoAllocateNetworkTest.test_server_multi_create_auto_allocate" seems to have failed with a 400 error from nova:

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    b'Traceback (most recent call last):'
    b'  File "/usr/lib/python3.6/site-packages/tempest/api/compute/admin/test_auto_allocate_network.py", line 176, in test_server_multi_create_auto_allocate'
    b'    min_count=3)'
    b'  File "/usr/lib/python3.6/site-packages/tempest/common/compute.py", line 200, in create_test_server'
    b'    **kwargs)'
    b'  File "/usr/lib/python3.6/site-packages/tempest/lib/services/compute/servers_client.py", line 103, in create_server'
    b"    resp, body = self.post('servers', post_body)"
    b'  File "/usr/lib/python3.6/site-packages/tempest/lib/common/rest_client.py", line 298, in post'
    b"    return self.request('POST', url, extra_headers, headers, body, chunked)"
    b'  File "/usr/lib/python3.6/site-packages/tempest/lib/services/compute/base_compute_client.py", line 48, in request'
    b'    method, url, extra_headers, headers, body, chunked)'
    b'  File "/usr/lib/python3.6/site-packages/tempest/lib/common/rest_client.py", line 702, in request'
    b'    self._error_checker(resp, resp_body)'
    b'  File "/usr/lib/python3.6/site-packages/tempest/lib/common/rest_client.py", line 813, in _error_checker'
    b'    raise exceptions.BadRequest(resp_body, resp=resp)'
    b'tempest.lib.exceptions.BadRequest: Bad request'
    b"Details: {'code': 400, 'message': 'Unable to automatically allocate a network for project 4c6518bb6f7043c4b13e7008f59c749c'}"
    b''

This will need further investigation, but since it's a 400 (Bad Request) and not a 5XX (server error), I imagine it could be a configuration problem.

Comment 7 Sandeep Yadav 2021-05-14 12:26:21 UTC
The previously failing tests shouldn't be run on the ML2/OVN backend.

16.2 currently uses the same skiplist as Train (https://opendev.org/openstack/openstack-tempest-skiplist/src/branch/master/roles/validate-tempest/vars/tempest_skip.yml#L902), so these tests will not run.

Closing this bug as invalid.