Bug 2055171 - Neutron requests answered with RouterNotFound
Summary: Neutron requests answered with RouterNotFound
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-networking-ovn
Version: 16.2 (Train)
Hardware: Unspecified
OS: Unspecified
urgent
high
Target Milestone: z2
: 16.2 (Train on RHEL 8.4)
Assignee: Elvira
QA Contact: Eduardo Olivares
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2022-02-16 12:17 UTC by Eduardo Olivares
Modified: 2022-03-23 22:13 UTC (History)
10 users (show)

Fixed In Version: python-networking-ovn-7.4.2-2.20220113214853.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-23 22:13:09 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 830509 0 None NEW [OVN] Handle RouterNotFound exception in set_gateway_mtu 2022-02-23 09:56:07 UTC
Red Hat Issue Tracker OSP-12718 0 None Closed etcdエラー 2022-04-19 08:53:58 UTC
Red Hat Product Errata RHBA-2022:1001 0 None None None 2022-03-23 22:13:15 UTC

Description Eduardo Olivares 2022-02-16 12:17:51 UTC
Description of problem:
Tempest tests are executed on several processes/workers in parallel. Some tests fail performing different operations on routers (the issue has been observed with DELETE and PUT requests) raising the following error:
RouterNotFound - Router <router-id> could not be found

---BEGIN EDIT DESCRIPTION---
I have reviewed again the tempest logs and I have to correct what I said above: the issue is only observed with PUT requests, and more specifically it only happens with add_router_interface and remove_router_interface PUT requests

All the occurrences of this bug during the execution of tempest can be obtained running the following script on the tempest logs from [1]:
j=0
for i in $(seq 1 $(zgrep -c "404 PUT.*router" tempest.log.gz))
do
    requestfirstline=$(zgrep -m$i -A4 "404 PUT.*router" tempest.log.gz | tail -5 | head -1)
    requestlastline=$(zgrep -m$i -A4 "404 PUT.*router" tempest.log.gz | tail -1)
    routernotfound=$(echo $requestlastline | grep "Router .* could not be found" | sed 's/.*Router \(.*\) could not be found.*/\1/')
    [ "$routernotfound" != "" ] && echo $requestfirstline| grep -v $routernotfound && echo "RouterNotFound: $routernotfound" && ((j=j+1)) && echo "^^^ Ocurrence number $j"
done
---END EDIT DESCRIPTION---


When this error happens, the request is sent for routerA, but the error response refers to routerB. And it seems routerB was used in a previous test and was deleted before.


I will paste here some tempest logs (obtained at [1]):
1) The tempest PID 302589 successfully deletes the router d4e8bfce-0004-4255-93c1-e88f370039f1 at 02:36:20
2022-02-14 02:36:20.355 302589 INFO tempest.lib.common.rest_client [req-4ebda313-720f-4459-887b-a27a750f9da0 ] Request (RoutersTest:_run_cleanups): 204 DELETE http://10.218.0.101:9696/v2.0/routers/d4e8bfce-0004-4255-93c1-e88f370039f1 3.315s

2) The tempest PID 302593 fails at disconnecting a subnet from the router 3ef3f26c-ea8c-4d86-a730-f371e5866807 at 02:36:27 - the error response says "Router d4e8bfce-0004-4255-93c1-e88f370039f1 could not be found":
2022-02-14 02:36:27.400 302593 INFO tempest.lib.common.rest_client [req-a94eda04-3140-4a71-8611-02e4887a5b0a ] Request (TestGettingAddress:_run_cleanups): 404 PUT http://10.218.0.101:9696/v2.0/routers/3ef3f26c-ea8c-4d86-a730-f371e5866807/remove_router_interface 12.749s
2022-02-14 02:36:27.400 302593 DEBUG tempest.lib.common.rest_client [req-a94eda04-3140-4a71-8611-02e4887a5b0a ] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>'}
        Body: {"subnet_id": "c305b87c-cb76-40f9-9723-23d59a86fe52"}
    Response - Headers: {'content-type': 'application/json', 'content-length': '135', 'x-openstack-request-id': 'req-a94eda04-3140-4a71-8611-02e4887a5b0a', 'date': 'Mon, 14 Feb 2022 02:36:27 GMT', 'connection': 'close', 'status': '404', 'content-location': 'http://10.218.0.101:9696/v2.0/routers/3ef3f26c-ea8c-4d86-a730-f371e5866807/remove_router_interface'}
        Body: b'{"NeutronError": {"type": "RouterNotFound", "message": "Router d4e8bfce-0004-4255-93c1-e88f370039f1 could not be found", "detail": ""}}' _log_request_full /usr/lib/python3.6/site-packages/tempest/lib/common/rest_client.py:457




The neutron server logs at controller-1 (see [2]) show some warning messages for the failed request mentioned above at 2), but the wrong router id is always seen on those logs:
$ zgrep "WARNING.*req-a94eda04-3140-4a71-8611-02e4887a5b0a" server.log.2.gz 
2022-02-14 02:36:24.898 15 WARNING neutron.scheduler.dhcp_agent_scheduler [req-a94eda04-3140-4a71-8611-02e4887a5b0a 3f4622530e9b4262be6f525269e61f1c 936c75b581f3439b88a58c06b0ff4be9 - default default] No more DHCP agents: neutron_lib.exceptions.l3.RouterNotFound: Router d4e8bfce-0004-4255-93c1-e88f370039f1 could not be found
2022-02-14 02:36:24.905 15 WARNING neutron.api.rpc.agentnotifiers.dhcp_rpc_agent_api [req-a94eda04-3140-4a71-8611-02e4887a5b0a 3f4622530e9b4262be6f525269e61f1c 936c75b581f3439b88a58c06b0ff4be9 - default default] Unable to schedule network a1a38b62-079b-4869-85e9-9c7c9223c967: no agents available; will retry on subsequent port and subnet creation events.: neutron_lib.exceptions.l3.RouterNotFound: Router d4e8bfce-0004-4255-93c1-e88f370039f1 could not be found
2022-02-14 02:36:27.103 15 WARNING neutron.scheduler.dhcp_agent_scheduler [req-a94eda04-3140-4a71-8611-02e4887a5b0a 3f4622530e9b4262be6f525269e61f1c 936c75b581f3439b88a58c06b0ff4be9 - default default] No more DHCP agents: neutron_lib.exceptions.l3.RouterNotFound: Router d4e8bfce-0004-4255-93c1-e88f370039f1 could not be found
2022-02-14 02:36:27.118 15 WARNING neutron.api.rpc.agentnotifiers.dhcp_rpc_agent_api [req-a94eda04-3140-4a71-8611-02e4887a5b0a 3f4622530e9b4262be6f525269e61f1c 936c75b581f3439b88a58c06b0ff4be9 - default default] Unable to schedule network a1a38b62-079b-4869-85e9-9c7c9223c967: no agents available; will retry on subsequent port and subnet creation events.: neutron_lib.exceptions.l3.RouterNotFound: Router d4e8bfce-0004-4255-93c1-e88f370039f1 could not be found





Several tempest tests from the following job failed due to this issue. See [3].



Version-Release number of selected component (if applicable):
RHOS-16.2-RHEL-8-20220210.n.1

How reproducible:
issue only found in one job, but affecting several tests



[1] http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-network-networking-ovn-16.2_director-rhel-virthost-3cont_2comp_3net-ipv4-geneve-composable-vlan-provider-network/64/undercloud-0/home/stack/tempest-dir/tempest.log.gz
[2] http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-network-networking-ovn-16.2_director-rhel-virthost-3cont_2comp_3net-ipv4-geneve-composable-vlan-provider-network/64/controller-1/var/log/containers/neutron/server.log.2.gz
[3] https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/network/view/networking-ovn/job/DFG-network-networking-ovn-16.2_director-rhel-virthost-3cont_2comp_3net-ipv4-geneve-composable-vlan-provider-network/64/

Comment 5 Elvira 2022-02-23 09:56:07 UTC
Upstream patch ready: https://review.opendev.org/c/openstack/neutron/+/830509

Comment 6 Eran Kuris 2022-02-27 07:51:14 UTC
(In reply to Elvira from comment #5)
> Upstream patch ready: https://review.opendev.org/c/openstack/neutron/+/830509


Hi Elvira, 
can you provide the d/s patch [backport of 2 branches trunk and non-trunk]

Comment 20 errata-xmlrpc 2022-03-23 22:13:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.2), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:1001


Note You need to log in before you can comment on or make changes to this bug.