Bug 1929829 - [OSP16.2]Ha router configuration as master error due to qg-xx interface down
Summary: [OSP16.2]Ha router configuration as master error due to qg-xx interface down
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: beta
Target Release: 16.2 (Train on RHEL 8.4)
Assignee: Rodolfo Alonso
QA Contact: Candido Campos
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-02-17 18:17 UTC by Candido Campos
Modified: 2022-09-05 13:25 UTC (History)

Fixed In Version: openstack-neutron-15.2.1-1.20210310111336.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-09-15 06:39:15 UTC
Target Upstream Version:
Embargoed:


Attachments
logs (18.03 MB, application/gzip)
2021-02-17 18:17 UTC, Candido Campos


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1916024 0 None None None 2021-02-18 11:48:04 UTC
OpenStack gerrit 776427 0 None NEW [L3][HA] Retry when setting HA router GW status. 2021-02-18 18:11:06 UTC
Red Hat Issue Tracker OSP-679 0 None None None 2022-09-05 13:25:55 UTC
Red Hat Product Errata RHSA-2021:3488 0 None None None 2021-09-15 06:39:27 UTC

Description Candido Campos 2021-02-17 18:17:39 UTC
Created attachment 1757631
logs

Description of problem:
Sometimes an HA router is created with all of its instances in standby state because the qg-xx interface is down, and there is no connectivity:


(overcloud) [stack@undercloud-0 ~]$ neutron l3-agent-list-hosting-router router1
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+--------------------------------------+---------------------------+----------------+-------+----------+
| id                                   | host                      | admin_state_up | alive | ha_state |
+--------------------------------------+---------------------------+----------------+-------+----------+
| 3b93ec23-48fa-4847-bbb2-f8903e9865f9 | networker-1.redhat.local  | True           | :-)   | standby  |
| 41b8d1a8-4695-445a-916a-d12db523eb91 | controller-0.redhat.local | True           | :-)   | standby  |
| 4533bd88-d2d1-4320-9e39-6fcb2a5cc236 | networker-0.redhat.local  | True           | :-)   | standby  |
+--------------------------------------+---------------------------+----------------+-------+----------+
(overcloud) [stack@undercloud-0 ~]$ 
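
The condition shown in that table can also be checked from a script. This is only a minimal sketch: it assumes the table layout printed above, with ha_state as the fifth column, and the function name all_standby is made up for this example and is not part of any OpenStack tool.

```shell
# Reads the table printed by "neutron l3-agent-list-hosting-router" on stdin.
# Exit status 0: at least one agent row exists and none reports ha_state "active".
# Exit status 1: some agent is active, or no agent rows were found.
all_standby() {
    awk -F'|' '
        # Only table rows start with "|"; border lines start with "+".
        /^[|]/ {
            id = $2
            gsub(/[[:space:]]/, "", id)
            # Skip the header row, which has "id" in the first column.
            if (id == "id") next
            state = $6
            gsub(/[[:space:]]/, "", state)
            rows++
            if (state == "active") active++
        }
        END { exit !(rows > 0 && active == 0) }
    '
}
```

Possible usage: neutron l3-agent-list-hosting-router router1 | all_standby && echo "router1 has no active instance"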


Version-Release number of selected component (if applicable):

(overcloud) [stack@undercloud-0 ~]$ cat core_puddle_version 
RHOS-16.2-RHEL-8-20210129.n.3

How reproducible:

(overcloud) [stack@undercloud-0 ~]$ cat scripts/create.sh 
set -x

ips=(0 10.0.0.215 10.0.0.249 10.0.0.223 10.0.0.222 10.0.0.218 10.0.0.247 10.0.0.210 10.0.0.220 10.0.0.246 10.0.0.213 10.0.0.224 10.0.0.212 10.0.0.217 10.0.0.221 10.0.0.216)
ips=(0 10.0.0.220 10.0.0.216 10.0.0.235 10.0.0.232 10.0.0.245 10.0.0.226 10.0.0.217 10.0.0.211 10.0.0.221 10.0.0.230 10.0.0.248 10.0.0.228 10.0.0.223 10.0.0.212 10.0.0.225)
ips=(0 10.0.0.217 10.0.0.246 10.0.0.231 10.0.0.247 10.0.0.222 10.0.0.250 10.0.0.216 10.0.0.246 10.0.0.247 10.0.0.235 10.0.0.211 10.0.0.236 10.0.0.215 10.0.0.212 10.0.0.234)
openstack network create net$1
openstack subnet create --network net$1  --dns-nameserver 10.0.0.1 --gateway 10.$1.0.1  --subnet-range 10.$1.0.0/16 net$1
openstack router create router$1
openstack router add subnet router$1 net$1
openstack router set router$1 --external-gateway nova

openstack server create --flavor cirros --image cirros   --nic net-id=net$1 --security-group test --key-name mykey vm$1

openstack server add floating ip vm$1 ${ips[$1]}

ping ${ips[$1]} -c 10

(overcloud) [stack@undercloud-0 ~]$ cat scripts/delete.sh 

openstack server delete vm$1
openstack router remove subnet router$1 net$1 
openstack network delete net$1
openstack router delete router$1



Steps to Reproduce:
1. for i in $(seq 10); do ./create.sh $i; done
2. Check floating IP (FIP) connectivity to detect the error.
3. for i in $(seq 10); do ./delete.sh $i; done
4. Repeat steps 1-3 until the error appears.
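
The connectivity check in the steps above could be scripted roughly as follows. This is a sketch, not part of the original scripts: the function name unreachable_fips and the PING_CMD variable are hypothetical and exist only so that the probe command can be replaced.

```shell
# Hypothetical helper: prints every address that does not answer the probe.
# PING_CMD is an assumption of this sketch; it defaults to the system ping.
PING_CMD=${PING_CMD:-ping}

unreachable_fips() {
    local ip
    for ip in "$@"; do
        # -c 3: send three probes; -W 2: wait up to two seconds for a reply.
        if ! "$PING_CMD" -c 3 -W 2 "$ip" >/dev/null 2>&1; then
            echo "$ip"
        fi
    done
}
```

Possible usage after the create loop: unreachable_fips 10.0.0.217 10.0.0.246 10.0.0.231
Any address printed belongs to a VM whose router is worth inspecting with l3-agent-list-hosting-router.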


Additional info:


This seems to be a race condition between the L3 agent and keepalived while configuring the qg- interface:
...
115314:Feb 17 00:53:35 networker-1 kernel: device qg-3e872c7f-68 entered promiscuous mode
115315:Feb 17 00:53:35 networker-1 NetworkManager[1073]: <info>  [1613523215.1566] manager: (qg-3e872c7f-68): new Generic device (/org/freedesktop/NetworkManager/Devices/457)
115317:Feb 17 00:53:35 networker-1 systemd-udevd[531523]: Could not generate persistent MAC address for qg-3e872c7f-68: No such file or directory
115318:Feb 17 00:53:35 networker-1 Keepalived_vrrp[531427]: Interface qg-3e872c7f-68 added
115335:Feb 17 00:53:35 networker-1 Keepalived_vrrp[531427]: Sending gratuitous ARP on qg-3e872c7f-68 for 10.0.0.227
115336:Feb 17 00:53:35 networker-1 Keepalived_vrrp[531427]: Error sending gratuitous ARP on qg-3e872c7f-68 for 10.0.0.227
115337:Feb 17 00:53:35 networker-1 Keepalived_vrrp[531427]: (VR_35) Sending/queueing gratuitous ARPs on qg-3e872c7f-68 for 10.0.0.227
115340:Feb 17 00:53:35 networker-1 Keepalived_vrrp[531427]: Sending unsolicited Neighbour Advert on qg-3e872c7f-68 for 2620:52:0:13b8::1000:56
115341:Feb 17 00:53:35 networker-1 Keepalived_vrrp[531427]: VRRP: Error sending ndisc unsolicited neighbour advert on qg-3e872c7f-68 for 2620:52:0:13b8::1000:56
115342:Feb 17 00:53:35 networker-1 Keepalived_vrrp[531427]: (VR_35) Sending/queueing Unsolicited Neighbour Adverts on qg-3e872c7f-68 for 2620:52:0:13b8::1000:56
115345:Feb 17 00:53:35 networker-1 Keepalived_vrrp[531427]: Sending unsolicited Neighbour Advert on qg-3e872c7f-68 for fe80::f816:3eff:fedc:db5c
115346:Feb 17 00:53:35 networker-1 Keepalived_vrrp[531427]: VRRP: Error sending ndisc unsolicited neighbour advert on qg-3e872c7f-68 for fe80::f816:3eff:fedc:db5c
115347:Feb 17 00:53:35 networker-1 Keepalived_vrrp[531427]: (VR_35) Sending/queueing Unsolicited Neighbour Adverts on qg-3e872c7f-68 for fe80::f816:3eff:fedc:db5c
115349:Feb 17 00:53:35 networker-1 Keepalived_vrrp[531427]: Sending gratuitous ARP on qg-3e872c7f-68 for 10.0.0.227
115350:Feb 17 00:53:35 networker-1 Keepalived_vrrp[531427]: Error sending gratuitous ARP on qg-3e872c7f-68 for 10.0.0.227
115352:Feb 17 00:53:35 networker-1 Keepalived_vrrp[531427]: Sending unsolicited Neighbour Advert on qg-3e872c7f-68 for 2620:52:0:13b8::1000:56
115353:Feb 17 00:53:35 networker-1 Keepalived_vrrp[531427]: VRRP: Error sending ndisc unsolicited neighbour advert on qg-3e872c7f-68 for 2620:52:0:13b8::1000:56
115355:Feb 17 00:53:35 networker-1 Keepalived_vrrp[531427]: Sending unsolicited Neighbour Advert on qg-3e872c7f-68 for fe80::f816:3eff:fedc:db5c
... 
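
To see how often keepalived failed to send on a given interface, the messages above can be counted per device. A rough sketch, assuming the log format shown in this excerpt (the device name follows the word "on"); the function name keepalived_send_errors is invented for this example.

```shell
# Reads syslog/journal text on stdin and prints "<interface> <count>" for each
# interface on which keepalived logged an "Error sending" message.
# When several interfaces appear, the output order is unspecified.
keepalived_send_errors() {
    awk '
        /Keepalived_vrrp/ && /Error sending/ {
            # The device name is the field right after the word "on".
            for (i = 1; i < NF; i++) {
                if ($i == "on") { count[$(i + 1)]++; break }
            }
        }
        END { for (dev in count) print dev, count[dev] }
    '
}
```

Possible usage: grep Keepalived_vrrp /var/log/messages | keepalived_send_errors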

The qg-xx interface goes down, and it seems to be removed during the L3 agent configuration.

l3-agent logs:

2021-02-17 00:53:35.031 481499 DEBUG neutron.common.utils [-] Time-cost: call f45973da-36a4-4f90-82b1-a79687c86662 function get_routers took 4.177s seconds to run wrapper /usr/lib/python3.6/site-packages/oslo_utils/timeutils.py:391
2021-02-17 00:53:35.031 481499 DEBUG neutron_lib.callbacks.manager [-] Notify callbacks [] for router, before_update _notify_loop /usr/lib/python3.6/site-packages/neutron_lib/callbacks/manager.py:193
2021-02-17 00:53:35.032 481499 DEBUG neutron.agent.l3.router_info [-] Process updates, router 35078d89-eee3-4334-af3a-e96929fae5dd process /usr/lib/python3.6/site-packages/neutron/agent/l3/router_info.py:1224
2021-02-17 00:53:35.039 481499 INFO neutron.agent.l3.ha [-] Router 35078d89-eee3-4334-af3a-e96929fae5dd transitioned to master on agent networker-1.redhat.local
2021-02-17 00:53:35.040 481499 INFO neutron.agent.l3.ha_router [-] Set router 35078d89-eee3-4334-af3a-e96929fae5dd gateway device link state to up.
2021-02-17 00:53:35.081 482044 DEBUG neutron.privileged.agent.linux.ip_lib [-] Interface qg-3e872c7f-68 not found in namespace qrouter-35078d89-eee3-4334-af3a-e96929fae5dd get_link_id /usr/lib/python3.6/site-packages/neutron/privileged/agent/linux/ip_lib.py:290
2021-02-17 00:53:35.081 482044 DEBUG oslo.privsep.daemon [-] privsep: reply[140140633282560]: (4, False) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511
2021-02-17 00:53:35.083 481499 DEBUG oslo_concurrency.lockutils [-] Lock "l3-agent-pd" acquired by "neutron.agent.linux.pd.PrefixDelegation.sync_router" :: waited 0.000s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:327
2021-02-17 00:53:35.083 481499 DEBUG oslo_concurrency.lockutils [-] Lock "l3-agent-pd" released by "neutron.agent.linux.pd.PrefixDelegation.sync_router" :: held 0.000s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:339
2021-02-17 00:53:35.083 481499 DEBUG oslo_concurrency.lockutils [-] Acquired lock "router-lock-ns-qrouter-35078d89-eee3-4334-af3a-e96929fae5dd" lock /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:265
2021-02-17 00:53:35.084 481499 DEBUG neutron.common.coordination [-] Lock "router-lock-ns-qrouter-35078d89-eee3-4334-af3a-e96929fae5dd" acquired by "process_external" :: waited 0.000s _synchronized /usr/lib/python3.6/site-packages/neutron/common/coordination.py:82
2021-02-17 00:53:35.084 481499 DEBUG neutron.agent.linux.interface [-] Device qg-3e872c7f-68 may concurrently be deleted. set_link_status /usr/lib/python3.6/site-packages/neutron/agent/linux/interface.py:327
2021-02-17 00:53:35.084 481499 DEBUG oslo_concurrency.lockutils [-] Lock "l3-agent-pd" acquired by "neutron.agent.linux.pd.PrefixDelegation.get_preserve_ips" :: waited 0.000s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:327
2021-02-17 00:53:35.085 481499 DEBUG oslo_concurrency.lockutils [-] Lock "l3-agent-pd" released by "neutron.agent.linux.pd.PrefixDelegation.get_preserve_ips" :: held 0.000s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:339
2021-02-17 00:53:35.085 481499 DEBUG neutron.agent.linux.interface [-] init_router_port: device_name(qg-3e872c7f-68), namespace(qrouter-35078d89-eee3-4334-af3a-e96929fae5dd) init_router_port /usr/lib/python3.6/site-packages/neutron/agent/linux/interface.py:171
2021-02-17 00:53:35.112 482044 DEBUG neutron.privileged.agent.linux.ip_lib [-] Interface qg-3e872c7f-68 not found in namespace qrouter-35078d89-eee3-4334-af3a-e96929fae5dd get_link_id /usr/lib/python3.6/site-packages/neutron/privileged/agent/linux/ip_lib.py:290
2021-02-17 00:53:35.113 482044 DEBUG oslo.privsep.daemon [-] privsep: reply[140140634520160]: (4, False) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511
2021-02-17 00:53:35.136 482044 DEBUG oslo.privsep.daemon [-] privsep: reply[140140634520160]: (4, True) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511
2021-02-17 00:53:35.139 481499 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): DelPortCommand(port=qg-3e872c7f-68, bridge=None, if_exists=True) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
2021-02-17 00:53:35.139 481499 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Transaction caused no change do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:121
2021-02-17 00:53:35.140 481499 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): AddPortCommand(bridge=br-int, port=qg-3e872c7f-68, may_exist=False) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
2021-02-17 00:53:35.140 481499 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=1): DbSetCommand(table=Port, record=qg-3e872c7f-68, col_values=(('tag', 4095),)) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
2021-02-17 00:53:35.141 481499 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=2): DbSetCommand(table=Interface, record=qg-3e872c7f-68, col_values=(('type', 'internal'), ('external_ids', {'iface-id': '3e872c7f-685a-4514-a1e7-320da0bd2d62', 'iface-status': 'active', 'attached-mac': 'fa:16:3e:dc:db:5c'}))) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
2021-02-17 00:53:35.145 482044 DEBUG oslo.privsep.daemon [-] privsep: Exception during request[140140633282560]: Network interface qg-3e872c7f-68 not found in namespace qrouter-35078d89-eee3-4334-af3a-e96929fae5dd. _process_cmd /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:490
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 485, in _process_cmd
    ret = func(*f_args, **f_kwargs)
  File "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line 247, in _wrap
    return func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/neutron/privileged/agent/linux/ip_lib.py", line 453, in get_link_attributes
    link = _run_iproute_link("get", device, namespace)[0]
  File "/usr/lib/python3.6/site-packages/neutron/privileged/agent/linux/ip_lib.py", line 298, in _run_iproute_link
    idx = get_link_id(device, namespace)
  File "/usr/lib/python3.6/site-packages/neutron/privileged/agent/linux/ip_lib.py", line 287, in get_link_id
    raise NetworkInterfaceNotFound(device=device, namespace=namespace)
neutron.privileged.agent.linux.ip_lib.NetworkInterfaceNotFound: Network interface qg-3e872c7f-68 not found in namespace qrouter-35078d89-eee3-4334-af3a-e96929fae5dd.
2021-02-17 00:53:35.146 482044 DEBUG oslo.privsep.daemon [-] privsep: reply[140140633282560]: (5, 'neutron.privileged.agent.linux.ip_lib.NetworkInterfaceNotFound', ('Network interface qg-3e872c7f-68 not found in namespace qrouter-35078d89-eee3-4334-af3a-e96929fae5dd.',)) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511
2021-02-17 00:53:35.166 482044 DEBUG oslo.privsep.daemon [-] privsep: reply[140140634520160]: (4, ({'header': {'length': 36, 'type': 2, 'flags': 256, 'sequence_number': 255, 'pid': 482044, 'error': None, 'stats': Stats(qsize=0, delta=0, delay=0)}, 'event': 'NLMSG_ERROR'},)) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511
2021-02-17 00:53:35.172 482044 DEBUG oslo.privsep.daemon [-] privsep: reply[140140634520160]: (4, ['qrouter-35078d89-eee3-4334-af3a-e96929fae5dd', 'qrouter-21b65ce2-738f-4d1c-b387-c5aee912d2db', 'qdhcp-20d1812f-e986-4973-8a28-188458b8d3d1']) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511
...
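
In the log above, the agent sets the gateway device link state to up before the device exists in the namespace, and the attempt is not repeated. The linked upstream change is titled "Retry when setting HA router GW status". The general idea of such a retry can be sketched in shell as below. This is not the upstream patch, only an illustration; the retry helper and its arguments are made up for this example.

```shell
# retry ATTEMPTS DELAY COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS tries have been made,
# sleeping DELAY seconds between tries. Returns 1 if every try failed.
retry() {
    local attempts=$1
    local delay=$2
    shift 2
    local n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}
```

For a manual test on the affected node, something like the following could be tried: retry 5 1 ip netns exec qrouter-35078d89-eee3-4334-af3a-e96929fae5dd ip link set qg-3e872c7f-68 up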

logs attached

Comment 17 errata-xmlrpc 2021-09-15 06:39:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenStack Platform 16.2 (openstack-neutron) security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3488

