Created attachment 1757631
logs

Description of problem:
Sometimes a router is created with all of its instances in standby mode. The qg-xx interface is in the down state, so there is no connectivity:

(overcloud) [stack@undercloud-0 ~]$ neutron l3-agent-list-hosting-router router1
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+--------------------------------------+---------------------------+----------------+-------+----------+
| id                                   | host                      | admin_state_up | alive | ha_state |
+--------------------------------------+---------------------------+----------------+-------+----------+
| 3b93ec23-48fa-4847-bbb2-f8903e9865f9 | networker-1.redhat.local  | True           | :-)   | standby  |
| 41b8d1a8-4695-445a-916a-d12db523eb91 | controller-0.redhat.local | True           | :-)   | standby  |
| 4533bd88-d2d1-4320-9e39-6fcb2a5cc236 | networker-0.redhat.local  | True           | :-)   | standby  |
+--------------------------------------+---------------------------+----------------+-------+----------+
(overcloud) [stack@undercloud-0 ~]$

Version-Release number of selected component (if applicable):
(overcloud) [stack@undercloud-0 ~]$ cat core_puddle_version
RHOS-16.2-RHEL-8-20210129.n.3

How reproducible:
(overcloud) [stack@undercloud-0 ~]$ cat scripts/create.sh
set -x
ips=(0 10.0.0.215 10.0.0.249 10.0.0.223 10.0.0.222 10.0.0.218 10.0.0.247 10.0.0.210 10.0.0.220 10.0.0.246 10.0.0.213 10.0.0.224 10.0.0.212 10.0.0.217 10.0.0.221 10.0.0.216)
ips=(0 10.0.0.220 10.0.0.216 10.0.0.235 10.0.0.232 10.0.0.245 10.0.0.226 10.0.0.217 10.0.0.211 10.0.0.221 10.0.0.230 10.0.0.248 10.0.0.228 10.0.0.223 10.0.0.212 10.0.0.225)
ips=(0 10.0.0.217 10.0.0.246 10.0.0.231 10.0.0.247 10.0.0.222 10.0.0.250 10.0.0.216 10.0.0.246 10.0.0.247 10.0.0.235 10.0.0.211 10.0.0.236 10.0.0.215 10.0.0.212 10.0.0.234)
openstack network create net$1
openstack subnet create --network net$1 --dns-nameserver 10.0.0.1 --gateway 10.$1.0.1 --subnet-range 10.$1.0.0/16 net$1
openstack router create router$1
openstack router add subnet router$1 net$1
openstack router set router$1 --external-gateway nova
openstack server create --flavor cirros --image cirros --nic net-id=net$1 --security-group test --key-name mykey vm$1
openstack server add floating ip vm$1 ${ips[$1]}
ping ${ips[$1]} -c 10

(overcloud) [stack@undercloud-0 ~]$ cat scripts/delete.sh
openstack server delete vm$1
openstack router remove subnet router$1 net$1
openstack network delete net$1
openstack router delete router$1

Steps to Reproduce:
1. Repeat:
2. for i in $(seq 10); do ./create.sh $i; done
3. Check floating IP connectivity to detect the error.
4. for i in $(seq 10); do ./delete.sh $i; done

Additional info:
This seems to be a race condition between the L3 agent and keepalived while the qg- interface is being configured:

...
115314:Feb 17 00:53:35 networker-1 kernel: device qg-3e872c7f-68 entered promiscuous mode
115315:Feb 17 00:53:35 networker-1 NetworkManager[1073]: <info> [1613523215.1566] manager: (qg-3e872c7f-68): new Generic device (/org/freedesktop/NetworkManager/Devices/457)
115317:Feb 17 00:53:35 networker-1 systemd-udevd[531523]: Could not generate persistent MAC address for qg-3e872c7f-68: No such file or directory
115318:Feb 17 00:53:35 networker-1 Keepalived_vrrp[531427]: Interface qg-3e872c7f-68 added
115335:Feb 17 00:53:35 networker-1 Keepalived_vrrp[531427]: Sending gratuitous ARP on qg-3e872c7f-68 for 10.0.0.227
115336:Feb 17 00:53:35 networker-1 Keepalived_vrrp[531427]: Error sending gratuitous ARP on qg-3e872c7f-68 for 10.0.0.227
115337:Feb 17 00:53:35 networker-1 Keepalived_vrrp[531427]: (VR_35) Sending/queueing gratuitous ARPs on qg-3e872c7f-68 for 10.0.0.227
115340:Feb 17 00:53:35 networker-1 Keepalived_vrrp[531427]: Sending unsolicited Neighbour Advert on qg-3e872c7f-68 for 2620:52:0:13b8::1000:56
115341:Feb 17 00:53:35 networker-1 Keepalived_vrrp[531427]: VRRP: Error sending ndisc unsolicited neighbour advert on qg-3e872c7f-68 for 2620:52:0:13b8::1000:56
115342:Feb 17 00:53:35 networker-1 Keepalived_vrrp[531427]: (VR_35) Sending/queueing Unsolicited Neighbour Adverts on qg-3e872c7f-68 for 2620:52:0:13b8::1000:56
115345:Feb 17 00:53:35 networker-1 Keepalived_vrrp[531427]: Sending unsolicited Neighbour Advert on qg-3e872c7f-68 for fe80::f816:3eff:fedc:db5c
115346:Feb 17 00:53:35 networker-1 Keepalived_vrrp[531427]: VRRP: Error sending ndisc unsolicited neighbour advert on qg-3e872c7f-68 for fe80::f816:3eff:fedc:db5c
115347:Feb 17 00:53:35 networker-1 Keepalived_vrrp[531427]: (VR_35) Sending/queueing Unsolicited Neighbour Adverts on qg-3e872c7f-68 for fe80::f816:3eff:fedc:db5c
115349:Feb 17 00:53:35 networker-1 Keepalived_vrrp[531427]: Sending gratuitous ARP on qg-3e872c7f-68 for 10.0.0.227
115350:Feb 17 00:53:35 networker-1 Keepalived_vrrp[531427]: Error sending gratuitous ARP on qg-3e872c7f-68 for 10.0.0.227
115352:Feb 17 00:53:35 networker-1 Keepalived_vrrp[531427]: Sending unsolicited Neighbour Advert on qg-3e872c7f-68 for 2620:52:0:13b8::1000:56
115353:Feb 17 00:53:35 networker-1 Keepalived_vrrp[531427]: VRRP: Error sending ndisc unsolicited neighbour advert on qg-3e872c7f-68 for 2620:52:0:13b8::1000:56
115355:Feb 17 00:53:35 networker-1 Keepalived_vrrp[531427]: Sending unsolicited Neighbour Advert on qg-3e872c7f-68 for fe80::f816:3eff:fedc:db5c
...
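
To spot this pattern on other nodes, the failed keepalived sends in an excerpt like the one above can be counted per interface. The following is only a rough sketch: the function name count_keepalived_send_errors is hypothetical (it is not part of keepalived or neutron), and it assumes the syslog message format seen in this report, where the interface name follows the word "on".

```shell
# Hypothetical helper (not part of keepalived or neutron).
# Reads syslog lines on stdin and prints "<interface> <count>" for every
# interface on which Keepalived_vrrp logged an "Error sending ..." message.
# Assumes the message format shown in this report: "... on <interface> for <address>".
count_keepalived_send_errors() {
    awk '
        /Keepalived_vrrp/ && /Error sending/ {
            # The interface name is the word right after "on".
            for (i = 1; i < NF; i++) {
                if ($i == "on") { errors[$(i + 1)]++; break }
            }
        }
        END { for (dev in errors) print dev, errors[dev] }
    '
}
```

Possible usage, assuming the node's syslog is in /var/log/messages (the path is an assumption): count_keepalived_send_errors < /var/log/messages. A non-zero count for a qg- interface around router creation time would be consistent with the race described here, but it is not proof of it.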
The qg-xx interface goes down, and it seems to be removed while the L3 agent is configuring the router.

l3-agent logs:
2021-02-17 00:53:35.031 481499 DEBUG neutron.common.utils [-] Time-cost: call f45973da-36a4-4f90-82b1-a79687c86662 function get_routers took 4.177s seconds to run wrapper /usr/lib/python3.6/site-packages/oslo_utils/timeutils.py:391
2021-02-17 00:53:35.031 481499 DEBUG neutron_lib.callbacks.manager [-] Notify callbacks [] for router, before_update _notify_loop /usr/lib/python3.6/site-packages/neutron_lib/callbacks/manager.py:193
2021-02-17 00:53:35.032 481499 DEBUG neutron.agent.l3.router_info [-] Process updates, router 35078d89-eee3-4334-af3a-e96929fae5dd process /usr/lib/python3.6/site-packages/neutron/agent/l3/router_info.py:1224
2021-02-17 00:53:35.039 481499 INFO neutron.agent.l3.ha [-] Router 35078d89-eee3-4334-af3a-e96929fae5dd transitioned to master on agent networker-1.redhat.local
2021-02-17 00:53:35.040 481499 INFO neutron.agent.l3.ha_router [-] Set router 35078d89-eee3-4334-af3a-e96929fae5dd gateway device link state to up.
2021-02-17 00:53:35.081 482044 DEBUG neutron.privileged.agent.linux.ip_lib [-] Interface qg-3e872c7f-68 not found in namespace qrouter-35078d89-eee3-4334-af3a-e96929fae5dd get_link_id /usr/lib/python3.6/site-packages/neutron/privileged/agent/linux/ip_lib.py:290
2021-02-17 00:53:35.081 482044 DEBUG oslo.privsep.daemon [-] privsep: reply[140140633282560]: (4, False) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511
2021-02-17 00:53:35.083 481499 DEBUG oslo_concurrency.lockutils [-] Lock "l3-agent-pd" acquired by "neutron.agent.linux.pd.PrefixDelegation.sync_router" :: waited 0.000s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:327
2021-02-17 00:53:35.083 481499 DEBUG oslo_concurrency.lockutils [-] Lock "l3-agent-pd" released by "neutron.agent.linux.pd.PrefixDelegation.sync_router" :: held 0.000s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:339
2021-02-17 00:53:35.083 481499 DEBUG oslo_concurrency.lockutils [-] Acquired lock "router-lock-ns-qrouter-35078d89-eee3-4334-af3a-e96929fae5dd" lock /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:265
2021-02-17 00:53:35.084 481499 DEBUG neutron.common.coordination [-] Lock "router-lock-ns-qrouter-35078d89-eee3-4334-af3a-e96929fae5dd" acquired by "process_external" :: waited 0.000s _synchronized /usr/lib/python3.6/site-packages/neutron/common/coordination.py:82
2021-02-17 00:53:35.084 481499 DEBUG neutron.agent.linux.interface [-] Device qg-3e872c7f-68 may concurrently be deleted. set_link_status /usr/lib/python3.6/site-packages/neutron/agent/linux/interface.py:327
2021-02-17 00:53:35.084 481499 DEBUG oslo_concurrency.lockutils [-] Lock "l3-agent-pd" acquired by "neutron.agent.linux.pd.PrefixDelegation.get_preserve_ips" :: waited 0.000s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:327
2021-02-17 00:53:35.085 481499 DEBUG oslo_concurrency.lockutils [-] Lock "l3-agent-pd" released by "neutron.agent.linux.pd.PrefixDelegation.get_preserve_ips" :: held 0.000s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:339
2021-02-17 00:53:35.085 481499 DEBUG neutron.agent.linux.interface [-] init_router_port: device_name(qg-3e872c7f-68), namespace(qrouter-35078d89-eee3-4334-af3a-e96929fae5dd) init_router_port /usr/lib/python3.6/site-packages/neutron/agent/linux/interface.py:171
2021-02-17 00:53:35.112 482044 DEBUG neutron.privileged.agent.linux.ip_lib [-] Interface qg-3e872c7f-68 not found in namespace qrouter-35078d89-eee3-4334-af3a-e96929fae5dd get_link_id /usr/lib/python3.6/site-packages/neutron/privileged/agent/linux/ip_lib.py:290
2021-02-17 00:53:35.113 482044 DEBUG oslo.privsep.daemon [-] privsep: reply[140140634520160]: (4, False) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511
2021-02-17 00:53:35.136 482044 DEBUG oslo.privsep.daemon [-] privsep: reply[140140634520160]: (4, True) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511
2021-02-17 00:53:35.139 481499 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): DelPortCommand(port=qg-3e872c7f-68, bridge=None, if_exists=True) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
2021-02-17 00:53:35.139 481499 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Transaction caused no change do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:121
2021-02-17 00:53:35.140 481499 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): AddPortCommand(bridge=br-int, port=qg-3e872c7f-68, may_exist=False) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
2021-02-17 00:53:35.140 481499 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=1): DbSetCommand(table=Port, record=qg-3e872c7f-68, col_values=(('tag', 4095),)) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
2021-02-17 00:53:35.141 481499 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=2): DbSetCommand(table=Interface, record=qg-3e872c7f-68, col_values=(('type', 'internal'), ('external_ids', {'iface-id': '3e872c7f-685a-4514-a1e7-320da0bd2d62', 'iface-status': 'active', 'attached-mac': 'fa:16:3e:dc:db:5c'}))) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
2021-02-17 00:53:35.145 482044 DEBUG oslo.privsep.daemon [-] privsep: Exception during request[140140633282560]: Network interface qg-3e872c7f-68 not found in namespace qrouter-35078d89-eee3-4334-af3a-e96929fae5dd. _process_cmd /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:490
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 485, in _process_cmd
    ret = func(*f_args, **f_kwargs)
  File "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line 247, in _wrap
    return func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/neutron/privileged/agent/linux/ip_lib.py", line 453, in get_link_attributes
    link = _run_iproute_link("get", device, namespace)[0]
  File "/usr/lib/python3.6/site-packages/neutron/privileged/agent/linux/ip_lib.py", line 298, in _run_iproute_link
    idx = get_link_id(device, namespace)
  File "/usr/lib/python3.6/site-packages/neutron/privileged/agent/linux/ip_lib.py", line 287, in get_link_id
    raise NetworkInterfaceNotFound(device=device, namespace=namespace)
neutron.privileged.agent.linux.ip_lib.NetworkInterfaceNotFound: Network interface qg-3e872c7f-68 not found in namespace qrouter-35078d89-eee3-4334-af3a-e96929fae5dd.
2021-02-17 00:53:35.146 482044 DEBUG oslo.privsep.daemon [-] privsep: reply[140140633282560]: (5, 'neutron.privileged.agent.linux.ip_lib.NetworkInterfaceNotFound', ('Network interface qg-3e872c7f-68 not found in namespace qrouter-35078d89-eee3-4334-af3a-e96929fae5dd.',)) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511
2021-02-17 00:53:35.166 482044 DEBUG oslo.privsep.daemon [-] privsep: reply[140140634520160]: (4, ({'header': {'length': 36, 'type': 2, 'flags': 256, 'sequence_number': 255, 'pid': 482044, 'error': None, 'stats': Stats(qsize=0, delta=0, delay=0)}, 'event': 'NLMSG_ERROR'},)) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511
2021-02-17 00:53:35.172 482044 DEBUG oslo.privsep.daemon [-] privsep: reply[140140634520160]: (4, ['qrouter-35078d89-eee3-4334-af3a-e96929fae5dd', 'qrouter-21b65ce2-738f-4d1c-b387-c5aee912d2db', 'qdhcp-20d1812f-e986-4973-8a28-188458b8d3d1'])  _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511
...

Logs attached.
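
The connectivity check in the reproduction steps could be complemented by testing the agent table directly for the all-standby state shown in the description. This is a minimal sketch, not a tested tool: router_has_active_agent is a hypothetical name, and it assumes the five-column table layout printed by neutron l3-agent-list-hosting-router above, with "active" as the ha_state value of a healthy instance.

```shell
# Hypothetical helper: reads the table printed by
# `neutron l3-agent-list-hosting-router <router>` on stdin.
# Exit status: 0 = at least one agent reports ha_state "active",
#              1 = agents are listed but none is active (the failure in this report),
#              2 = no agent rows were found in the input.
router_has_active_agent() {
    awk -F'|' '
        # Data rows split into 7 fields on "|" (the first and last are empty);
        # border lines and warnings contain no "|". Skip the header row.
        NF == 7 && $2 !~ /^ *id *$/ {
            rows++
            state = $6
            gsub(/ /, "", state)
            if (state == "active") active++
        }
        END {
            if (rows == 0) exit 2
            exit (active > 0 ? 0 : 1)
        }
    '
}
```

For example, after each create.sh run: neutron l3-agent-list-hosting-router router$i | router_has_active_agent || echo "router$i has no active instance". A short delay before the check is probably needed, since a new HA router may legitimately show no active instance while the election is still in progress.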
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenStack Platform 16.2 (openstack-neutron) security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3488