Description of problem:
A tenant VM will stay unreachable forever after resetting all controllers.

This has been observed on OSP13 (and on OSP12 as well) and is not always reproducible (i.e. a race). What is usually needed to reproduce is the following:
1.1 Deploy a default OSP13 environment with three controllers and one compute
1.2 Spawn a VM on the overcloud and make sure it is pingable
1.3 Reset all controllers (virsh reset of all controller VMs is what we used)
1.4 Observe that the VM spawned at 1.2 will be unpingable forever

Michele kindly did the following analysis:

From an initial analysis this looks to be a problem with the startup of the neutron-l3 agent during system boot. The bootup sequence is suspected because a simple 'docker restart neutron_l3_agent' on the controller which hosts the active router fixes the connectivity and makes the VM pingable again.

Some initial analysis (beagles helped me take a look here):

2.1 The VM is s_rally_582a83c1_Uf2oyPmh

+--------------------------------------+---------------------------+----------------------------------+--------+------------+-------------+------------------------------------------------+
| ID                                   | Name                      | Tenant ID                        | Status | Task State | Power State | Networks                                       |
+--------------------------------------+---------------------------+----------------------------------+--------+------------+-------------+------------------------------------------------+
| c7f439b9-48e6-4190-9ec1-9316d0271385 | s_rally_582a83c1_Uf2oyPmh | e3403c04bb2c42038b35372fce17f08b | ACTIVE | -          | Running     | c_rally_582a83c1_RpOmhBL5=10.2.0.9, 10.0.0.212 |
+--------------------------------------+---------------------------+----------------------------------+--------+------------+-------------+------------------------------------------------+

The VM is unreachable:

(overcloud) [stack@undercloud-0 ~]$ ping -c2 -n 10.0.0.212
PING 10.0.0.212 (10.0.0.212) 56(84) bytes of data.
From 10.0.0.82 icmp_seq=1 Destination Host Unreachable
From 10.0.0.82 icmp_seq=2 Destination Host Unreachable

--- 10.0.0.212 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 999ms

2.2 Routers seem to be up and healthy

(overcloud) [stack@undercloud-0 ~]$ openstack router list
+--------------------------------------+---------------------------+--------+-------+-------------+------+----------------------------------+
| ID                                   | Name                      | Status | State | Distributed | HA   | Project                          |
+--------------------------------------+---------------------------+--------+-------+-------------+------+----------------------------------+
| b8764221-6fb1-4087-bb9e-4e383d247fea | c_rally_582a83c1_AGyejBtL | ACTIVE | UP    | False       | True | e3403c04bb2c42038b35372fce17f08b |
| d2a52176-d365-4ddf-80c8-754c36aeaaa8 | c_rally_aa0d80e5_ZSKtuFoP | ACTIVE | UP    | False       | True | 212245c6c9904c0ca6d7424ad6e167f9 |
+--------------------------------------+---------------------------+--------+-------+-------------+------+----------------------------------+

Our router of interest is c_rally_582a83c1_AGyejBtL.
2.3 The L3 agents look healthy and the active one is on controller-0

(overcloud) [stack@undercloud-0 ~]$ neutron l3-agent-list-hosting-router c_rally_582a83c1_AGyejBtL
+--------------------------------------+--------------------------+----------------+-------+----------+
| id                                   | host                     | admin_state_up | alive | ha_state |
+--------------------------------------+--------------------------+----------------+-------+----------+
| b82380f5-3bc5-404d-840e-165f5b98f814 | controller-0.localdomain | True           | :-)   | active   |
| e983154b-77d6-440d-9aac-0fa19b39015a | controller-2.localdomain | True           | :-)   | standby  |
| 0d0d2c3f-65c4-4c41-a749-40ba7c6c70eb | controller-1.localdomain | True           | :-)   | standby  |
+--------------------------------------+--------------------------+----------------+-------+----------+

2.4 On controller-0 we see that the qrouter namespaces exist:

[root@controller-0 ~]# ip netns
qdhcp-b2896143-56fd-4914-89e3-534f8c6e8edc (id: 3)
qdhcp-2b61b5d8-7b00-49e2-88f2-3e6bf52d2843 (id: 2)
qrouter-d2a52176-d365-4ddf-80c8-754c36aeaaa8 (id: 0)
qrouter-b8764221-6fb1-4087-bb9e-4e383d247fea (id: 1)

So in our case we are interested in qrouter-b8764221-6fb1-4087-bb9e-4e383d247fea (since the router's ID is b8764221-6fb1-4087-bb9e-4e383d247fea).

2.5 IPs seem to be up

[root@controller-0 ~]# ip netns exec qrouter-b8764221-6fb1-4087-bb9e-4e383d247fea ip a | grep 10.0.0.212
    inet 10.0.0.212/32 scope global qg-48a2c3ad-2b

2.6 iptables rules seem to be set up correctly

[root@controller-0 ~]# ip netns exec qrouter-b8764221-6fb1-4087-bb9e-4e383d247fea iptables -t nat -nvL | grep 10.0.0.212
    0     0 DNAT       all  --  *      *       0.0.0.0/0            10.0.0.212           to:10.2.0.9
    0     0 DNAT       all  --  *      *       0.0.0.0/0            10.0.0.212           to:10.2.0.9
    1   488 SNAT       all  --  *      *       10.2.0.9             0.0.0.0/0            to:10.0.0.212

2.7 It seems the ping packets (an ICMP ping was running in the background while running the following) do not make it to the namespace:

[root@controller-0 ~]# ip netns exec qrouter-b8764221-6fb1-4087-bb9e-4e383d247fea tcpdump -i any -nn icmp or arp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
0 packets captured
0 packets received by filter
0 packets dropped by kernel

2.8 In fact, if we tcpdump on the host and not inside the qrouter namespace, we see the following:

[root@controller-0 ~]# tcpdump -i any -nn icmp or arp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
07:47:03.968657 ARP, Request who-has 10.0.0.212 tell 10.0.0.82, length 28
07:47:03.968657 ARP, Request who-has 10.0.0.212 tell 10.0.0.82, length 28
07:47:04.970617 ARP, Request who-has 10.0.0.212 tell 10.0.0.82, length 28

This suggests that the ARP broadcasts asking for 10.0.0.212 do not make it inside the qrouter-b8764221-6fb1-4087-bb9e-4e383d247fea namespace: the ARP packets make it to the host but not to the qrouter namespace.

3.0 Open vSwitch

To me this state seems to imply that OVS on restart did not get the memo as to which interfaces should be plugged into br-int. Let's look at the interfaces in the two qrouter namespaces that are present on controller-0:

3.0.1
Interfaces on the qrouter associated with the unpingable VM:

[root@controller-0 ~]# ip netns exec qrouter-b8764221-6fb1-4087-bb9e-4e383d247fea ip -o l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000\    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
26: ha-355b7212-d8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000\    link/ether fa:16:3e:0b:3f2 brd ff:ff:ff:ff:ff:ff
28: qr-a80561b7-b8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000\    link/ether fa:16:3e:63:93:ac brd ff:ff:ff:ff:ff:ff
30: qg-48a2c3ad-2b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000\    link/ether fa:16:3e:37:7b:01 brd ff:ff:ff:ff:ff:ff

3.0.2 Interfaces on the other qrouter on controller-0:

[root@controller-0 ~]# ip netns exec qrouter-d2a52176-d365-4ddf-80c8-754c36aeaaa8 ip -o l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000\    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
25: ha-696f2557-7f: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000\    link/ether fa:16:3e:ff:2d:aa brd ff:ff:ff:ff:ff:ff
27: qr-7494de2e-89: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000\    link/ether fa:16:3e:548:ed brd ff:ff:ff:ff:ff:ff
29: qg-9ab09778-cf: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000\    link/ether fa:16:3e:ef:41:af brd ff:ff:ff:ff:ff:ff

3.1
Let's look at Open vSwitch on br-int:

[root@controller-0 ~]# ovs-vsctl list-ports br-int
ha-0b1ed539-19
ha-355b7212-d8
ha-696f2557-7f
int-br-ex
int-br-isolated
patch-tun
qg-48a2c3ad-2b
qg-9ab09778-cf
qr-7494de2e-89
qr-a80561b7-b8
tap1b405498-37
tap66899034-fe

So to me all the interfaces seem to be correctly hooked up to br-int, so the question becomes: why are ARP packets not forwarded to the qrouter namespace? (see 2.7)

3.2 Let's see what flows are associated with br-int

[root@controller-0 ~]# ovs-ofctl dump-flows br-int
 cookie=0x34ac59012b2b7a03, duration=67506.488s, table=0, n_packets=2239, n_bytes=114058, priority=3,in_port="int-br-ex",vlan_tci=0x0000/0x1fff actions=mod_vlan_vid:3,resubmit(,60)
 cookie=0x34ac59012b2b7a03, duration=67606.822s, table=0, n_packets=0, n_bytes=0, priority=2,in_port="int-br-ex" actions=drop
 cookie=0x34ac59012b2b7a03, duration=67606.703s, table=0, n_packets=10, n_bytes=448, priority=2,in_port="int-br-isolated" actions=drop
 cookie=0x34ac59012b2b7a03, duration=67483.973s, table=0, n_packets=943, n_bytes=39826, priority=2,in_port="qg-48a2c3ad-2b" actions=drop
 cookie=0x34ac59012b2b7a03, duration=67705.230s, table=0, n_packets=69293, n_bytes=4430886, priority=0 actions=resubmit(,60)
 cookie=0x34ac59012b2b7a03, duration=67705.235s, table=23, n_packets=0, n_bytes=0, priority=0 actions=drop
 cookie=0x34ac59012b2b7a03, duration=67705.215s, table=24, n_packets=0, n_bytes=0, priority=0 actions=drop
 cookie=0x34ac59012b2b7a03, duration=67705.225s, table=60, n_packets=71532, n_bytes=4544944, priority=3 actions=NORMAL

3.3 If I try to simulate an ARP packet coming into br-int, it seems correct to me:

[root@controller-0 ~]# ovs-appctl ofproto/trace br-int in_port=int-br-ex,dl_dst=ff:ff:ff:ff:ff:ff
Flow: in_port=2,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=ff:ff:ff:ff:ff:ff,dl_type=0x0000

bridge("br-int")
----------------
 0.
in_port=2,vlan_tci=0x0000/0x1fff, priority 3, cookie 0x34ac59012b2b7a03
    push_vlan:0x8100
    set_field:4099->vlan_vid
    goto_table:60
60. priority 3, cookie 0x34ac59012b2b7a03
    NORMAL
     -> no learned MAC for destination, flooding

bridge("br-tun")
----------------
 0. in_port=1, priority 1, cookie 0xcb2833856c776e1a
    goto_table:2
 2. dl_dst=01:00:00:00:00:00/01:00:00:00:00:00, priority 0, cookie 0xcb2833856c776e1a
    goto_table:22
22. priority 0, cookie 0xcb2833856c776e1a
    drop

bridge("br-isolated")
---------------------
 0. in_port=7, priority 2, cookie 0x3c8251f853f5f884
    drop

Final flow: in_port=2,dl_vlan=3,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=00:00:00:00:00:00,dl_dst=ff:ff:ff:ff:ff:ff,dl_type=0x0000
Megaflow: recirc_id=0,eth,in_port=2,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=ff:ff:ff:ff:ff:ff,dl_type=0x0000
Datapath actions: push_vlan(vid=3,pcp=0),7,pop_vlan,9

I.e. it floods all the ports with the request. The problem is that we do not see it arrive on qrouter-b8764221-6fb1-4087-bb9e-4e383d247fea.

Version-Release number of selected component (if applicable):

$ rpm -qa | grep neutron
puppet-neutron-13.3.1-0.20180831211808.7d209c7.el7ost.noarch
python-neutron-fwaas-13.0.1-0.20180830231353.5863c57.el7ost.noarch
openstack-neutron-common-13.0.1-0.20180830212847.3cc89a9.el7ost.noarch
python-neutron-lbaas-13.0.1-0.20180831185310.e0cca6e.el7ost.noarch
openstack-neutron-13.0.1-0.20180830212847.3cc89a9.el7ost.noarch
openstack-neutron-fwaas-13.0.1-0.20180830231353.5863c57.el7ost.noarch
openstack-neutron-ml2-13.0.1-0.20180830212847.3cc89a9.el7ost.noarch
python2-neutronclient-6.9.0-0.20180809172620.d090ea2.el7ost.noarch
python2-neutron-lib-1.18.0-0.20180816094046.67865c7.el7ost.noarch
python-neutron-13.0.1-0.20180830212847.3cc89a9.el7ost.noarch
openstack-neutron-lbaas-13.0.1-0.20180831185310.e0cca6e.el7ost.noarch

How reproducible:
30% (it does seem like racy behaviour).

Steps to Reproduce:
1. Create an instance with a pingable FIP
2. Ungracefully reset the whole control plane (the overcloud controller nodes which host the Neutron services)
3. Ping the instance after the control plane is back online

Actual results:
Ping to the instance is lost "forever".

Expected results:
Connectivity to the instance should be restored, not lost.

Additional info:
I can possibly reproduce this if a live environment is needed for debugging.
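As an addendum to the analysis above: the eyeball comparison in 3.0.1/3.0.2 vs. 3.1 can be scripted. A minimal sketch (the interface and port names are copied verbatim from the 'ip -o l' and 'ovs-vsctl list-ports br-int' output above; the script itself is illustrative, not a Neutron tool):

```python
# Cross-check: every ha-/qr-/qg- interface seen inside the two qrouter
# namespaces should appear as a port on br-int.
namespace_ifaces = {
    "qrouter-b8764221-6fb1-4087-bb9e-4e383d247fea":
        {"ha-355b7212-d8", "qr-a80561b7-b8", "qg-48a2c3ad-2b"},
    "qrouter-d2a52176-d365-4ddf-80c8-754c36aeaaa8":
        {"ha-696f2557-7f", "qr-7494de2e-89", "qg-9ab09778-cf"},
}
br_int_ports = {
    "ha-0b1ed539-19", "ha-355b7212-d8", "ha-696f2557-7f",
    "int-br-ex", "int-br-isolated", "patch-tun",
    "qg-48a2c3ad-2b", "qg-9ab09778-cf",
    "qr-7494de2e-89", "qr-a80561b7-b8",
    "tap1b405498-37", "tap66899034-fe",
}

# Any namespace interface that is NOT plugged into br-int.
missing = {ns: ifaces - br_int_ports
           for ns, ifaces in namespace_ifaces.items()
           if ifaces - br_int_ports}
print(missing)  # {} -- consistent with 3.1: everything is plugged in
```

This confirms the conclusion of 3.1: the problem is not a missing port on br-int, which is why the investigation moved on to the flow tables.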
Assaf pointed to an old patch, https://review.openstack.org/#/c/162260/, which may help with this issue. As I checked, this patch allows an attempt to bind a port even if the port is already in the "binding_failed" state, but this isn't triggered when an agent is revived. Maybe the solution here is to add an option to try to rebind all binding_failed ports on a host when its L2 agent is revived. I will investigate that.
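A rough sketch of that idea, for discussion. Everything here is illustrative (the hook name, the port dicts, and the rebind callback are not real Neutron APIs): when the L2 agent on a host flips back to alive, walk that host's ports and retry binding for any stuck in binding_failed.

```python
# Hypothetical sketch of the proposed fix; names are illustrative only.
BINDING_FAILED = "binding_failed"

def on_agent_revived(host, ports, rebind):
    """Retry binding for every port on `host` whose binding failed.

    `ports` is an iterable of dicts with 'id', 'host' and 'vif_type';
    `rebind` stands in for the ML2 port-binding retry logic.
    """
    retried = []
    for port in ports:
        if port["host"] == host and port["vif_type"] == BINDING_FAILED:
            rebind(port)
            retried.append(port["id"])
    return retried

# Example: only the failed port on the revived host gets retried.
ports = [
    {"id": "p1", "host": "controller-0", "vif_type": BINDING_FAILED},
    {"id": "p2", "host": "controller-0", "vif_type": "ovs"},
    {"id": "p3", "host": "controller-1", "vif_type": BINDING_FAILED},
]
print(on_agent_revived("controller-0", ports, lambda p: None))  # ['p1']
```

The point of scoping the retry to the revived agent's host is to avoid rebinding healthy ports elsewhere in the cloud on every agent heartbeat transition.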
The upstream patch is merged to stable/queens now, so we will have it in the next sync with upstream.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0093