Description of problem:
After a tenant router is rescheduled to a new L3 agent, the router gateway is not reachable.

Version-Release number of selected component (if applicable):
RHOSP 7.1

How reproducible:

Steps to Reproduce:
1. Create a tenant router and a network, create a VM on the network, and make sure all are reachable.
2. Reboot the OpenStack controller/L3 agent that hosts the router.
3. The router gets rescheduled to a new L3 agent, and the router namespace is created on the new L3 agent. Ping the gateway from the VM: the ARP requests are received, however no response is sent from the namespace port.

Actual results:

Expected results:

Additional info:

[root@overcloud-controller-0 heat-admin]# ip netns exec qrouter-7f7d64da-79ac-45cd-849a-835c6cb510be tcpdump -i qr-b76cd758-da
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on qr-b76cd758-da, link-type EN10MB (Ethernet), capture size 65535 bytes
17:41:03.149508 IP 10.17.0.1 > 224.0.0.5: OSPFv2, Hello, length 44
17:41:06.868205 ARP, Request who-has 1.1.1.1 tell 1.1.1.5, length 46
17:41:07.866440 ARP, Request who-has 1.1.1.1 tell 1.1.1.5, length 46
17:41:08.866271 ARP, Request who-has 1.1.1.1 tell 1.1.1.5, length 46
17:41:09.868217 ARP, Request who-has 1.1.1.1 tell 1.1.1.5, length 46
17:41:10.866293 ARP, Request who-has 1.1.1.1 tell 1.1.1.5, length 46
17:41:11.866308 ARP, Request who-has 1.1.1.1 tell 1.1.1.5, length 46
17:41:12.868542 ARP, Request who-has 1.1.1.1 tell 1.1.1.5, length 46
17:41:13.229388 IP 10.17.0.1 > 224.0.0.5: OSPFv2, Hello, length 44
17:41:13.866297 ARP, Request who-has 1.1.1.1 tell 1.1.1.5, length 46
17:41:14.866301 ARP, Request who-has 1.1.1.1 tell 1.1.1.5, length 46
^C
11 packets captured
11 packets received by filter
0 packets dropped by kernel

[root@overcloud-controller-0 heat-admin]# ip netns exec qrouter-7f7d64da-79ac-45cd-849a-835c6cb510be tcpdump -i qr-b76cd758-da -xxx
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on qr-b76cd758-da, link-type EN10MB (Ethernet), capture size 65535 bytes
17:41:18.869140 ARP, Request who-has 1.1.1.1 tell 1.1.1.5, length 46
        0x0000:  ffff ffff ffff fa16 3e8c 7e34 8100 0006
        0x0010:  0806 0001 0800 0604 0001 fa16 3e8c 7e34
        0x0020:  0101 0105 0000 0000 0000 0101 0101 0000
        0x0030:  0000 0000 0000 0000 0000 0000 0000 0000
17:41:19.866283 ARP, Request who-has 1.1.1.1 tell 1.1.1.5, length 46
        0x0000:  ffff ffff ffff fa16 3e8c 7e34 8100 0006
        0x0010:  0806 0001 0800 0604 0001 fa16 3e8c 7e34
        0x0020:  0101 0105 0000 0000 0000 0101 0101 0000
        0x0030:  0000 0000 0000 0000 0000 0000 0000 0000

[root@overcloud-controller-0 heat-admin]# ip netns exec qrouter-7f7d64da-79ac-45cd-849a-835c6cb510be route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.17.0.5       0.0.0.0         UG    0      0        0 qg-d92fd1ef-10
1.1.1.0         0.0.0.0         255.255.255.0   U     0      0        0 qr-b76cd758-da
1.1.2.0         0.0.0.0         255.255.255.0   U     0      0        0 qr-8ea964bd-d6
10.17.0.0       0.0.0.0         255.255.192.0   U     0      0        0 qg-d92fd1ef-10

[root@overcloud-controller-0 heat-admin]# ip netns exec qrouter-7f7d64da-79ac-45cd-849a-835c6cb510be ifconfig
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 0  (Local Loopback)
        RX packets 92  bytes 7652 (7.4 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 92  bytes 7652 (7.4 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

qg-d92fd1ef-10: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.17.0.14  netmask 255.255.192.0  broadcast 10.17.63.255
        inet6 fe80::f816:3eff:fe06:1bef  prefixlen 64  scopeid 0x20<link>
        ether fa:16:3e:06:1b:ef  txqueuelen 1000  (Ethernet)
        RX packets 5003  bytes 331632 (323.8 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 19  bytes 1326 (1.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

qr-8ea964bd-d6: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 1.1.2.1  netmask 255.255.255.0  broadcast 1.1.2.255
        inet6 fe80::f816:3eff:fe93:efaf  prefixlen 64  scopeid 0x20<link>
        ether fa:16:3e:93:ef:af  txqueuelen 1000  (Ethernet)
        RX packets 74  bytes 6218 (6.0 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 10  bytes 864 (864.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

qr-b76cd758-da: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 1.1.1.1  netmask 255.255.255.0  broadcast 1.1.1.255
        inet6 fe80::f816:3eff:fec8:e3f8  prefixlen 64  scopeid 0x20<link>
        ether fa:16:3e:c8:e3:f8  txqueuelen 1000  (Ethernet)
        RX packets 5036  bytes 334168 (326.3 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 10  bytes 864 (864.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
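
As a reading aid for the `-xxx` capture above, here is a minimal Python sketch that decodes one of the captured frames byte by byte. The helper names (`parse_hex_dump`, `decode_frame`) are made up for illustration and are not part of any OpenStack tool; the hex bytes are copied from the capture. It only reports what the frame contains and does not by itself establish why the namespace port stays silent.

```python
import struct

# Hex bytes copied from the tcpdump -xxx output in this report.
HEX_DUMP = """
0x0000: ffff ffff ffff fa16 3e8c 7e34 8100 0006
0x0010: 0806 0001 0800 0604 0001 fa16 3e8c 7e34
0x0020: 0101 0105 0000 0000 0000 0101 0101 0000
0x0030: 0000 0000 0000 0000 0000 0000 0000 0000
"""


def parse_hex_dump(text):
    """Turn tcpdump-style 'offset: hex hex ...' lines into raw bytes."""
    data = bytearray()
    for line in text.strip().splitlines():
        _, _, hex_part = line.partition(":")
        data += bytes.fromhex("".join(hex_part.split()))
    return bytes(data)


def fmt_mac(raw):
    return ":".join(f"{b:02x}" for b in raw)


def fmt_ip(raw):
    return ".".join(str(b) for b in raw)


def decode_frame(frame):
    """Decode the Ethernet header, an optional 802.1Q tag, and an ARP payload."""
    info = {
        "dst_mac": fmt_mac(frame[0:6]),
        "src_mac": fmt_mac(frame[6:12]),
        "vlan_id": None,
    }
    (ethertype,) = struct.unpack("!H", frame[12:14])
    offset = 14
    if ethertype == 0x8100:  # 802.1Q tag present
        tci, ethertype = struct.unpack("!HH", frame[14:18])
        info["vlan_id"] = tci & 0x0FFF  # low 12 bits of the TCI
        offset = 18
    info["ethertype"] = ethertype
    if ethertype == 0x0806:  # ARP over Ethernet/IPv4
        _htype, _ptype, hlen, plen, oper = struct.unpack(
            "!HHBBH", frame[offset:offset + 8]
        )
        p = offset + 8
        info["arp_op"] = oper  # 1 = request, 2 = reply
        info["sender_mac"] = fmt_mac(frame[p:p + hlen])
        info["sender_ip"] = fmt_ip(frame[p + hlen:p + hlen + plen])
        p += hlen + plen
        info["target_mac"] = fmt_mac(frame[p:p + hlen])
        info["target_ip"] = fmt_ip(frame[p + hlen:p + hlen + plen])
    return info


if __name__ == "__main__":
    for key, value in decode_frame(parse_hex_dump(HEX_DUMP)).items():
        print(f"{key}: {value}")
```

Run against the bytes above, this reports a broadcast ARP request from 1.1.1.5 (fa:16:3e:8c:7e:34) for 1.1.1.1, carried inside an 802.1Q header with VLAN ID 6.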
After the router is rescheduled, the status of all router ports is set to "BUILD", which is why the port is not responding to any packets.

[stack@rhel-dell-71 ~]$ neutron router-port-list 7f7d64da-79ac-45cd-849a-835c6cb510be
+--------------------------------------+------+-------------------+-----------------------------------------------------------------------------------+
| id                                   | name | mac_address       | fixed_ips                                                                         |
+--------------------------------------+------+-------------------+-----------------------------------------------------------------------------------+
| 8ea964bd-d6e5-4b34-9b97-ab49557b976a |      | fa:16:3e:93:ef:af | {"subnet_id": "647233bd-1406-4e73-ad03-436a07b7fcc5", "ip_address": "1.1.2.1"}    |
| b76cd758-da75-439a-831d-737912a5e03f |      | fa:16:3e:c8:e3:f8 | {"subnet_id": "69323284-51f4-4b71-8567-8e7f376fb63e", "ip_address": "1.1.1.1"}    |
| d92fd1ef-10b5-4b3e-af5f-d3741ea563ae |      | fa:16:3e:06:1b:ef | {"subnet_id": "a94db0a8-c137-41b1-bac4-f4d98676c25f", "ip_address": "10.17.0.14"} |
+--------------------------------------+------+-------------------+-----------------------------------------------------------------------------------+

[stack@rhel-dell-71 ~]$ neutron port-show b76cd758-da75-439a-831d-737912a5e03f
+-----------------------+--------------------------------------------------------------------------------+
| Field                 | Value                                                                          |
+-----------------------+--------------------------------------------------------------------------------+
| admin_state_up        | True                                                                           |
| allowed_address_pairs |                                                                                |
| binding:host_id       | overcloud-controller-0.localdomain                                             |
| binding:profile       | {}                                                                             |
| binding:vif_details   | {"port_filter": true, "ovs_hybrid_plug": true}                                 |
| binding:vif_type      | ovs                                                                            |
| binding:vnic_type     | normal                                                                         |
| device_id             | 7f7d64da-79ac-45cd-849a-835c6cb510be                                           |
| device_owner          | network:router_interface                                                       |
| extra_dhcp_opts       |                                                                                |
| fixed_ips             | {"subnet_id": "69323284-51f4-4b71-8567-8e7f376fb63e", "ip_address": "1.1.1.1"} |
| id                    | b76cd758-da75-439a-831d-737912a5e03f                                           |
| mac_address           | fa:16:3e:c8:e3:f8                                                              |
| name                  |                                                                                |
| network_id            | f5da3160-37dd-4609-b3ee-2268b1e5d9ba                                           |
| security_groups       |                                                                                |
| status                | BUILD                                                                          |
| tenant_id             | d5e56f3c0c4d4474aeba54eb79d754a8                                               |
+-----------------------+--------------------------------------------------------------------------------+
[stack@rhel-dell-71 ~]$
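
To repeat the status check across several ports without reading each table by eye, the `neutron port-show` output can be parsed with a few lines of Python. This is only a sketch: `parse_cli_table` and `port_not_active` are hypothetical helper names, the sample text is a shortened excerpt of the output above, and the "admin-up but not ACTIVE" rule is an assumption about what counts as suspicious here, not an official Neutron check.

```python
# Shortened excerpt of the `neutron port-show` output from this report.
PORT_SHOW_EXCERPT = """
+-----------------+--------------------------------------+
| Field           | Value                                |
+-----------------+--------------------------------------+
| admin_state_up  | True                                 |
| device_owner    | network:router_interface             |
| id              | b76cd758-da75-439a-831d-737912a5e03f |
| name            |                                      |
| status          | BUILD                                |
+-----------------+--------------------------------------+
"""


def parse_cli_table(text):
    """Parse a two-column 'Field | Value' CLI table into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders and blank lines
        key, _, value = line.strip("|").partition("|")
        key, value = key.strip(), value.strip()
        if key == "Field":
            continue  # skip the header row
        fields[key] = value
    return fields


def port_not_active(fields):
    """Assumed rule: an admin-up port whose status is not ACTIVE needs a look."""
    return fields.get("admin_state_up") == "True" and fields.get("status") != "ACTIVE"


if __name__ == "__main__":
    port = parse_cli_table(PORT_SHOW_EXCERPT)
    if port_not_active(port):
        print(f"port {port['id']} is in status {port['status']}")
```

The same parser could be fed the captured output for each ID listed by `neutron router-port-list` to confirm whether every router port is in the same state.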
I gather you're not using HA routers, and that you enabled allow_automatic_l3agent_failover in neutron.conf? Can I ask why?
The reason is an existing known bug, described in https://bugzilla.redhat.com/show_bug.cgi?id=1260298
Can we mark this one closed as a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1260298?
Is router HA a mandatory configuration for a setup with multiple L3 agents?
Did you try HA routers with the fix in https://bugzilla.redhat.com/show_bug.cgi?id=1253953?
Waiting for needinfo from February, closing for now. Please re-open if still relevant.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days