Bug 1271777 - router gateway is not reachable after tenant router reschedules to new L3 agent [NEEDINFO]
Status: CLOSED WONTFIX
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: z5
Target Release: 7.0 (Kilo)
Assigned To: lpeer
QA Contact: Ofer Blaut
Keywords: ZStream
Depends On:
Blocks:
Reported: 2015-10-14 12:39 EDT by bigswitch
Modified: 2016-06-04 12:18 EDT
CC List: 8 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-06-04 12:18:14 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
amuller: needinfo? (rhosp-bugs-internal)


Attachments: None
Description bigswitch 2015-10-14 12:39:39 EDT
Description of problem:
After the tenant router is rescheduled to a new L3 agent, the router gateway is not reachable.

Version-Release number of selected component (if applicable):
RHOSP 7.1

How reproducible:


Steps to Reproduce:
1. Create a tenant router and network, boot a VM on the network, and make sure everything is reachable.
2. Reboot the OpenStack controller / L3 agent that hosts the router.
3. The router gets rescheduled to a new L3 agent, and the router namespace is created on the new L3 agent.

Ping the gateway from the VM: the ARP requests are received in the router namespace, however no response is sent from the namespace port.

Actual results:


Expected results:


Additional info:

[root@overcloud-controller-0 heat-admin]# ip netns exec qrouter-7f7d64da-79ac-45cd-849a-835c6cb510be tcpdump -i qr-b76cd758-da
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on qr-b76cd758-da, link-type EN10MB (Ethernet), capture size 65535 bytes
17:41:03.149508 IP 10.17.0.1 > 224.0.0.5: OSPFv2, Hello, length 44
17:41:06.868205 ARP, Request who-has 1.1.1.1 tell 1.1.1.5, length 46
17:41:07.866440 ARP, Request who-has 1.1.1.1 tell 1.1.1.5, length 46
17:41:08.866271 ARP, Request who-has 1.1.1.1 tell 1.1.1.5, length 46
17:41:09.868217 ARP, Request who-has 1.1.1.1 tell 1.1.1.5, length 46
17:41:10.866293 ARP, Request who-has 1.1.1.1 tell 1.1.1.5, length 46
17:41:11.866308 ARP, Request who-has 1.1.1.1 tell 1.1.1.5, length 46
17:41:12.868542 ARP, Request who-has 1.1.1.1 tell 1.1.1.5, length 46
17:41:13.229388 IP 10.17.0.1 > 224.0.0.5: OSPFv2, Hello, length 44
17:41:13.866297 ARP, Request who-has 1.1.1.1 tell 1.1.1.5, length 46
17:41:14.866301 ARP, Request who-has 1.1.1.1 tell 1.1.1.5, length 46
^C
11 packets captured
11 packets received by filter
0 packets dropped by kernel
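The capture above can be summarized mechanically. The sketch below (Python 3.8+, standard library only) counts ARP requests and replies in the tcpdump text. The regular expressions assume tcpdump's default one-line ARP formats (`ARP, Request who-has X tell Y` and `ARP, Reply X is-at MAC`), and `summarize` is a hypothetical helper, not part of any OpenStack tool.

```python
import re
from collections import Counter

# The 11 packets from the capture on qr-b76cd758-da, copied from the output above.
CAPTURE = """\
17:41:03.149508 IP 10.17.0.1 > 224.0.0.5: OSPFv2, Hello, length 44
17:41:06.868205 ARP, Request who-has 1.1.1.1 tell 1.1.1.5, length 46
17:41:07.866440 ARP, Request who-has 1.1.1.1 tell 1.1.1.5, length 46
17:41:08.866271 ARP, Request who-has 1.1.1.1 tell 1.1.1.5, length 46
17:41:09.868217 ARP, Request who-has 1.1.1.1 tell 1.1.1.5, length 46
17:41:10.866293 ARP, Request who-has 1.1.1.1 tell 1.1.1.5, length 46
17:41:11.866308 ARP, Request who-has 1.1.1.1 tell 1.1.1.5, length 46
17:41:12.868542 ARP, Request who-has 1.1.1.1 tell 1.1.1.5, length 46
17:41:13.229388 IP 10.17.0.1 > 224.0.0.5: OSPFv2, Hello, length 44
17:41:13.866297 ARP, Request who-has 1.1.1.1 tell 1.1.1.5, length 46
17:41:14.866301 ARP, Request who-has 1.1.1.1 tell 1.1.1.5, length 46
"""

# tcpdump's default (non -v) one-line ARP formats.
ARP_REQUEST = re.compile(r"ARP, Request who-has (\S+) tell (\S+),")
ARP_REPLY = re.compile(r"ARP, Reply (\S+) is-at ")


def summarize(capture: str) -> Counter:
    """Count ARP requests keyed by (target, sender) and ARP replies keyed by the answered IP."""
    counts = Counter()
    for line in capture.splitlines():
        if m := ARP_REQUEST.search(line):
            counts[("request", m.group(1), m.group(2))] += 1
        elif m := ARP_REPLY.search(line):
            counts[("reply", m.group(1))] += 1
    return counts


counts = summarize(CAPTURE)
print("requests for 1.1.1.1 from 1.1.1.5:", counts[("request", "1.1.1.1", "1.1.1.5")])
print("replies for 1.1.1.1:", counts[("reply", "1.1.1.1")])
```

It reports 9 requests and 0 replies over roughly eight seconds, which matches the report: the namespace sees the VM's ARP requests for 1.1.1.1 but never answers them.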
[root@overcloud-controller-0 heat-admin]# ip netns exec qrouter-7f7d64da-79ac-45cd-849a-835c6cb510be tcpdump -i qr-b76cd758-da -xxx
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on qr-b76cd758-da, link-type EN10MB (Ethernet), capture size 65535 bytes
17:41:18.869140 ARP, Request who-has 1.1.1.1 tell 1.1.1.5, length 46
        0x0000:  ffff ffff ffff fa16 3e8c 7e34 8100 0006
        0x0010:  0806 0001 0800 0604 0001 fa16 3e8c 7e34
        0x0020:  0101 0105 0000 0000 0000 0101 0101 0000
        0x0030:  0000 0000 0000 0000 0000 0000 0000 0000
17:41:19.866283 ARP, Request who-has 1.1.1.1 tell 1.1.1.5, length 46
        0x0000:  ffff ffff ffff fa16 3e8c 7e34 8100 0006
        0x0010:  0806 0001 0800 0604 0001 fa16 3e8c 7e34
        0x0020:  0101 0105 0000 0000 0000 0101 0101 0000
        0x0030:  0000 0000 0000 0000 0000 0000 0000 0000
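The `-xxx` dump exposes the full Ethernet header that the one-line output hides. The sketch below decodes the first frame by hand with Python's `struct` module, using the 802.1Q tag layout and the RFC 826 ARP layout. Bytes 12-15 (`8100 0006`) are an 802.1Q tag with VLAN ID 6, so the ARP request reaches the untagged qr- interface still carrying a VLAN tag. One possible reading, which is an interpretation and not something stated in this bug, is that the OVS agent never finished wiring the port (consistent with the BUILD status reported in Comment 2), so the namespace kernel would not answer the tagged frame. `decode` and `mac` are hypothetical helpers.

```python
import ipaddress
import struct

# First frame of the `tcpdump -xxx` capture above (offsets 0x0000-0x003f), hex copied verbatim.
FRAME_HEX = """
ffff ffff ffff fa16 3e8c 7e34 8100 0006
0806 0001 0800 0604 0001 fa16 3e8c 7e34
0101 0105 0000 0000 0000 0101 0101 0000
0000 0000 0000 0000 0000 0000 0000 0000
"""

ETH_P_8021Q = 0x8100  # TPID of an 802.1Q VLAN tag
ETH_P_ARP = 0x0806


def mac(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)


def decode(frame: bytes) -> dict:
    """Decode an Ethernet frame, an optional 802.1Q tag and an IPv4-over-Ethernet ARP payload."""
    (ethertype,) = struct.unpack("!H", frame[12:14])
    info = {"dst": mac(frame[0:6]), "src": mac(frame[6:12]), "vlan_id": None}
    offset = 14
    if ethertype == ETH_P_8021Q:
        # The 2-byte TCI follows the TPID; its low 12 bits are the VLAN ID.
        tci, ethertype = struct.unpack("!HH", frame[14:18])
        info["vlan_id"] = tci & 0x0FFF
        offset = 18
    info["ethertype"] = hex(ethertype)
    if ethertype == ETH_P_ARP:
        # Fixed 8-byte ARP header, then sender MAC/IP and target MAC/IP.
        _htype, _ptype, _hlen, _plen, oper = struct.unpack("!HHBBH", frame[offset:offset + 8])
        body = frame[offset + 8:offset + 28]
        info["arp_op"] = {1: "request", 2: "reply"}.get(oper, oper)
        info["sender_mac"] = mac(body[0:6])
        info["sender_ip"] = str(ipaddress.IPv4Address(body[6:10]))
        info["target_ip"] = str(ipaddress.IPv4Address(body[16:20]))
    return info


frame = bytes.fromhex("".join(FRAME_HEX.split()))
for key, value in decode(frame).items():
    print(f"{key:>10}: {value}")
```

Running it reports `vlan_id: 6`, an ARP request from 1.1.1.5 (fa:16:3e:8c:7e:34) for 1.1.1.1, broadcast to ff:ff:ff:ff:ff:ff.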

[root@overcloud-controller-0 heat-admin]# ip netns exec qrouter-7f7d64da-79ac-45cd-849a-835c6cb510be route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.17.0.5       0.0.0.0         UG    0      0        0 qg-d92fd1ef-10
1.1.1.0         0.0.0.0         255.255.255.0   U     0      0        0 qr-b76cd758-da
1.1.2.0         0.0.0.0         255.255.255.0   U     0      0        0 qr-8ea964bd-d6
10.17.0.0       0.0.0.0         255.255.192.0   U     0      0        0 qg-d92fd1ef-10
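The routing table itself looks correct for the reply path. As a quick cross-check, the sketch below performs a longest-prefix match over the four routes shown, using Python's `ipaddress` module. It ignores metrics (all 0 here) and any policy routing, so it is a simplification of the kernel's lookup. `ROUTES` and `lookup` are names introduced for illustration.

```python
import ipaddress

# Routes from the `route -n` output above: (destination, genmask, gateway, iface).
ROUTES = [
    ("0.0.0.0", "0.0.0.0", "10.17.0.5", "qg-d92fd1ef-10"),
    ("1.1.1.0", "255.255.255.0", "0.0.0.0", "qr-b76cd758-da"),
    ("1.1.2.0", "255.255.255.0", "0.0.0.0", "qr-8ea964bd-d6"),
    ("10.17.0.0", "255.255.192.0", "0.0.0.0", "qg-d92fd1ef-10"),
]


def lookup(dst: str):
    """Return (iface, next_hop) for dst; next_hop is None when the destination is on-link."""
    addr = ipaddress.IPv4Address(dst)
    best = None
    for dest, mask, gw, iface in ROUTES:
        net = ipaddress.IPv4Network(f"{dest}/{mask}")
        # Longest prefix wins; metrics are ignored in this sketch.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gw, iface)
    if best is None:
        raise LookupError(f"no route to {dst}")
    _net, gw, iface = best
    return iface, (None if gw == "0.0.0.0" else gw)


print(lookup("1.1.1.5"))  # the VM on the qr-b76cd758-da subnet
print(lookup("8.8.8.8"))  # any off-subnet address uses the default route
```

A reply to 1.1.1.5 would leave directly on qr-b76cd758-da, which points the investigation at the port itself rather than at routing.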

[root@overcloud-controller-0 heat-admin]# ip netns exec qrouter-7f7d64da-79ac-45cd-849a-835c6cb510be ifconfig
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 0  (Local Loopback)
        RX packets 92  bytes 7652 (7.4 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 92  bytes 7652 (7.4 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

qg-d92fd1ef-10: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.17.0.14  netmask 255.255.192.0  broadcast 10.17.63.255
        inet6 fe80::f816:3eff:fe06:1bef  prefixlen 64  scopeid 0x20<link>
        ether fa:16:3e:06:1b:ef  txqueuelen 1000  (Ethernet)
        RX packets 5003  bytes 331632 (323.8 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 19  bytes 1326 (1.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

qr-8ea964bd-d6: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 1.1.2.1  netmask 255.255.255.0  broadcast 1.1.2.255
        inet6 fe80::f816:3eff:fe93:efaf  prefixlen 64  scopeid 0x20<link>
        ether fa:16:3e:93:ef:af  txqueuelen 1000  (Ethernet)
        RX packets 74  bytes 6218 (6.0 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 10  bytes 864 (864.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

qr-b76cd758-da: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 1.1.1.1  netmask 255.255.255.0  broadcast 1.1.1.255
        inet6 fe80::f816:3eff:fec8:e3f8  prefixlen 64  scopeid 0x20<link>
        ether fa:16:3e:c8:e3:f8  txqueuelen 1000  (Ethernet)
        RX packets 5036  bytes 334168 (326.3 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 10  bytes 864 (864.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
Comment 2 bigswitch 2015-10-14 14:17:39 EDT
After the router is rescheduled, all of the router's ports are left in the "BUILD" state, which is why the port does not respond to any packets.

[stack@rhel-dell-71 ~]$ neutron router-port-list 7f7d64da-79ac-45cd-849a-835c6cb510be
+--------------------------------------+------+-------------------+-----------------------------------------------------------------------------------+
| id                                   | name | mac_address       | fixed_ips                                                                         |
+--------------------------------------+------+-------------------+-----------------------------------------------------------------------------------+
| 8ea964bd-d6e5-4b34-9b97-ab49557b976a |      | fa:16:3e:93:ef:af | {"subnet_id": "647233bd-1406-4e73-ad03-436a07b7fcc5", "ip_address": "1.1.2.1"}    |
| b76cd758-da75-439a-831d-737912a5e03f |      | fa:16:3e:c8:e3:f8 | {"subnet_id": "69323284-51f4-4b71-8567-8e7f376fb63e", "ip_address": "1.1.1.1"}    |
| d92fd1ef-10b5-4b3e-af5f-d3741ea563ae |      | fa:16:3e:06:1b:ef | {"subnet_id": "a94db0a8-c137-41b1-bac4-f4d98676c25f", "ip_address": "10.17.0.14"} |
+--------------------------------------+------+-------------------+-----------------------------------------------------------------------------------+
[stack@rhel-dell-71 ~]$ neutron port-show b76cd758-da75-439a-831d-737912a5e03f
+-----------------------+--------------------------------------------------------------------------------+
| Field                 | Value                                                                          |
+-----------------------+--------------------------------------------------------------------------------+
| admin_state_up        | True                                                                           |
| allowed_address_pairs |                                                                                |
| binding:host_id       | overcloud-controller-0.localdomain                                             |
| binding:profile       | {}                                                                             |
| binding:vif_details   | {"port_filter": true, "ovs_hybrid_plug": true}                                 |
| binding:vif_type      | ovs                                                                            |
| binding:vnic_type     | normal                                                                         |
| device_id             | 7f7d64da-79ac-45cd-849a-835c6cb510be                                           |
| device_owner          | network:router_interface                                                       |
| extra_dhcp_opts       |                                                                                |
| fixed_ips             | {"subnet_id": "69323284-51f4-4b71-8567-8e7f376fb63e", "ip_address": "1.1.1.1"} |
| id                    | b76cd758-da75-439a-831d-737912a5e03f                                           |
| mac_address           | fa:16:3e:c8:e3:f8                                                              |
| name                  |                                                                                |
| network_id            | f5da3160-37dd-4609-b3ee-2268b1e5d9ba                                           |
| security_groups       |                                                                                |
| status                | BUILD                                                                          |
| tenant_id             | d5e56f3c0c4d4474aeba54eb79d754a8                                               |
+-----------------------+--------------------------------------------------------------------------------+
[stack@rhel-dell-71 ~]$
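Checking the status of many router ports by eye is tedious. The sketch below parses the two-column `| Field | Value |` table printed by `neutron port-show` and flags any status other than ACTIVE (Neutron port statuses include ACTIVE, DOWN, BUILD and ERROR). `parse_cli_table` is a hypothetical helper, scraping CLI output is fragile, and the table is an excerpt of the output above.

```python
# Excerpt of the `neutron port-show b76cd758-da75-439a-831d-737912a5e03f` output above.
PORT_SHOW = """
+-----------------------+--------------------------------------------------------------------------------+
| Field                 | Value                                                                          |
+-----------------------+--------------------------------------------------------------------------------+
| admin_state_up        | True                                                                           |
| binding:host_id       | overcloud-controller-0.localdomain                                             |
| device_owner          | network:router_interface                                                       |
| id                    | b76cd758-da75-439a-831d-737912a5e03f                                           |
| status                | BUILD                                                                          |
+-----------------------+--------------------------------------------------------------------------------+
"""


def parse_cli_table(text: str) -> dict:
    """Turn a two-column `| Field | Value |` CLI table into a dict, skipping borders and the header row."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # +---+ border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 2 or cells[0] == "Field":
            continue
        fields[cells[0]] = cells[1]
    return fields


port = parse_cli_table(PORT_SHOW)
if port["status"] != "ACTIVE":
    print(f"port {port['id']} on {port['binding:host_id']} is {port['status']}, not ACTIVE")
```

For this port it prints that b76cd758-da75-439a-831d-737912a5e03f on overcloud-controller-0.localdomain is BUILD, not ACTIVE.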
Comment 3 Assaf Muller 2015-10-14 14:39:22 EDT
I gather you're not using HA routers, and that you enabled allow_automatic_l3agent_failover in neutron.conf? Can I ask why?
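Comment 3 refers to two ways of surviving an L3 agent failure: HA routers (the `l3_ha` option makes new routers HA by default) and automatic rescheduling of legacy routers away from dead agents (`allow_automatic_l3agent_failover`). Both are boolean options in the `[DEFAULT]` section of neutron.conf and both default to False. The sketch below reads them with Python's `configparser`. The inline excerpt is hypothetical and mirrors the setup inferred in Comment 3 (failover enabled, HA routers not used), and `l3_failover_settings` is an illustrative helper; on a real node one would pass the contents of the server's config file (commonly /etc/neutron/neutron.conf) instead.

```python
import configparser

# Hypothetical neutron.conf excerpt mirroring the setup inferred in Comment 3.
NEUTRON_CONF = """
[DEFAULT]
allow_automatic_l3agent_failover = True
l3_ha = False
"""


def l3_failover_settings(conf_text: str) -> dict:
    """Return the two L3 failover-related booleans, using Neutron's default (False) when unset."""
    # interpolation=None: oslo.config files may contain literal '%' characters.
    # strict=False: oslo.config allows repeated keys for multi-valued options.
    cfg = configparser.ConfigParser(interpolation=None, strict=False)
    cfg.read_string(conf_text)
    defaults = cfg["DEFAULT"]
    return {
        "l3_ha": defaults.getboolean("l3_ha", fallback=False),
        "allow_automatic_l3agent_failover": defaults.getboolean(
            "allow_automatic_l3agent_failover", fallback=False
        ),
    }


print(l3_failover_settings(NEUTRON_CONF))
```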
Comment 4 bigswitch 2015-10-14 17:23:59 EDT
The reason is an existing known bug, described in:

https://bugzilla.redhat.com/show_bug.cgi?id=1260298
Comment 5 Nir Yechiel 2015-10-15 03:12:32 EDT
Can we mark this one closed as a duplicate of 
https://bugzilla.redhat.com/show_bug.cgi?id=1260298?
Comment 6 bigswitch 2015-10-16 11:46:42 EDT
Is router HA a required configuration for a setup with multiple L3 agents?
Comment 12 Assaf Muller 2016-02-06 12:39:53 EST
Did you try HA routers with the fix in https://bugzilla.redhat.com/show_bug.cgi?id=1253953?
Comment 13 Assaf Muller 2016-06-04 12:18:14 EDT
Waiting for needinfo from February, closing for now. Please re-open if still relevant.
