Bug 1378530

Summary: Booting VM with a Floating IP and pinging it via that takes a long time with errors in L3-Agent logs when using DVR
Product: Red Hat OpenStack
Component: openstack-neutron
Status: CLOSED DUPLICATE
Severity: high
Priority: high
Version: 10.0 (Newton)
Target Milestone: ga
Target Release: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Reporter: Sai Sindhur Malleni <smalleni>
Assignee: Terry Wilson <twilson>
QA Contact: Toni Freger <tfreger>
Docs Contact:
CC: amuller, chrisw, nyechiel, srevivo
Last Closed: 2016-10-13 18:12:32 UTC
Type: Bug

Description Sai Sindhur Malleni 2016-09-22 16:31:03 UTC
Description of problem:
A Rally test that launches a VM, attaches a floating IP, and pings the VM via the floating IP 40 times was run with legacy routers and with DVR routers for comparison. The time taken to create the network and subnet, launch the VM, attach the floating IP, etc. is similar in the legacy and DVR cases. However, the time for the VM to become pingable via the floating IP (after it has been booted and given the floating IP) is much longer in some iterations with DVR. With legacy routers, the VM is ping-ready in less than a second, not counting the time to boot or attach the floating IP. With DVR, the VM is sometimes ping-ready in less than 1 second, but in some cases it takes around 250 seconds. Digging into the L3-agent logs on the compute nodes, we see the following for the instances that took the longest to become pingable via the floating IP:
https://paste.fedoraproject.org/431117/74312098/

Version-Release number of selected component (if applicable):


How reproducible:
Happens intermittently. If we create 40 FIPs, it happens for about 10 of them.

Steps to Reproduce:
1. Create DVR router, attach subnet
2. Launch VM on subnet
3. Attach FIP and ping

Rally-Plugin we used: https://github.com/openstack/browbeat/tree/master/rally/rally-plugins/netcreate-boot-ping
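
The quantity being compared here, the time from FIP association until the first successful ping, can be sketched as a simple polling loop. This is a minimal illustration and not the Browbeat plugin's actual code: the names `ping_once` and `wait_until_pingable`, the default timeout and interval, and the use of the Linux iputils `ping` binary with `-c 1 -W 1` are all assumptions.

```python
import subprocess
import time


def ping_once(ip):
    """Send one ICMP echo request; return True if a reply arrived.

    Assumption: the Linux iputils `ping` binary is available on PATH.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_until_pingable(ip, timeout=300.0, interval=1.0, ping=ping_once,
                        clock=time.monotonic, sleep=time.sleep):
    """Poll `ip` until it answers a ping and return the elapsed seconds.

    Raises TimeoutError if no reply arrives within `timeout` seconds.
    `ping`, `clock` and `sleep` are injectable so the loop can be
    exercised without a network.
    """
    start = clock()
    while True:
        if ping(ip):
            return clock() - start
        if clock() - start >= timeout:
            raise TimeoutError("%s not pingable after %.0f s" % (ip, timeout))
        sleep(interval)
```

Calling `wait_until_pingable("10.16.30.99")` right after the FIP is associated would return a value comparable to the per-iteration ping-ready time discussed in this report, under the assumptions above.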

Actual results:
In some cases it took ~200 seconds for the VM to become pingable via the FIP. Correlating the slow FIPs with the L3-agent logs on the compute nodes, we see:

2016-09-19 18:58:52.675 23696 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'fip-790354c7-f286-4fd1-a4a1-ec9749c61fbf', 'arping', '-A', '-I', 'fg-6b5906d0-d9', '-c', '3', '-w', '4.5', '10.16.30.99'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:99
2016-09-19 18:58:52.696 23696 ERROR neutron.agent.linux.utils [-] Exit code: 2; Stdin: ; Stdout: ; Stderr: bind: Cannot assign requested address

2016-09-19 18:58:52.697 23696 ERROR neutron.agent.linux.ip_lib [-] Failed sending gratuitous ARP to 10.16.30.99 on fg-6b5906d0-d9 in namespace fip-790354c7-f286-4fd1-a4a1-ec9749c61fbf
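
To correlate slow FIPs with these errors across a whole log file, the `ip_lib` error line can be matched with a regular expression. The sketch below is built only from the line format quoted above; the names `GARP_FAILURE` and `failed_garp_events` are hypothetical, and it assumes every failure is logged in exactly this form.

```python
import re

# Matches the ip_lib error line quoted above and captures the floating IP,
# the device name, and the fip- namespace.
GARP_FAILURE = re.compile(
    r"Failed sending gratuitous ARP to (?P<ip>\S+) "
    r"on (?P<device>\S+) in namespace (?P<namespace>\S+)"
)


def failed_garp_events(lines):
    """Yield (timestamp, ip, device, namespace) for each failed GARP line."""
    for line in lines:
        match = GARP_FAILURE.search(line)
        if match:
            # The first two whitespace-separated fields are the date and time.
            timestamp = " ".join(line.split()[:2])
            yield (timestamp, match.group("ip"), match.group("device"),
                   match.group("namespace"))
```

Feeding the lines of an L3-agent log into `failed_garp_events` gives a list of floating IPs that can be compared against the iterations Rally reported as slow.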

Rally-Plugin results:
http://8.43.86.1:8088/smalleni/20160919-172902-browbeat-netcreate-boot-ping-10-iteration-0.html#/BrowbeatPlugin.create_network_nova_boot_ping/details
The green spikes show iterations where it took a long time for the VM to become pingable.

Expected results:
The VM should be pingable within a reasonably small amount of time after FIP association. We see values of less than 1s for legacy routers.

Additional info:

Comment 2 Sai Sindhur Malleni 2016-10-05 20:01:18 UTC
Terry,
I still have the environment with me and can reproduce this. I might not have the environment for very long. Please let me know if you want to look. Happy to help if it makes things easier.

Comment 3 Nir Yechiel 2016-10-13 18:12:32 UTC

*** This bug has been marked as a duplicate of bug 1363661 ***