Bug 875309

Summary: A Hyper-V RHEL 6.3 guest is unreachable from the network after live migration

Product: Red Hat Enterprise Linux 6
Reporter: Claudio Latini <claudio.latini>
Component: kernel
Assignee: jason wang <jasowang>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 6.3
CC: bsarathy, ddeng, habdi, haiyangz, jasowang, jbian, juzhang, kys, leiwang, moli, qguan, shwang, tburke
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: kernel-2.6.32-347.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-02-21 06:56:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Claudio Latini 2012-11-10 11:41:56 UTC
Description of problem:
A RHEL 6.3 Hyper-V guest is unreachable from the network after live migration. This happens because the VM doesn't send a gratuitous ARP (GARP) to inform the underlying network of the node change. The VM is configured with MSFT Linux Integration Components (LIC) 3.4 and the hv_netvsc synthetic NIC.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux 6.3 from installation media (2.6.32-279) and later.

How reproducible:
100%

Steps to Reproduce:
1. Obtain a Hyper-V cluster with at least two 2008 R2 nodes;
2. (IMPORTANT) Connect the cluster to a layer-2 network switch;
3. Create a RHEL 6.3 Hyper-V Linux guest with LIC 3.4 and place it on a cluster node;
4. Configure the VM to use the hv_netvsc synthetic driver and a static MAC;
5. Live migrate the VM to the other node.
 
Actual results:
The VM doesn't send the GARP, so after the migration it is unreachable because the switch's MAC address table is unchanged.

Expected results:
The VM must send a GARP so that the underlying network learns its new location.

Additional info:
The Hyper-V network driver calls netif_notify_peers() after live migration to trigger the GARP. However, the RHEL 6.3 kernel code doesn't do this work unconditionally:

from net/ipv4/devinet.c:
---
case NETDEV_NOTIFY_PEERS:
case NETDEV_CHANGEADDR:
    /* Send gratuitous ARP to notify of link change */
    if (IN_DEV_ARP_NOTIFY(in_dev)) {
        struct in_ifaddr *ifa = in_dev->ifa_list;
        
        if (ifa)
            arp_send(ARPOP_REQUEST, ETH_P_ARP,
                    ifa->ifa_address, dev,
                    ifa->ifa_address, NULL,
                    dev->dev_addr, NULL);
    }
    break;
---

If IN_DEV_ARP_NOTIFY() returns false, the GARP is never sent. IN_DEV_ARP_NOTIFY() reflects the arp_notify sysctl (net.ipv4.conf.*.arp_notify), which defaults to 0.
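
A possible interim workaround: since IN_DEV_ARP_NOTIFY() keys off the arp_notify sysctl, enabling it inside the guest should let the existing code path above send the GARP when NETDEV_NOTIFY_PEERS fires. Below is a minimal sketch, assuming the guest NIC is eth0 (equivalent to `sysctl -w net.ipv4.conf.eth0.arp_notify=1`, or the corresponding line in /etc/sysctl.conf):

---
/* Workaround sketch (assumption: the guest NIC is eth0): enable arp_notify
 * so that IN_DEV_ARP_NOTIFY() returns true and the code path above sends a
 * gratuitous ARP on NETDEV_NOTIFY_PEERS. Must be run as root in the guest.
 */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/sys/net/ipv4/conf/eth0/arp_notify", "w");

    if (!f) {
        perror("arp_notify");
        return 1;
    }
    fputs("1\n", f);
    fclose(f);
    return 0;
}
---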

The issue has been resolved upstream (see https://lkml.org/lkml/2011/3/30/536) and enhanced to cover secondary IP addresses (see https://lkml.org/lkml/2011/7/24/152).

So, Red Hat should apply these patches to its kernel as well.
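
For reference, a paraphrased sketch of what the net/ipv4/devinet.c logic looks like with those two upstream changes applied (not the verbatim upstream diff; the helper name and exact layout are from my reading of upstream and may differ slightly): NETDEV_NOTIFY_PEERS sends the GARP unconditionally, only NETDEV_CHANGEADDR stays gated on arp_notify, and every address on the interface is advertised, not just the first entry of ifa_list:

---
/* Paraphrased sketch of the upstream behaviour, not the verbatim patch. */
static void inetdev_send_gratuitous_arp(struct net_device *dev,
					struct in_device *in_dev)
{
	struct in_ifaddr *ifa;

	/* Advertise every configured address, including secondaries */
	for (ifa = in_dev->ifa_list; ifa; ifa = ifa->ifa_next)
		arp_send(ARPOP_REQUEST, ETH_P_ARP,
			 ifa->ifa_address, dev,
			 ifa->ifa_address, NULL,
			 dev->dev_addr, NULL);
}

	/* ... in the netdevice notifier (inetdev_event()): */
	case NETDEV_CHANGEADDR:
		if (!IN_DEV_ARP_NOTIFY(in_dev))
			break;
		/* fall through */
	case NETDEV_NOTIFY_PEERS:
		/* Send gratuitous ARP to notify of link change */
		inetdev_send_gratuitous_arp(dev, in_dev);
		break;
---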

Comment 8 Dor Laor 2012-11-26 13:02:04 UTC
Arr, other hypervisors (such as KVM) send the gratuitous packet from the hypervisor and keep the guest OS out of scope. If that's upstream we can still fix it.

Comment 9 jason wang 2012-11-27 06:51:29 UTC
(In reply to comment #8)
> Arr, other hypervisors (such as KVM) send the gratuitous packet from the
> hypervisor and keep the guest OS out of scope. If that's upstream we can
> still fix it.

Btw, we plan to let the guest (virtio-net) send the GARP in the future (the guest driver part is already upstream).

Comment 11 RHEL Program Management 2012-11-28 20:11:55 UTC
This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.

Comment 12 Jarod Wilson 2012-12-07 18:36:01 UTC
Patch(es)

Comment 15 Jarod Wilson 2012-12-11 20:32:40 UTC
Patch(es)

Comment 17 Shengnan Wang 2013-02-01 10:16:01 UTC
Hi K.Y.,
Could you please help to check the testing steps and results of the limitation testing? 

Tested live migration on two Hyper-V hosts, each with one network adapter. Due to this environment limitation, there is still a slight packet drop during the process (reference: 'Networking considerations for live migration', http://technet.microsoft.com/en-us/library/ff428137%28WS.10%29.aspx).

Before the fix (testing with RHEL6.3 LIC guests), the RHEL6.3 LIC guest lost network connectivity after the live migration. Pinging the guest continuously, more than 400 packets were lost on average.

With the fixed kernel (RHEL6.4 snapshot5 guest, kernel-2.6.32-356.el6), only about 60 packets are lost during the live migration on average.

Comment 18 Shengnan Wang 2013-02-01 10:42:36 UTC
(In reply to comment #17)
> Hi K.Y.,
> Could you please help to check the testing steps and results of the
> limitation testing? 
> 

It should be 'live migration testing' not 'limitation testing'.


> Tested live migration on two Hyper-V hosts, each with one network adapter.
> Due to this environment limitation, there is still a slight packet drop
> during the process (reference: 'Networking considerations for live
> migration',
> http://technet.microsoft.com/en-us/library/ff428137%28WS.10%29.aspx).
> 

There is a table listing 'Host configuration' and live migration bandwidth in the 'Networking considerations for live migration' section. From that table, some packet loss is expected in a test environment with a single network adapter.

> Before the fix (testing with RHEL6.3 LIC guests), the RHEL6.3 LIC guest
> lost network connectivity after the live migration. Pinging the guest
> continuously, more than 400 packets were lost on average.
> 
> With the fixed kernel (RHEL6.4 snapshot5 guest, kernel-2.6.32-356.el6),
> only about 60 packets are lost during the live migration on average.

Are these steps and results enough to verify the bug? Or could you help test the package if there is a more suitable environment on your side?

Thanks!

Comment 19 K. Y. Srinivasan 2013-02-03 23:38:47 UTC
I think some packet loss is to be expected. I am copying Haiyang and Hashir. They can shed some additional light on this.

Comment 20 Haiyang Zhang 2013-02-04 15:41:42 UTC
(In reply to comment #19)
> I think some packet loss is to be expected. I am copying Haiyang and Hashir.
> They can shed some additional light on this.

I agree that a little packet loss during the transition is expected, as long as the VM is reachable after the migration.

Comment 21 Shengnan Wang 2013-02-05 10:28:14 UTC

Verified this problem with a RHEL6.4 guest (kernel-2.6.32-356.el6).

Build version:
Host: Microsoft Hyper-V Server 2012
Guest: RHEL6.4 (kernel-2.6.32-356.el6)

Steps:
1. Obtain a two-node 2012 Hyper-V Cluster. (There is one network adapter for each host.)
2. Connect the cluster to a layer-2 network switch;
3. Create a RHEL6.4 guest with the hv_netvsc driver on one host and configure the guest to use a static MAC.
4. Check the guest network via ping from the other machine. 
5. Live migrate the RHEL6.4 guest to the other host via SCVMM.
6. Check the output of the ping.

Results:
Due to the environment limitation, there is still a slight packet drop during the process. For details, please see comment 17 and comment 18. The test steps and results were confirmed with the MS side. Some packet loss is to be expected, as mentioned in comment 19 and comment 20.


So, changing the status of the bug to 'verified'.

Comment 23 errata-xmlrpc 2013-02-21 06:56:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0496.html