Red Hat Bugzilla – Bug 251527
Fake ARP dropped after migration leading to loss of network connectivity
Last modified: 2008-05-21 10:49:11 EDT
After migration the Xen netfront sends a gratuitous ARP to cause arp caches to
get refreshed and minimise network downtime.
Unfortunately carrier detect is delayed in the kernel (up to a minute?), so the
ARP is dropped. The result is a network blackout that lasts until either the ARP
caches expire or the guest generates an ARP for some other reason.
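For readers unfamiliar with the packet in question, here is a minimal sketch of
what a gratuitous ARP frame looks like on the wire. It is illustrative only and
is not the netfront code: the function name build_gratuitous_arp and the example
MAC and IP values are assumptions, and the request opcode is one common form
(a reply opcode is also used in practice). The defining property is that the
sender and target protocol addresses are the same, so every host that sees the
broadcast can refresh its cache entry for that IP.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806
ETHERTYPE_IPV4 = 0x0800
ARP_HTYPE_ETHERNET = 1
ARP_OP_REQUEST = 1


def build_gratuitous_arp(src_mac: bytes, ip: str) -> bytes:
    """Return a 42-byte Ethernet frame carrying a gratuitous ARP request.

    Hypothetical helper for illustration; it only builds the bytes and does
    not send anything.
    """
    if len(src_mac) != 6:
        raise ValueError("src_mac must be exactly 6 bytes")
    ip_bytes = socket.inet_aton(ip)

    # Ethernet header: broadcast destination, our MAC as source, ARP ethertype.
    eth_header = BROADCAST_MAC + src_mac + struct.pack("!H", ETHERTYPE_ARP)

    # Fixed ARP header: hardware type, protocol type, address lengths, opcode.
    arp = struct.pack(
        "!HHBBH", ARP_HTYPE_ETHERNET, ETHERTYPE_IPV4, 6, 4, ARP_OP_REQUEST
    )
    # Sender hardware/protocol address: the guest's own MAC and IP.
    arp += src_mac + ip_bytes
    # Target hardware address is zeroed; target IP equals the sender IP,
    # which is what makes the ARP "gratuitous".
    arp += b"\x00" * 6 + ip_bytes

    return eth_header + arp


# Example values are made up (documentation IP range, Xen-style MAC prefix).
frame = build_gratuitous_arp(bytes.fromhex("00163e000001"), "192.0.2.10")
```

If a frame like this is handed to the device before the kernel considers the
carrier up, it is never transmitted, which is the failure described above.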
This was fixed in the upstream Xen kernel by
Or in the upstream mainline kernel by
Created attachment 288931 [details]
xen-unstable 13763:8132bf3ddbef ported to 2.6.18-53.el5
Created attachment 288941 [details]
xen-unstable 14280:42b29f084c31 ported to 2.6.18-53.el5
Created attachment 291769 [details]
[NET] link_watch: Always schedule urgent events
I have rolled up the commits d9568ba91b1fdd1ea4fdbf9fcc76b867cca6c1d5 and
db0ccffed91e234cad99a35f07d5a322f410baa2 into one and backported it to RHEL5.
Assigning and setting flags.
This is causing problems with at least one customer configuration, where they
are performing failback in a clustered configuration. We would like this in 5.2.
"This is really killing us because it makes zero-downtime failback
impossible - we are seeing 30-60s loss of connectivity until the ARP
This needs a matching bug for 4.6 as we are seeing it in 4.6 DomUs
BZ 429930 is the rhel4 clone of this bug, and I've attached the rhel4
equiv. patch for it.
We're in the process of doing live migration testing of rhel5.2-ish &
rhel4.7-ish kernels with the respective patches. Once verified, I'll
post the rhel4 patch (for 4.7).
If needed for 4.6.z, please raise flags for that additional effort.
You can download this test kernel from http://people.redhat.com/dzickus/el5
We (our customer) have just tested the updated kernels and can't confirm that
this issue is fixed. The network blackout is shorter, about 15 sec (compared to
1-3 minutes before updating the kernel), but we expect it to be much shorter (1
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.