Bug 251527 - Fake ARP dropped after migration leading to loss of network connectivity
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.0
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Assignee: Don Dutile (Red Hat)
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks: 441716
Reported: 2007-08-09 15:30 UTC by Ian Campbell
Modified: 2008-05-21 14:49 UTC (History)

Fixed In Version: RHBA-2008-0314
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-05-21 14:49:11 UTC
Target Upstream Version:
Embargoed:


Attachments
xen-unstable 13763:8132bf3ddbef ported to 2.6.18-53.el5 (1.26 KB, patch)
2007-12-14 09:31 UTC, Ian Campbell
xen-unstable 14280:42b29f084c31 ported to 2.6.18-53.el5 (10.83 KB, patch)
2007-12-14 09:32 UTC, Ian Campbell
[NET] link_watch: Always schedule urgent events (4.45 KB, patch)
2008-01-15 23:47 UTC, Herbert Xu


Links
System: Red Hat Product Errata
ID: RHBA-2008:0314
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Updated kernel packages for Red Hat Enterprise Linux 5.2
Last Updated: 2008-05-20 18:43:34 UTC

Description Ian Campbell 2007-08-09 15:30:35 UTC
After migration, the Xen netfront driver sends a gratuitous ARP so that ARP
caches on the network are refreshed and network downtime is minimised.

Unfortunately, carrier detection is delayed in the kernel (up to a minute?), so
the ARP is dropped. The result is a network blackout that lasts until the ARP
caches expire or the guest generates an ARP for some other reason.
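
For readers unfamiliar with the packet in question, here is a minimal sketch
of what a gratuitous ARP frame looks like on the wire, assuming IPv4 over
Ethernet. The function name build_gratuitous_arp and the example MAC/IP values
are hypothetical and are not taken from netfront. The sketch defaults to the
request form (opcode 1); a reply (opcode 2) announcing the same address is
also common, and this is not a claim about which form the driver emits.

```python
import struct

ETH_P_ARP = 0x0806    # EtherType for ARP
ETH_P_IP = 0x0800     # protocol type carried in the ARP header (IPv4)
ARPHRD_ETHER = 1      # hardware type: Ethernet


def build_gratuitous_arp(mac: bytes, ip: bytes, opcode: int = 1) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP (illustrative only).

    A gratuitous ARP announces the sender's own IP-to-MAC mapping, so the
    target IP field is set to the sender's own IP address.
    """
    if len(mac) != 6 or len(ip) != 4:
        raise ValueError("expected a 6-byte MAC and a 4-byte IPv4 address")

    broadcast = b"\xff" * 6
    # Ethernet header: destination, source, EtherType
    eth_header = broadcast + mac + struct.pack("!H", ETH_P_ARP)

    # ARP header: hardware type, protocol type, address lengths, opcode
    arp = struct.pack("!HHBBH", ARPHRD_ETHER, ETH_P_IP, 6, 4, opcode)
    arp += mac + ip            # sender hardware and protocol address
    arp += b"\x00" * 6 + ip    # target hardware address unset, target IP == sender IP
    return eth_header + arp


# Hypothetical guest addresses, for illustration only.
frame = build_gratuitous_arp(bytes.fromhex("00163e000001"), bytes([192, 0, 2, 10]))
print(len(frame), frame.hex())
```

Switches and neighbours that see this frame update their tables to point at
the guest's new location, which is why losing it during the carrier-detect
delay leaves stale entries in place until they time out.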

This was fixed in the upstream Xen kernel by
http://xenbits.xensource.com/xen-unstable.hg?rev/42b29f084c31

Or in the upstream mainline kernel by
http://lkml.org/lkml/2007/5/8/179
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=572a103ded0ad880f75ce83e99f0512fbb80b5b0
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=294cc44b7e48a6e7732499eebcf409b231460d8e

Comment 1 Ian Campbell 2007-12-14 09:31:30 UTC
Created attachment 288931
xen-unstable 13763:8132bf3ddbef ported to 2.6.18-53.el5

Comment 2 Ian Campbell 2007-12-14 09:32:02 UTC
Created attachment 288941
xen-unstable 14280:42b29f084c31 ported to 2.6.18-53.el5

Comment 3 Herbert Xu 2008-01-15 23:47:29 UTC
Created attachment 291769
[NET] link_watch: Always schedule urgent events

I have rolled up the commits d9568ba91b1fdd1ea4fdbf9fcc76b867cca6c1d5 and
db0ccffed91e234cad99a35f07d5a322f410baa2 into one and backported it to RHEL5.

Comment 4 Bill Burns 2008-01-23 19:41:12 UTC
Assigning and setting flags.


Comment 6 Rob Kenna 2008-01-23 20:06:50 UTC
This is causing problems with at least one customer configuration, where they
are performing failback in a clustered configuration.  Would like this in 5.2.


"This is really killing us because it makes zero-downtime failback
impossible - we are seeing 30-60s loss of connectivity until the ARP
cache expires."

Comment 7 Nick Strugnell 2008-01-24 10:34:51 UTC
This needs a matching bug for 4.6, as we are seeing it in 4.6 DomUs.

Nick

Comment 8 Don Dutile (Red Hat) 2008-01-24 15:07:03 UTC
Nick:

BZ 429930 is the rhel4 clone of this bug, and I've attached the rhel4
equiv. patch for it.

We're in the process of doing live migration testing of rhel5.2-ish & 
rhel4.7-ish kernel with the respective patches.  once verified, i'll
post the rhel4 patch (for 4.7). 
if needed for 4.6.z, pls raise flags for that additional effort.

- Don

Comment 9 Don Zickus 2008-01-24 16:08:51 UTC
in 2.6.18-74.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 11 Jakub Suchy 2008-02-06 16:35:05 UTC
We (our customer) have just tested the updated kernels and can't confirm that
this issue is fixed. The network blackout is shorter, about 15 sec (compared
to 1-3 minutes before updating the kernel), but we expect it to be much
shorter (1 second?).

Comment 21 errata-xmlrpc 2008-05-21 14:49:11 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html


