Bug 713585

Summary:

RHEL 6.1 Xen paravirt guest is getting network outage during live migration

Product:

Red Hat Enterprise Linux 6

Reporter:

asilva <asilva>

Component:

kernel

Assignee:

Laszlo Ersek <lersek>

Status:

CLOSED ERRATA

QA Contact:

Virtualization Bugs <virt-bugs>

Severity:

medium

Priority:

medium

Version:

6.1

CC:

drjones, jmunilla, jzheng, leiwang, lersek, pbonzini, pcao, qguan, qwan, sputhenp, tburke, xen-maint, yuzhou

Target Milestone:

alpha

Keywords:

ZStream

Target Release:

6.2

Hardware:

All

OS:

Linux

Fixed In Version:

kernel-2.6.32-169.el6

Doc Type:

Bug Fix

Doc Text:

In order to ensure uninterrupted network connectivity after live migration, the "net.ipv4.conf.INTERFACE.arp_notify" sysctl should be set to 1 in a Red Hat Enterprise Linux 6.1 Xen guest using the paravirtualized (xen-netfront) network driver.

Clones:

761591

Last Closed:

2011-12-06 13:41:31 UTC

Bug Depends On:

720347

Attachments:

Description	Flags
[debugging patch] try to queue 10 GARPs, with half a second between them	none
[debugging patch] sysadmin-controllable resend params	none
netback: partially revert the host side of 14bee682	none
netback: wait for hotplug scripts to complete before signalling Connected to frontend	none
xen-netfront: send gARP after backend moved to XenbusStateConnected	none

Comment 4 RHEL Program Management 2011-06-21 05:34:16 UTC

This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.7 and Red Hat does not plan to fix this issue in the currently developed update.

Contact your manager or support representative in case you need to escalate this bug.

Comment 7 RHEL Program Management 2011-06-21 09:40:00 UTC

This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 9 Laszlo Ersek 2011-06-22 16:09:53 UTC

I can confirm the gratuitous ARP thing is not working under RHEL-6.1 (using the environment described in comment 0), even though all the arp_notify sysctls are set to 1 in the guest.

When ping-pong migrating the "rhel5-mig" (RHEL-5.6) guest, I was running this command on both mig-src and mig-dst hosts:

# tcpdump -l -vv -n -i br0 arp and ether src 00:16:36:06:ED:0C

The MAC being the RHEL-5.6 guest's MAC. When the migration finally switches over, the following is printed on *both* hosts:

  09:49:03.665698 arp reply 10.65.210.219 is-at 00:16:36:06:ed:0c
  09:49:03.665739 arp reply 10.65.210.219 is-at 00:16:36:06:ed:0c

And the ping from my laptop to the guest keeps getting responses.

However, when the RHEL-6.1 guest is migrated:

# tcpdump -l -vv -n -i br0 arp and ether src 00:16:36:48:cb:02

There are no such packets captured on either host. Ping from my laptop to the guest stops working while the guest is on host "B". As soon as the guest is migrated back, ping is immediately resurrected.

So we can safely say that the RHEL-6.1 guest does not send out the gratuitous ARP packet after it resumes on the new host, and the switch on the local ethernet segment keeps remembering the old port. I'll have to debug the RHEL-6.1 guest.

Comment 10 Laszlo Ersek 2011-06-22 18:07:44 UTC

I downgraded the guest kernel to -117 as the first step of a bisection, and it still does not work.

Comment 11 Laszlo Ersek 2011-06-22 20:52:28 UTC

(In reply to comment #10)
> I downgraded the guest kernel to -117 as the first step of a bisection, and it
> still does not work.

... meaning: I'm not sure how it *ever* worked. I mentioned -115 above based on the git commit log, but that build was probably not even released in-house.

-117 is available in Brew. The maintainer designated -117 as the first build having this patch too (see bug 622575 comment 5).

I'm not sure how to interpret bug 622575 comment 7 and bug 622575 comment 8 -- Hushan, was the patch for bug 622575 tested with migration?

(I tried to add some printfs to see if the kernel gets to the place where it would send the ARP req; unfortunately my build crashed sometime before that with a NULL pointer dereference. Probably a broken build / initrd.)

Comment 12 Laszlo Ersek 2011-06-23 15:00:00 UTC

I have compared how the RHEL-5 code sends out this unsolicited ARP notification versus how the RHEL-6 code attempts to do it. So:

RHEL-5 (and linux-2.6.18-xen.hg): see the send_fake_arp() function in "drivers/xen/netfront/netfront.c":

	dst_ip = INADDR_BROADCAST;
	src_ip = inet_select_addr(dev, dst_ip, RT_SCOPE_LINK);

[...]

	skb = arp_create(ARPOP_REPLY, ETH_P_ARP,
			 dst_ip, dev, src_ip,
			 /*dst_hw*/ NULL, /*src_hw*/ NULL,
			 /*target_hw*/ dev->dev_addr);

That is:
- ARP *reply*
- destination IP: broadcast address,
- source IP address: the one assigned to the netfront device within this subnet,
- dst_hw is set to NULL and then overridden in arp_create(), but it's not
  really important here,
- src_hw (source MAC) is overridden in arp_create(), it is set to netfront's
  MAC (dev->dev_addr)
- target_hw is set to dev->dev_addr as well

This is exactly what was posted originally for upstream (http://kerneltrap.org/mailarchive/linux-netdev/2010/5/12/6277063, see also bug 622575 comment 0):

+				arp_send(ARPOP_REPLY, ETH_P_ARP,
+					 INADDR_BROADCAST, dev,
+					 ifa->ifa_address,
+					 NULL, NULL, dev->dev_addr);

However they rather went with what we have now in RHEL-6 (http://kerneltrap.org/mailarchive/linux-netdev/2010/5/24/6277888, "net/ipv4/devinet.c"):

				arp_send(ARPOP_REQUEST, ETH_P_ARP,
					 ifa->ifa_address, dev,
					 ifa->ifa_address, NULL,
					 dev->dev_addr, NULL);

- ARP request -- changed
- destination IP: netfront's own IP -- changed,
- source IP: netfront's own IP (no change),
- dst_hw: no change,
- src_hw: set directly to dev->dev_addr (no change),
- target_hw: filled with \0 in arp_create() -- changed

Wikipedia (what else) has this to say (http://en.wikipedia.org/wiki/Address_Resolution_Protocol#ARP_announcements):

"Such an announcement, also called a gratuitous ARP message, is usually broadcast as an ARP request containing the sender's protocol address (SPA) in the target field (TPA=SPA), with the target hardware address (THA) set to zero. An alternative is to broadcast an ARP reply with the sender's hardware and protocol addresses (SHA and SPA) duplicated in the target fields (TPA=SPA, THA=SHA)."

The RHEL-6 way exactly satisfies the first sentence.

The RHEL-5 way satisfies THA=SHA in the second sentence, but does not satisfy TPA=SPA.

Interestingly enough, the RHEL-5 way works, while the RHEL-6 way does not. Referring back to comment 9, the RHEL-6 code did not put out any packet on the wire. The "ether src NETFRONT-MAC" part would have caught it.
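The comparison against the two announcement forms quoted above can be sketched as a small check. This is only an illustration: the Arp record and the announcement_form() helper are hypothetical names, and the MAC and IP are the RHEL-5.6 guest values from comment 9, reused for both styles so that only the field layout differs.

```python
from dataclasses import dataclass

ARPOP_REQUEST, ARPOP_REPLY = 1, 2
ZERO_MAC = "00:00:00:00:00:00"
BROADCAST_IP = "255.255.255.255"  # INADDR_BROADCAST


@dataclass(frozen=True)
class Arp:
    """Hypothetical record holding only the ARP fields discussed above."""
    op: int
    sha: str  # sender hardware address
    spa: str  # sender protocol address
    tha: str  # target hardware address
    tpa: str  # target protocol address


def announcement_form(pkt):
    """Return which of the two quoted announcement forms the packet matches, or None."""
    if pkt.op == ARPOP_REQUEST and pkt.tpa == pkt.spa and pkt.tha == ZERO_MAC:
        return "request form (TPA=SPA, THA=0)"
    if pkt.op == ARPOP_REPLY and pkt.tpa == pkt.spa and pkt.tha == pkt.sha:
        return "reply form (TPA=SPA, THA=SHA)"
    return None


mac_addr, ip = "00:16:36:06:ed:0c", "10.65.210.219"

# RHEL-6 style: request, own IP in both SPA and TPA, THA zeroed.
rhel6_style = Arp(ARPOP_REQUEST, mac_addr, ip, ZERO_MAC, ip)
# RHEL-5 style: reply, THA = SHA, but TPA is the broadcast address.
rhel5_style = Arp(ARPOP_REPLY, mac_addr, ip, mac_addr, BROADCAST_IP)

print(announcement_form(rhel6_style))
print(announcement_form(rhel5_style))
```

Under these assumptions the RHEL-6 style packet matches the request form, while the RHEL-5 style packet matches neither form exactly, because its TPA is the broadcast address rather than the sender's own IP.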

... Alright, the RHEL-6 code uses wrong addresses. See this commit:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=6c91afe1

Comment 16 Laszlo Ersek 2011-06-23 16:57:31 UTC

Backporting upstream c/s 6c91afe1 did not work. I'll try to imitate the RHEL-5 way in RHEL-6.

Comment 17 Laszlo Ersek 2011-06-23 19:38:39 UTC

Sending an ARP reply doesn't work either, even though I've verified that the packet gets to dev_queue_xmit().

Comment 18 Laszlo Ersek 2011-06-24 09:52:21 UTC

Last night I tried to run tcpdump inside the guests as well, on the netfront interface.

**** The RHEL-6.1 guest sends out the following two packets (two migrations):

# tcpdump -e -l -X -vv -n -i eth0 arp \
      and \( ether src 00:16:36:48:cb:02 or ether dst 00:16:36:48:cb:02 \)

23:54:16.790621 00:16:36:48:cb:02 > Broadcast, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.65.209.63 tell 10.65.209.63, length 28
        0x0000:  0001 0800 0604 0001 0016 3648 cb02 0a41  ..........6H...A
        0x0010:  d13f 0000 0000 0000 0a41 d13f            .?.......A.?

23:54:32.995448 00:16:36:48:cb:02 > Broadcast, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.65.209.63 tell 10.65.209.63, length 28
        0x0000:  0001 0800 0604 0001 0016 3648 cb02 0a41  ..........6H...A
        0x0010:  d13f 0000 0000 0000 0a41 d13f            .?.......A.?

**** RHEL-5.6 guest:

# tcpdump -e -l -X -vv -n -i eth0 arp \
      and \( ether src 00:16:36:06:ED:0C or ether dst 00:16:36:06:ED:0C \)

15:48:47.253601 00:16:36:06:ed:0c > Broadcast, ethertype ARP (0x0806), length 42: arp reply 10.65.210.219 is-at 00:16:36:06:ed:0c
        0x0000:  0001 0800 0604 0002 0016 3606 ed0c 0a41  ..........6....A
        0x0010:  d2db 0016 3606 ed0c ffff ffff            ....6.......

15:49:07.283418 00:16:36:06:ed:0c > Broadcast, ethertype ARP (0x0806), length 42: arp reply 10.65.210.219 is-at 00:16:36:06:ed:0c
        0x0000:  0001 0800 0604 0002 0016 3606 ed0c 0a41  ..........6....A
        0x0010:  d2db 0016 3606 ed0c ffff ffff            ....6.......

15:49:07.283418 00:16:36:06:ed:0c > Broadcast, ethertype ARP (0x0806), length 42: arp reply 10.65.210.219 is-at 00:16:36:06:ed:0c
        0x0000:  0001 0800 0604 0002 0016 3606 ed0c 0a41  ..........6....A
        0x0010:  d2db 0016 3606 ed0c ffff ffff            ....6.......

**** Offload settings of the RHEL-5.6 netfront interface:

# ethtool -k eth0
Offload parameters for eth0:
Cannot get device rx csum settings: Operation not supported
Cannot get device udp large send offload settings: Operation not supported
rx-checksumming: off
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: off
generic-receive-offload: off

**** Then I applied Ian Campbell's original suggestion (http://kerneltrap.org/mailarchive/linux-netdev/2010/5/12/6277063) to RHEL-6.1. These are the resultant outgoing packets, captured on the netfront interface:

# tcpdump -e -l -X -vv -n -i eth0 arp \
      and \( ether src 00:16:36:48:cb:02 or ether dst 00:16:36:48:cb:02 \)

16:08:40.936213 00:16:36:48:cb:02 > Broadcast, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply 10.65.209.63 is-at 00:16:36:48:cb:02, length 28
        0x0000:  0001 0800 0604 0002 0016 3648 cb02 0a41  ..........6H...A
        0x0010:  d13f 0016 3648 cb02 ffff ffff            .?..6H......
16:09:02.500510 00:16:36:48:cb:02 > Broadcast, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply 10.65.209.63 is-at 00:16:36:48:cb:02, length 28
        0x0000:  0001 0800 0604 0002 0016 3648 cb02 0a41  ..........6H...A
        0x0010:  d13f 0016 3648 cb02 ffff ffff            .?..6H......

Except for the MAC and the IP address, these match (in structure) the RHEL-5.6 packets precisely.
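As a cross-check of the captures above, the 28-byte ARP payloads can be rebuilt from the field values and compared with the tcpdump hex dumps. This is a userspace sketch only; arp_payload() and mac() are hypothetical helpers, not kernel functions, and the Ethernet header is left out because tcpdump -X without -XX does not dump it.

```python
import socket
import struct

ARPOP_REQUEST, ARPOP_REPLY = 1, 2


def mac(text):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(text.replace(":", ""))


def arp_payload(op, sha, spa, tha, tpa):
    """Pack an Ethernet/IPv4 ARP payload (no Ethernet header), network byte order."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,       # hardware type: Ethernet
        0x0800,  # protocol type: IPv4
        6, 4,    # hardware and protocol address lengths
        op,
        sha, socket.inet_aton(spa),
        tha, socket.inet_aton(tpa),
    )


guest_mac = mac("00:16:36:48:cb:02")
guest_ip = "10.65.209.63"

# Unpatched RHEL-6.1: request, TPA = SPA, THA zeroed.
rhel6_request = arp_payload(ARPOP_REQUEST, guest_mac, guest_ip, bytes(6), guest_ip)
# RHEL-6.1 with the original upstream suggestion: reply, THA = SHA, TPA = broadcast.
patched_reply = arp_payload(ARPOP_REPLY, guest_mac, guest_ip, guest_mac,
                            "255.255.255.255")

# Hex dumps copied from the guest-side tcpdump output above.
captured_request = bytes.fromhex(
    "0001 0800 0604 0001 0016 3648 cb02 0a41 d13f 0000 0000 0000 0a41 d13f")
captured_reply = bytes.fromhex(
    "0001 0800 0604 0002 0016 3648 cb02 0a41 d13f 0016 3648 cb02 ffff ffff")

print(rhel6_request == captured_request)
print(patched_reply == captured_reply)
```

Both comparisons hold, so the captured packets are well-formed in both variants. That fits the conclusion that the loss happens after the packet leaves the guest's network stack.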

**** Offload settings of the RHEL-6.1 netfront interface (queried with the above patch applied)

# ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off

Furthermore, I traced dev_queue_xmit() a bit more, and the packet seems to enter __dev_xmit_skb():

	if (q->enqueue) {
		rc = __dev_xmit_skb(skb, q, dev, txq);
		goto out;
	}

**** Summary

Now I think, based on __dev_xmit_skb() to a lesser extent, and on the guest tcpdump to a greater extent, that the problem could be located more towards the guest-host interface: a netfront/netback communication problem, or perhaps timing.

The timing idea goes like this: when netback/netfront are connected, the guest sends out the packet. However at this point the netback vif might not yet have been added to the bridge in the host. I'll try to investigate this. With the host tcpdump I've only listened on br0; vif6.0 is not present as long as the guest is migrated away.

Comment 19 Laszlo Ersek 2011-06-24 12:28:38 UTC

Created attachment 509743
[debugging patch] try to queue 10 GARPs, with half a second between them

Confirming timing / packet loss problem. The attached patch fixes the outage by sending 10 GARPs and sleeping half a second between them. The br0 host tcpdump sees 9 out of 10.

The cause of losing the first GARP might be one of (at least) the following two:
- The netback vif is added too late to the bridge. (Timing.)
- The first packet is lost in all cases. (Packet loss.) We have seen similar
  problems, see e.g. bug 640690 comment 15, last paragraph.

I'll figure out which one it is by removing the sleep from the loop and seeing how many packets get to the host bridge.

Comment 20 Laszlo Ersek 2011-06-24 13:32:14 UTC

Created attachment 509760
[debugging patch] sysadmin-controllable resend params

(In reply to comment #19)

> The cause of losing the first GARP might be one of (at least) the following
> to:
> - The netback vif is added too late to the bridge. (Timing.)
> - The first packet is lost in all cases. (Packet loss.) We have seen similar
>   problems, see eg. bug 640690 comment 15, last paragraph.
> 
> I'll figure out which one it is by removing the sleep from the loop and seeing
> how many packets get to the host bridge.

Timing problem. I passed "xen_netfront.garp_resend=3 xen_netfront.garp_resend_sleep=0" to the guest kernel, and ping-pong migrated the guest a few times. The number of gARPs seen on the host bridge varied from zero to three. When it was zero, I experienced the network outage.

With "xen_netfront.garp_resend=1 xen_netfront.garp_resend_sleep=1000" (meaning: a single gARP is re-queued after the initial gARP, at a distance of 1 second),
- there was no outage during 5 full ping-pongs (10 migrations in total),
- a single gARP was seen on the host bridge each time,
- pinging the guest from my laptop during the test:

--- 10.65.209.63 ping statistics ---
258 packets transmitted, 248 received, 3% packet loss, time 257319ms
rtt min/avg/max/mdev = 222.438/234.464/415.224/20.094 ms

(After each migration completed, a single ICMP echo-request went unanswered, while the guest was in msleep(1000) before it re-queued the gARP. This "downtime" can be mitigated by lowering garp_resend_sleep.)
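The resend behaviour and the ping statistics above can be illustrated with a toy model. The sketch assumes that a gARP reaches the host bridge only if it is queued at or after the moment the vif joins the bridge. The function names and the bridge delays (300 ms and 200 ms) are made up for illustration; only garp_resend and garp_resend_sleep come from the debugging patch.

```python
def garp_send_times_ms(garp_resend, garp_resend_sleep_ms):
    """Queue times in ms: the initial gARP at 0 plus garp_resend repeats."""
    return [i * garp_resend_sleep_ms for i in range(garp_resend + 1)]


def seen_on_bridge(send_times_ms, vif_on_bridge_ms):
    """Toy model: count gARPs queued at or after the vif joined the bridge."""
    return sum(1 for t in send_times_ms if t >= vif_on_bridge_ms)


# garp_resend=1, garp_resend_sleep=1000, with an assumed 300 ms bridge delay.
times = garp_send_times_ms(1, 1000)
print(times, seen_on_bridge(times, vif_on_bridge_ms=300))

# Ten gARPs half a second apart (comment 19), with an assumed 200 ms bridge delay.
print(seen_on_bridge(garp_send_times_ms(9, 500), vif_on_bridge_ms=200))

# Ping statistics quoted above: 258 transmitted, 248 received.
transmitted, received = 258, 248
lost = transmitted - received
loss_percent = lost * 100 // transmitted  # integer percentage, truncated
print(lost, loss_percent)
```

In this model a single gARP gets through with garp_resend=1, and 9 of 10 get through in the comment 19 setup, which matches what the host tcpdump showed. The 10 lost echo requests equal one per migration over 10 migrations, and the truncated percentage is consistent with the reported 3%.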

I'll need some opinions on what to keep tunable and in what range. Since Paolo fixed the original bug... :)

Comment 23 Paolo Bonzini 2011-06-27 11:21:55 UTC

I'd say upstream (xen-devel with CC to netdev.org) is the best place to discuss this.

Alternatively, could it be fixed in host userspace?

Comment 24 Laszlo Ersek 2011-06-27 11:56:16 UTC

(In reply to comment #23)
> I'd say upstream (xen-devel with CC to netdev.org) is the best
> place to discuss this.

I agree; I'm waiting for the customer's test results -- I don't want to bother upstream without "facts".

> Alternatively, could it be fixed in host userspace?

I'm not sure you can add the netback vif to the bridge reliably before the netback/netfront handshake completes. (Is there a netback vif before the handshake at all?)

Comment 27 Paolo Bonzini 2011-06-27 13:22:51 UTC

> > I'd say upstream (xen-devel with CC to netdev.org) is the best
> > place to discuss this.
> 
> I agree; I'm waiting for the customer's test results -- I don't want to bother
> upstream without "facts".

Heh, your bugzilla comments have way more facts than are needed to start a discussion upstream.

> > Alternatively, could it be fixed in host userspace?
> 
> I'm not sure you can add the netback vif to the bridge reliably before the
> netback/netfront handshake completes. (Is there a netback vif before the
> handshake at all?)

I was too terse.  I meant fixing the handshake.  But actually there's already such a hook point in the host, so a guest-only patch is possible too.

The hotplug script is writing "connected" to hotplug-status in the backend tree.  The guest can watch that (setting up the watch before moving out of Initialising state), and use that to trigger the ARP.  Perhaps that's a cleaner choice?  (Aside: Xenstore watches fire even when the same value is rewritten).

Comment 28 Laszlo Ersek 2011-06-28 12:12:11 UTC

F15 has the same problem, after setting all arp_notify sysctls to 1.

Comment 29 Laszlo Ersek 2011-06-28 14:06:54 UTC

upstream discussion:
http://lists.xensource.com/archives/html/xen-devel/2011-06/msg01963.html

Comment 30 Laszlo Ersek 2011-06-28 14:13:45 UTC

We may have to backport

http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commitdiff;h=43223efd9bfd

to RHEL-5 netback, and move the netif_notify_peers() down to XenbusStateConnected in the RHEL-6 guest.

Comment 31 Andrew Jones 2011-06-29 08:05:47 UTC

Whatever the final clean solution will be for this, we also need to consider the case where we have 6.2 running on older RHEL5 hosts that may not have updated netbacks. Thus, we may need a hackier, guest-side-only fix as well as, or instead of, a clean solution.

Comment 34 Laszlo Ersek 2011-07-04 12:20:32 UTC

Further analysis revealed the following RHEL-5.3 changeset, for bug 458934:

http://git.engineering.redhat.com/?p=users/jwilson/rhel5/kernel;a=commitdiff;h=14bee682

commit 14bee682937a942146caa7414a381160e69bcffb
Author: Herbert Xu <herbert.xu>
Date:   Thu Aug 14 20:22:21 2008 +1000

    [xen] xennet: coordinate ARP with backend network status

    This patch delays the sending of the gratuitous ARP packet in
    the Xen frontend network driver until the backend signals that
    its carrier status has been processed by the stack.
    
    The frontend part of the patch is not applicable in upstream
    Linux as there is no gratuitous ARP packet support there at all.
    While the backend upstream is dead as far as I know.  So this
    is RHEL5-specific.
    
    Both patches are needed to ensure the successful transmission
    of the gratuitous ARP packet which is necessary for Xen live
    migrations.

In the backend: the patch registers a notifier callback for the "carrier up" event, and only switches the backend's state to XenbusStateConnected when that handler is called.

In the frontend: the send_fake_arp() call is moved from the "backend moved to XenbusStateInitWait" case to the "backend moved to XenbusStateConnected" case.

----------

Pvops dom0 implemented the backend's move to XenbusStateConnected completely differently (see comment 30).

The frontend part is simply missing from RHEL-6 (and from upstream too, and they know about it -- see the last paragraph of http://lists.xensource.com/archives/html/xen-devel/2011-06/msg01969.html).

Current working idea: we *only* have to relocate the netif_notify_peers() call from the XenbusStateInitWait handler to the XenbusStateConnected handler. This change should be upstream-able (it should do no harm there, see the xen-devel link above), and should fix the problem for us, because the RHEL-5.6 host would (indirectly) delay the RHEL-6 guest's gARP, just as it causes the RHEL-5 guest to wait.

Comment 35 Laszlo Ersek 2011-07-04 12:24:36 UTC

A summary of comment 34:

The pvops and the RHEL-5 dom0 kernels flip netback to XenbusStateConnected differently, but they agree in that the switch is "sufficiently" delayed.

The RHEL-5 guest co-operates with the RHEL-5 dom0.
The Fedora/upstream guest (probably) co-operates with the pvops dom0.

With the gARP moved to the XenbusStateConnected handler in the RHEL-6 / Fedora / upstream guest, the guest should continue working with the pvops dom0, and should *start* working with RHEL-5 dom0.

I'll test this.

Comment 36 Laszlo Ersek 2011-07-04 18:01:19 UTC

(In reply to comment #35)

> I'll test this.

It didn't work.

If we still want to backport upstream's netback solution to RHEL-5 (see comment 30), then we'll first have to revert commit 14bee682, posted for bug 458934 (comment 34), at least partially. The two seem to conflict in when to move netback to the Connected state.

Comment 38 Laszlo Ersek 2011-07-08 13:13:10 UTC

(In reply to comment #36)
> (In reply to comment #35)
> 
> > I'll test this.
> 
> It didn't work.

Meaning, moving just the netif_notify_peers() call to XenbusStateConnected in backend_changed() was not sufficient. (Hosts: 2.6.18-238, guest: 2.6.32-165, plus patch.)

Comment 39 Laszlo Ersek 2011-07-08 17:59:08 UTC

Created attachment 511972
netback: partially revert the host side of 14bee682

This patch reverts netback's notifier-based switch to XenbusStateConnected, and fixes up the reversion a bit. The state after this reversion is that netback moves as quickly to XenbusStateConnected as possible. This is only intended as preparation for the following patch, which backports upstream's switch to XenbusStateConnected in netback.

Comment 40 Laszlo Ersek 2011-07-08 18:01:19 UTC

Created attachment 511973
netback: wait for hotplug scripts to complete before signalling Connected to frontend

This is a backport of 43223efd from Jeremy's pvops tree. Plus I added two printk()'s for debugging.

Comment 41 Laszlo Ersek 2011-07-08 18:03:22 UTC

Created attachment 511974
xen-netfront: send gARP after backend moved to XenbusStateConnected

This 6.1 patch moves the gARP queueing to XenbusStateConnected, adding an informational printk() too.

Comment 42 Laszlo Ersek 2011-07-08 18:12:28 UTC

I uploaded the last three patches as proof that the strategy outlined in comment 32 and comment 33 doesn't work. The host side patches make sure that netback only advances to Connected state once the hotplug scripts have completed. The guest side patch ensures that the gARP is not queued until the host moved to Connected. Nonetheless, the network outage remains reproducible:

Hosts: 2.6.18-273.el5.missing_arp_bz713585_hostnotif.local.xen
Guest: 2.6.32-165.missing_arp_bz713585_connected

The times are synchronized between the two hosts and the guest (common NTP source).

When the guest is first started:

**** host1 log:

Jul  8 13:30:19 hp-z800-02 kernel: tap tap-1-51712: 2 getting info
Jul  8 13:30:19 hp-z800-02 kernel: device vif1.0 entered promiscuous mode
Jul  8 13:30:19 hp-z800-02 kernel: ADDRCONF(NETDEV_UP): vif1.0: link is not
                                   ready
Jul  8 13:30:20 hp-z800-02 kernel: blktap: ring-ref 8, event-channel 25,
                                   protocol 1 (x86_64-abi)
Jul  8 13:30:22 hp-z800-02 kernel: ADDRCONF(NETDEV_CHANGE): vif1.0: link
                                   becomes ready
Jul  8 13:30:22 hp-z800-02 kernel: br0: port 3(vif1.0) entering forwarding
                                   state
Jul  8 13:30:22 hp-z800-02 kernel: netback: Connected: from within watch

**** guest log:

Jul  8 13:30:33 dhcp47-212 kernel: Initialising Xen virtual ethernet driver.
Jul  8 13:30:33 dhcp47-212 kernel: xennet: queueing gARP


After first live migration to host2:

**** host2 log:

Jul  8 13:41:13 hp-z600-02 kernel: tap tap-1-51712: 2 getting info
Jul  8 13:41:13 hp-z600-02 kernel: device vif1.0 entered promiscuous mode
Jul  8 13:41:13 hp-z600-02 kernel: ADDRCONF(NETDEV_UP): vif1.0: link is not
                                   ready
Jul  8 13:41:17 hp-z600-02 kernel: blktap: ring-ref 8, event-channel 24,
                                   protocol 1 (x86_64-abi)
Jul  8 13:41:17 hp-z600-02 kernel: ADDRCONF(NETDEV_CHANGE): vif1.0: link
                                   becomes ready
Jul  8 13:41:17 hp-z600-02 kernel: br0: port 3(vif1.0) entering forwarding
                                   state
Jul  8 13:41:17 hp-z600-02 kernel: netback: Connected: from within watch

**** guest log:

Jul  8 13:41:17 dhcp47-212 kernel: xennet: queueing gARP


I have live-ping-ponged the guest multiple times, with identical results: the printk()'s testify to correct ordering, but the gARP is still lost. At this point I'm back to comment 20.

Comment 43 Laszlo Ersek 2011-07-11 08:38:52 UTC

The lost gARP seems to be a recurring problem. See bug 453526 and commit 24750f7.

Comment 44 Laszlo Ersek 2011-07-11 10:01:48 UTC

(In reply to comment #42)
> I uploaded the last three patches as proof that the strategy outlined in
> comment 32 and comment 33 doesn't work. The host side patches make sure that
> netback only advances to Connected state once the hotplug scripts have
> completed. The guest side patch ensures that the gARP is not queued until the
> host moved to Connected. Nonetheless, the network outage remains reproducible:
> 
> Hosts: 2.6.18-273.el5.missing_arp_bz713585_hostnotif.local.xen
> Guest: 2.6.32-165.missing_arp_bz713585_connected

I was an idiot, and forgot to set the arp_notify sysctl in the guest. After doing that, the network outage does go away under the above patches.

Cases covered thus far:
- patching only the RHEL-6.1+ guest (2.6.32-165) does not work:
  - comment 34, comment 35, comment 36, comment 38
- patching only the host doesn't suffice either:
  - RHEL-6.1 (2.6.32-131.0.15.el6) still experiences the outage
- patching both sides (see quoted versions above): works

Further steps:
- check (unpatched) Fedora guest under patched host,
- check RHEL-5 guest under patched host.

Comment 45 Laszlo Ersek 2011-07-11 13:19:37 UTC

On the patched host(s):
- RHEL-4.9 keeps working (tested with multiple ping-pongs),
- RHEL-5.6 keeps working (tested with multiple ping-pongs),
- Fedora-15, with arp_notify set, produces results identical to those of the unpatched RHEL-6 guest -- network outage after migration. The guest patch will have to be sent to upstream for consideration.

Comment 46 Laszlo Ersek 2011-07-12 08:03:07 UTC

(In reply to comment #45)
> The guest patch will have to be sent to upstream for consideration.

http://lists.xensource.com/archives/html/xen-devel/2011-07/msg00327.html

Comment 47 Aristeu Rozanski 2011-07-18 15:28:20 UTC

Patch(es) available on kernel-2.6.32-169.el6

Comment 50 Qin Guan 2011-07-26 10:07:28 UTC

Hi Laszlo,

I ran some tests on:
 host: 2.6.18-274.el5xen
 guest: 2.6.32-171.el6

Results:
1. The network outage happens when the arp_notify sysctl is not set to 1 in the guest.
2. After setting all arp_notify sysctls to 1, there is no packet loss during multiple live ping-pong migrations (about 10 times).

Should we have a technical note to address this (set arp_notify to 1)?

Regards!

Comment 51 Laszlo Ersek 2011-07-26 17:07:52 UTC

(In reply to comment #50)

> Should we have a technical note to address this (set arp_notify to 1)?

I guess... I'm adding a proposal.

Comment 52 Laszlo Ersek 2011-07-26 17:07:52 UTC

    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
In order to ensure uninterrupted network connectivity after live migration, the "net.ipv4.conf.INTERFACE.arp_notify" sysctl should be set to 1 in a Red Hat Enterprise Linux 6.1 Xen guest using the paravirtualized (xen-netfront) network driver.
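For illustration, a minimal sketch of applying the setting from inside the guest. The helper names are hypothetical, "eth0" stands in for INTERFACE, and writing under /proc/sys requires root. The proc_sys parameter exists only so that the helper can be tried against a scratch directory.

```python
import os


def arp_notify_path(interface, proc_sys="/proc/sys"):
    """Path of the per-interface arp_notify sysctl file."""
    return os.path.join(proc_sys, "net", "ipv4", "conf", interface, "arp_notify")


def set_arp_notify(interface, value=1, proc_sys="/proc/sys"):
    """Write the sysctl value; needs root when proc_sys is the real /proc/sys."""
    with open(arp_notify_path(interface, proc_sys), "w") as f:
        f.write(f"{value}\n")


def sysctl_conf_line(interface, value=1):
    """Line that could be added to /etc/sysctl.conf to keep the setting across reboots."""
    return f"net.ipv4.conf.{interface}.arp_notify = {value}"


print(arp_notify_path("eth0"))
print(sysctl_conf_line("eth0"))
```

Calling set_arp_notify("eth0") as root would change the running guest; the sysctl.conf line is one way to make the setting persistent.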

Comment 53 Jinxin Zheng 2011-09-05 06:46:58 UTC

I can reproduce this bug on kernel-2.6.32-131.0.15.el6.

I migrated the guest between two RHEL5.7 (kernel-xen-2.6.18-284.el5xen) hosts. During and after the migration I could not ping the guest from outside.

Then I updated the guest kernel to 2.6.32-194.el6 and ran the test again. During the migration, the guest stopped responding for a little while, and about 5 ping packets were lost. After that the guest was live again, and the ping continued to get responses from the guest.

I'm moving this to VERIFIED.

Comment 54 errata-xmlrpc 2011-12-06 13:41:31 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1530.html