Bug 713585
Summary: | RHEL 6.1 Xen paravirt guest is getting network outage during live migration | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | asilva <asilva> | |
Component: | kernel | Assignee: | Laszlo Ersek <lersek> | |
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> | |
Severity: | medium | Docs Contact: | ||
Priority: | medium | |||
Version: | 6.1 | CC: | drjones, jmunilla, jzheng, leiwang, lersek, pbonzini, pcao, qguan, qwan, sputhenp, tburke, xen-maint, yuzhou | |
Target Milestone: | alpha | Keywords: | ZStream | |
Target Release: | 6.2 | |||
Hardware: | All | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | kernel-2.6.32-169.el6 | Doc Type: | Bug Fix | |
Doc Text: |
In order to ensure uninterrupted network connectivity after live migration, the "net.ipv4.conf.INTERFACE.arp_notify" sysctl should be set to 1 in a Red Hat Enterprise Linux 6.1 Xen guest using the paravirtualized (xen-netfront) network driver.
|
Story Points: | --- | |
Clone Of: | ||||
: | 761591 (view as bug list) | Environment: | ||
Last Closed: | 2011-12-06 13:41:31 UTC | Type: | --- | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 720347 | |||
Bug Blocks: | ||||
Attachments: |
Comment 4
RHEL Program Management
2011-06-21 05:34:16 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

I can confirm the gratuitous ARP thing is not working under RHEL-6.1 (using the environment described in comment 0), even though all the arp_notify sysctls are set to 1 in the guest.

When ping-pong migrating the "rhel5-mig" (RHEL-5.6) guest, I was running this command on both mig-src and mig-dst hosts:

    # tcpdump -l -vv -n -i br0 arp and ether src 00:16:36:06:ED:0C

The MAC being the RHEL-5.6 guest's MAC. When the migration finally switches over, the following is printed on *both* hosts:

    09:49:03.665698 arp reply 10.65.210.219 is-at 00:16:36:06:ed:0c
    09:49:03.665739 arp reply 10.65.210.219 is-at 00:16:36:06:ed:0c

And the ping from my laptop to the guest keeps getting responses.

However, when the RHEL-6.1 guest is migrated:

    # tcpdump -l -vv -n -i br0 arp and ether src 00:16:36:48:cb:02

There are no such packets captured on either host. Ping from my laptop to the guest stops working while the guest is on host "B". As soon as the guest is migrated back, ping is immediately resurrected.

So we can safely say that the RHEL-6.1 guest does not send out the gratuitous ARP packet after it resumes on the new host, and the switch on the local ethernet segment keeps remembering the old port. I'll have to debug the RHEL-6.1 guest.

I downgraded the guest kernel to -117 as the first step of a bisection, and it still does not work.

(In reply to comment #10)
> I downgraded the guest kernel to -117 as the first step of a bisection, and it
> still does not work.

... meaning: I'm not sure how it *ever* worked.
I mentioned -115 above based on the git commit log, but that build was probably not even released in-house. -117 is available in Brew. The maintainer designated -117 as the first build having this patch too (see bug 622575 comment 5).

I'm not sure how to interpret bug 622575 comment 7 and bug 622575 comment 8 -- Hushan, was the patch for bug 622575 tested with migration?

(I tried to add some printfs to see if the kernel gets to the place where it would send the ARP req; unfortunately my build crashed sometime before that with a NULL pointer dereference. Probably a broken build / initrd.)

I have compared how the RHEL-5 code sends out this unsolicited ARP notification versus how the RHEL-6 code attempts to do it. So:

RHEL-5 (and linux-2.6.18-xen.hg): see the send_fake_arp() function in "drivers/xen/netfront/netfront.c":

    dst_ip = INADDR_BROADCAST;
    src_ip = inet_select_addr(dev, dst_ip, RT_SCOPE_LINK);
    [...]
    skb = arp_create(ARPOP_REPLY, ETH_P_ARP,
                     dst_ip, dev,
                     src_ip,
                     /*dst_hw*/ NULL, /*src_hw*/ NULL,
                     /*target_hw*/ dev->dev_addr);

That is:
- ARP *reply*
- destination IP: broadcast address,
- source IP address: the one assigned to the netfront device within this subnet,
- dst_hw is set to NULL and then overridden in arp_create(), but it's not really important here,
- src_hw (source MAC) is overridden in arp_create(), it is set to netfront's MAC (dev->dev_addr)
- target_hw is set to dev->dev_addr as well

This is exactly what was posted originally for upstream (http://kerneltrap.org/mailarchive/linux-netdev/2010/5/12/6277063, see also bug 622575 comment 0):

    + arp_send(ARPOP_REPLY, ETH_P_ARP,
    +          INADDR_BROADCAST, dev,
    +          ifa->ifa_address,
    +          NULL, NULL, dev->dev_addr);

However they rather went with what we have now in RHEL-6 (http://kerneltrap.org/mailarchive/linux-netdev/2010/5/24/6277888, "net/ipv4/devinet.c"):

    arp_send(ARPOP_REQUEST, ETH_P_ARP,
             ifa->ifa_address, dev,
             ifa->ifa_address,
             NULL, dev->dev_addr, NULL);

- ARP request -- changed
- destination IP: netfront's own IP -- changed,
- source IP: netfront's own IP (no change),
- dst_hw: no change,
- src_hw: set directly to dev->dev_addr (no change),
- target_hw: filled with \0 in arp_create() -- changed

Wikipedia (what else) has this to say (http://en.wikipedia.org/wiki/Address_Resolution_Protocol#ARP_announcements):

"Such an announcement, also called a gratuitous ARP message, is usually broadcast as an ARP request containing the sender's protocol address (SPA) in the target field (TPA=SPA), with the target hardware address (THA) set to zero. An alternative is to broadcast an ARP reply with the sender's hardware and protocol addresses (SHA and SPA) duplicated in the target fields (TPA=SPA, THA=SHA)."

The RHEL-6 way exactly satisfies the first sentence. The RHEL-5 way satisfies THA=SHA in the second sentence, but does not satisfy TPA=SPA. Interestingly enough, the RHEL-5 way works, while the RHEL-6 way does not.

Referring back to comment 9, the RHEL-6 code did not put out any packet on the wire. The "ether src NETFRONT-MAC" part would have caught it.

... Alright, the RHEL-6 code uses wrong addresses. See this commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=6c91afe1

Backporting upstream c/s 6c91afe1 did not work. I'll try to imitate the RHEL-5 way in RHEL-6.

Sending an ARP reply doesn't work either, even though I've verified that the packet gets to dev_queue_xmit().

Last night I tried to run tcpdump inside the guests as well, on the netfront interface.

**** The RHEL-6.1 guest sends out the following two packets (two migrations):

    # tcpdump -e -l -X -vv -n -i eth0 arp \
        and \( ether src 00:16:36:48:cb:02 or ether dst 00:16:36:48:cb:02 \)

    23:54:16.790621 00:16:36:48:cb:02 > Broadcast, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.65.209.63 tell 10.65.209.63, length 28
        0x0000:  0001 0800 0604 0001 0016 3648 cb02 0a41  ..........6H...A
        0x0010:  d13f 0000 0000 0000 0a41 d13f            .?.......A.?
    23:54:32.995448 00:16:36:48:cb:02 > Broadcast, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.65.209.63 tell 10.65.209.63, length 28
        0x0000:  0001 0800 0604 0001 0016 3648 cb02 0a41  ..........6H...A
        0x0010:  d13f 0000 0000 0000 0a41 d13f            .?.......A.?

**** RHEL-5.6 guest:

    # tcpdump -e -l -X -vv -n -i eth0 arp \
        and \( ether src 00:16:36:06:ED:0C or ether dst 00:16:36:06:ED:0C \)

    15:48:47.253601 00:16:36:06:ed:0c > Broadcast, ethertype ARP (0x0806), length 42: arp reply 10.65.210.219 is-at 00:16:36:06:ed:0c
        0x0000:  0001 0800 0604 0002 0016 3606 ed0c 0a41  ..........6....A
        0x0010:  d2db 0016 3606 ed0c ffff ffff            ....6.......
    15:49:07.283418 00:16:36:06:ed:0c > Broadcast, ethertype ARP (0x0806), length 42: arp reply 10.65.210.219 is-at 00:16:36:06:ed:0c
        0x0000:  0001 0800 0604 0002 0016 3606 ed0c 0a41  ..........6....A
        0x0010:  d2db 0016 3606 ed0c ffff ffff            ....6.......
    15:49:07.283418 00:16:36:06:ed:0c > Broadcast, ethertype ARP (0x0806), length 42: arp reply 10.65.210.219 is-at 00:16:36:06:ed:0c
        0x0000:  0001 0800 0604 0002 0016 3606 ed0c 0a41  ..........6....A
        0x0010:  d2db 0016 3606 ed0c ffff ffff            ....6.......

**** Offload settings of the RHEL-5.6 netfront interface:

    # ethtool -k eth0
    Offload parameters for eth0:
    Cannot get device rx csum settings: Operation not supported
    Cannot get device udp large send offload settings: Operation not supported
    rx-checksumming: off
    tx-checksumming: on
    scatter-gather: on
    tcp segmentation offload: on
    udp fragmentation offload: off
    generic segmentation offload: off
    generic-receive-offload: off

**** Then I applied Ian Campbell's original suggestion (http://kerneltrap.org/mailarchive/linux-netdev/2010/5/12/6277063) to RHEL-6.1.
These are the resultant outgoing packets, captured on the netfront interface:

    # tcpdump -e -l -X -vv -n -i eth0 arp \
        and \( ether src 00:16:36:48:cb:02 or ether dst 00:16:36:48:cb:02 \)

    16:08:40.936213 00:16:36:48:cb:02 > Broadcast, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply 10.65.209.63 is-at 00:16:36:48:cb:02, length 28
        0x0000:  0001 0800 0604 0002 0016 3648 cb02 0a41  ..........6H...A
        0x0010:  d13f 0016 3648 cb02 ffff ffff            .?..6H......
    16:09:02.500510 00:16:36:48:cb:02 > Broadcast, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply 10.65.209.63 is-at 00:16:36:48:cb:02, length 28
        0x0000:  0001 0800 0604 0002 0016 3648 cb02 0a41  ..........6H...A
        0x0010:  d13f 0016 3648 cb02 ffff ffff            .?..6H......

Except for the MAC and the IP address, these match (in structure) the RHEL-5.6 packets precisely.

**** Offload settings of the RHEL-6.1 netfront interface (queried with the above patch applied)

    # ethtool -k eth0
    Offload parameters for eth0:
    rx-checksumming: on
    tx-checksumming: on
    scatter-gather: on
    tcp-segmentation-offload: on
    udp-fragmentation-offload: off
    generic-segmentation-offload: off
    generic-receive-offload: off
    large-receive-offload: off

Furthermore, I traced dev_queue_xmit() a bit more, and the packet seems to enter __dev_xmit_skb():

    if (q->enqueue) {
        rc = __dev_xmit_skb(skb, q, dev, txq);
        goto out;
    }

**** Summary

Now I think, based on __dev_xmit_skb() to a lesser extent, and on the guest tcpdump to a greater extent, that the problem could be located more towards the guest-host interface. Netfront/netback communication problem, or perhaps timing.

The timing idea goes like this: when netback/netfront are connected, the guest sends out the packet. However at this point the netback vif might not yet have been added to the bridge in the host. I'll try to investigate this. With the host tcpdump I've only listened on br0; vif6.0 is not present as long as the guest is migrated away.
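For readers who want to check the two announcement styles against the hex dumps above, here is a minimal Python sketch that packs both 28-byte ARP payloads. The helper names (build_arp, mac) are made up for illustration; the MAC, the IP address and the field values are taken from the RHEL-6.1 guest captures in this report. It only builds bytes and does not send anything.

```python
import socket
import struct


def build_arp(oper, sha, spa, tha, tpa):
    """Pack a 28-byte Ethernet/IPv4 ARP payload (without the Ethernet header)."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,       # hardware type: Ethernet
        0x0800,  # protocol type: IPv4
        6, 4,    # hardware and protocol address lengths
        oper,    # 1 = request, 2 = reply
        sha, spa, tha, tpa,
    )


def mac(text):
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes.fromhex(text.replace(":", ""))


GUEST_MAC = mac("00:16:36:48:cb:02")
GUEST_IP = socket.inet_aton("10.65.209.63")

# RHEL-6 / upstream style: ARP request, TPA = SPA, THA zeroed.
garp_request = build_arp(1, GUEST_MAC, GUEST_IP, bytes(6), GUEST_IP)

# RHEL-5 style: ARP reply, THA = SHA, target IP set to the broadcast address.
garp_reply = build_arp(2, GUEST_MAC, GUEST_IP, GUEST_MAC,
                       socket.inet_aton("255.255.255.255"))

print(garp_request.hex())
print(garp_reply.hex())
```

Both hex strings should line up with the 0x0000/0x0010 rows of the corresponding tcpdump -X output, which is consistent with the conclusion that the packet contents are fine and the loss happens after the guest queues them.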
Created attachment 509743 [details]
[debugging patch] try to queue 10 GARPs, with half a second between them

Confirming timing / packet loss problem. The attached patch fixes the outage by sending 10 GARPs and sleeping half a second between them. The br0 host tcpdump sees 9 out of 10.

The cause of losing the first GARP might be one of (at least) the following two:
- The netback vif is added too late to the bridge. (Timing.)
- The first packet is lost in all cases. (Packet loss.) We have seen similar problems, see e.g. bug 640690 comment 15, last paragraph.

I'll figure out which one it is by removing the sleep from the loop and seeing how many packets get to the host bridge.

Created attachment 509760 [details]
[debugging patch] sysadmin-controllable resend params

(In reply to comment #19)
> The cause of losing the first GARP might be one of (at least) the following
> two:
> - The netback vif is added too late to the bridge. (Timing.)
> - The first packet is lost in all cases. (Packet loss.) We have seen similar
> problems, see e.g. bug 640690 comment 15, last paragraph.
>
> I'll figure out which one it is by removing the sleep from the loop and seeing
> how many packets get to the host bridge.

Timing problem. I passed "xen_netfront.garp_resend=3 xen_netfront.garp_resend_sleep=0" to the guest kernel, and ping-pong migrated the guest a few times. The number of gARPs seen on the host bridge varied from zero to three. When it was zero, I experienced the network outage.
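The timing hypothesis can be illustrated with a deliberately simple toy model in Python. This is a sketch under stated assumptions, not a description of the debugging patch: the function name garps_seen and the port_ready_ms value are hypothetical, and the model assumes one initial gARP at t=0 followed by the given number of resends at a fixed spacing.

```python
def garps_seen(resend, resend_sleep_ms, port_ready_ms):
    """Count gARPs that reach the host bridge in this toy model.

    One initial gARP is queued at t=0, followed by `resend` more, each
    `resend_sleep_ms` after the previous one. A gARP counts as seen only
    if it is queued at or after `port_ready_ms`, the assumed moment at
    which the vif starts forwarding on the host bridge.
    """
    send_times = [i * resend_sleep_ms for i in range(resend + 1)]
    return sum(1 for t in send_times if t >= port_ready_ms)


# Assumed: the bridge port becomes ready 200 ms after the first gARP.
print(garps_seen(resend=3, resend_sleep_ms=0, port_ready_ms=200))
print(garps_seen(resend=1, resend_sleep_ms=1000, port_ready_ms=200))
```

Being deterministic, the model cannot reproduce the zero-to-three variation observed with back-to-back resends; it only shows why packets queued without any delay can all fall before the port is ready, while a single delayed resend gets through.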
With "xen_netfront.garp_resend=1 xen_netfront.garp_resend_sleep=1000" (meaning: a single gARP is re-queued after the initial gARP, at a distance of 1 second),
- there was no outage during 5 full ping-pongs (10 migrations in total),
- a single gARP was seen on the host bridge each time,
- pinging the guest from my laptop during the test:

    --- 10.65.209.63 ping statistics ---
    258 packets transmitted, 248 received, 3% packet loss, time 257319ms
    rtt min/avg/max/mdev = 222.438/234.464/415.224/20.094 ms

(After each migration completed, a single ICMP echo-request went unanswered, while the guest was in msleep(1000) before it re-queued the gARP. This "downtime" can be mitigated by lowering garp_resend_sleep.)

I'll need some opinions on what to keep tunable and in what range.

Since Paolo fixed the original bug... :)

I'd say upstream (xen-devel with CC to netdev.org) is the best place to discuss this. Alternatively, could it be fixed in host userspace?

(In reply to comment #23)
> I'd say upstream (xen-devel with CC to netdev.org) is the best
> place to discuss this.

I agree; I'm waiting for the customer's test results -- I don't want to bother upstream without "facts".

> Alternatively, could it be fixed in host userspace?

I'm not sure you can add the netback vif to the bridge reliably before the netback/netfront handshake completes. (Is there a netback vif before the handshake at all?)

> > I'd say upstream (xen-devel with CC to netdev.org) is the best
> > place to discuss this.
>
> I agree; I'm waiting for the customer's test results -- I don't want to bother
> upstream without "facts".

Heh, your bugzilla comments have way more facts than are needed to start a discussion upstream.

> > Alternatively, could it be fixed in host userspace?
>
> I'm not sure you can add the netback vif to the bridge reliably before the
> netback/netfront handshake completes. (Is there a netback vif before the
> handshake at all?)

I was too terse. I meant fixing the handshake. But actually there's already such a hook point in the host, so a guest-only patch is possible too. The hotplug script is writing "connected" to hotplug-status in the backend tree. The guest can watch that (setting up the watch before moving out of Initialising state), and use that to trigger the ARP. Perhaps that's a cleaner choice? (Aside: Xenstore watches fire even when the same value is rewritten).

F15 has the same problem, after setting all arp_notify sysctls to 1.

upstream discussion: http://lists.xensource.com/archives/html/xen-devel/2011-06/msg01963.html

We may have to backport http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commitdiff;h=43223efd9bfd to RHEL-5 netback, and move the netif_notify_peers() down to XenbusStateConnected in the RHEL-6 guest.

Whatever the final clean solution will be for this, we also need to consider the case where we have 6.2 running on older RHEL5 hosts that may not have updated netbacks. Thus, we may need a hackier, guest-side only fix as well as, or instead of, a clean solution.

Further analysis revealed the following RHEL-5.3 changeset, for bug 458934: http://git.engineering.redhat.com/?p=users/jwilson/rhel5/kernel;a=commitdiff;h=14bee682

    commit 14bee682937a942146caa7414a381160e69bcffb
    Author: Herbert Xu <herbert.xu>
    Date: Thu Aug 14 20:22:21 2008 +1000

        [xen] xennet: coordinate ARP with backend network status

        This patch delays the sending of the gratuitous ARP packet in the Xen frontend network driver until the backend signals that its carrier status has been processed by the stack.

        The frontend part of the patch is not applicable in upstream Linux as there is no gratuitous ARP packet support there at all. While the backend upstream is dead as far as I know. So this is RHEL5-specific.

        Both patches are needed to ensure the successful transmission of the gratuitous ARP packet which is necessary for Xen live migrations.
In the backend: the patch registers a notifier callback for the "carrier up" event, and only switches the backend's state to XenbusStateConnected when that handler is called.

In the frontend: the send_fake_arp() call is moved from the "backend moved to XenbusStateInitWait" case to the "backend moved to XenbusStateConnected" case.

----------

Pvops dom0 implemented the backend's move to XenbusStateConnected completely differently (see comment 30). The frontend part is simply missing from RHEL-6 (and upstream too, and they know about it -- see the last paragraph of http://lists.xensource.com/archives/html/xen-devel/2011-06/msg01969.html).

Current working idea: we *only* have to relocate the netif_notify_peers() call from the XenbusStateInitWait handler to the XenbusStateConnected handler. This change should be upstream-able (should not harm there, see the xen-devel link above), and should fix the problem for us, because the RHEL-5.6 host would (indirectly) delay the RHEL-6 guest's gARP, just as it causes the RHEL-5 guest to wait.

A summary of comment 34: The pvops and the RHEL-5 dom0 kernels flip netback to XenbusStateConnected differently, but they agree in that the switch is "sufficiently" delayed. The RHEL-5 guest co-operates with the RHEL-5 dom0. The Fedora/upstream guest (probably) co-operates with the pvops dom0. Moving the gARP to the XenbusStateConnected handler in the RHEL-6 / Fedora / upstream guest, the guest should continue working with the pvops dom0, and should *start* working with RHEL-5 dom0. I'll test this.

(In reply to comment #35)
> I'll test this.

It didn't work.

If we still want to backport upstream's netback solution to RHEL-5 (see comment 30), then we'll first have to revert commit 14bee682, posted for bug 458934 (comment 34), at least partially. The two seem to conflict in when to move netback to the Connected state.

(In reply to comment #36)
> (In reply to comment #35)
>
> > I'll test this.
>
> It didn't work.

Meaning, moving just the netif_notify_peers() call to XenbusStateConnected in backend_changed() was not sufficient. (Hosts: 2.6.18-238, guest: 2.6.32-165, plus patch.)

Created attachment 511972 [details]
netback: partially revert the host side of 14bee682
This patch reverts netback's notifier-based switch to XenbusStateConnected, and fixes up the reversion a bit. The state after this reversion is that netback moves as quickly to XenbusStateConnected as possible. This is only intended as preparation for the following patch, which backports upstream's switch to XenbusStateConnected in netback.
Created attachment 511973 [details]
netback: wait for hotplug scripts to complete before signalling Connected to frontend
This is a backport of 43223efd from Jeremy's pvops tree. Plus I added two printk()'s for debugging.
Created attachment 511974 [details]
xen-netfront: send gARP after backend moved to XenbusStateConnected
This 6.1 patch moves the gARP queueing to XenbusStateConnected, adding an informational printk() too.
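The ordering that the three patches aim for can be sketched as a small event replay in Python. This is a toy model under stated assumptions: the Event names and the garp_delivered helper are hypothetical and do not correspond to kernel identifiers, and PATCHED_HOST simply encodes the sequence described above (the backend signals Connected only after the hotplug scripts have added the vif to the bridge).

```python
from enum import Enum, auto


class Event(Enum):
    BACKEND_INIT_WAIT = auto()    # backend reaches XenbusStateInitWait
    VIF_ADDED_TO_BRIDGE = auto()  # hotplug scripts have completed
    BACKEND_CONNECTED = auto()    # backend reaches XenbusStateConnected


def garp_delivered(garp_trigger, host_events):
    """Replay host-side events in order.

    The guest queues its gARP when it observes `garp_trigger`; the gARP is
    delivered only if the vif is already attached to the bridge by then.
    """
    vif_in_bridge = False
    for event in host_events:
        if event is Event.VIF_ADDED_TO_BRIDGE:
            vif_in_bridge = True
        if event is garp_trigger:
            return vif_in_bridge
    return False  # the trigger never fired


# Assumed sequence on a host carrying both netback patches.
PATCHED_HOST = [
    Event.BACKEND_INIT_WAIT,
    Event.VIF_ADDED_TO_BRIDGE,
    Event.BACKEND_CONNECTED,
]

print(garp_delivered(Event.BACKEND_INIT_WAIT, PATCHED_HOST))
print(garp_delivered(Event.BACKEND_CONNECTED, PATCHED_HOST))
```

In this model a guest that queues the gARP on InitWait always fires too early, while one that waits for Connected fires after the bridge is ready; a host that reported Connected before the hotplug scripts finished would defeat the guest-side change on its own.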
I uploaded the last three patches as proof that the strategy outlined in comment 32 and comment 33 doesn't work. The host side patches make sure that netback only advances to Connected state once the hotplug scripts have completed. The guest side patch ensures that the gARP is not queued until the host moved to Connected. Nonetheless, the network outage remains reproducible:

Hosts: 2.6.18-273.el5.missing_arp_bz713585_hostnotif.local.xen
Guest: 2.6.32-165.missing_arp_bz713585_connected

The times are synchronized between the two hosts and the guest (common NTP source).

When the guest is first started:

**** host1 log:

    Jul 8 13:30:19 hp-z800-02 kernel: tap tap-1-51712: 2 getting info
    Jul 8 13:30:19 hp-z800-02 kernel: device vif1.0 entered promiscuous mode
    Jul 8 13:30:19 hp-z800-02 kernel: ADDRCONF(NETDEV_UP): vif1.0: link is not ready
    Jul 8 13:30:20 hp-z800-02 kernel: blktap: ring-ref 8, event-channel 25, protocol 1 (x86_64-abi)
    Jul 8 13:30:22 hp-z800-02 kernel: ADDRCONF(NETDEV_CHANGE): vif1.0: link becomes ready
    Jul 8 13:30:22 hp-z800-02 kernel: br0: port 3(vif1.0) entering forwarding state
    Jul 8 13:30:22 hp-z800-02 kernel: netback: Connected: from within watch

**** guest log:

    Jul 8 13:30:33 dhcp47-212 kernel: Initialising Xen virtual ethernet driver.
    Jul 8 13:30:33 dhcp47-212 kernel: xennet: queueing gARP

After first live migration to host2:

**** host2 log:

    Jul 8 13:41:13 hp-z600-02 kernel: tap tap-1-51712: 2 getting info
    Jul 8 13:41:13 hp-z600-02 kernel: device vif1.0 entered promiscuous mode
    Jul 8 13:41:13 hp-z600-02 kernel: ADDRCONF(NETDEV_UP): vif1.0: link is not ready
    Jul 8 13:41:17 hp-z600-02 kernel: blktap: ring-ref 8, event-channel 24, protocol 1 (x86_64-abi)
    Jul 8 13:41:17 hp-z600-02 kernel: ADDRCONF(NETDEV_CHANGE): vif1.0: link becomes ready
    Jul 8 13:41:17 hp-z600-02 kernel: br0: port 3(vif1.0) entering forwarding state
    Jul 8 13:41:17 hp-z600-02 kernel: netback: Connected: from within watch

**** guest log:

    Jul 8 13:41:17 dhcp47-212 kernel: xennet: queueing gARP

I have live-ping-ponged the guest multiple times, with identical results: the printk()'s testify to correct ordering, but the gARP is still lost. At this point I'm back to comment 20.

The lost gARP seems to be a recurring problem. See bug 453526 and commit 24750f7.

(In reply to comment #42)
> I uploaded the last three patches as proof that the strategy outlined in
> comment 32 and comment 33 doesn't work. The host side patches make sure that
> netback only advances to Connected state once the hotplug scripts have
> completed. The guest side patch ensures that the gARP is not queued until the
> host moved to Connected. Nonetheless, the network outage remains reproducible:
>
> Hosts: 2.6.18-273.el5.missing_arp_bz713585_hostnotif.local.xen
> Guest: 2.6.32-165.missing_arp_bz713585_connected

I was an idiot, and forgot to set the arp_notify sysctl in the guest. After doing that, the network outage does go away under the above patches.
Covered cases so far:
- patching only the RHEL-6.1+ guest (2.6.32-165) does not work:
  - comment 34, comment 35, comment 36, comment 38
- patching only the host doesn't suffice either:
  - RHEL-6.1 (2.6.32-131.0.15.el6) still experiences the outage
- patching both sides (see quoted versions above): works

Further steps:
- check (unpatched) Fedora guest under patched host,
- check RHEL-5 guest under patched host.

On the patched host(s):
- RHEL-4.9 keeps working (tested with multiple ping-pongs),
- RHEL-5.6 keeps working (tested with multiple ping-pongs),
- Fedora-15, with arp_notify set, produces results identical to those of the unpatched RHEL-6 guest -- network outage after migration.

The guest patch will have to be sent to upstream for consideration.

(In reply to comment #45)
> The guest patch will have to be sent to upstream for consideration.

http://lists.xensource.com/archives/html/xen-devel/2011-07/msg00327.html

Patch(es) available on kernel-2.6.32-169.el6

Hi Laszlo,

I did some tests on:
host: 2.6.18-274.el5xen
guest: 2.6.32-171.el6

Results as below:
1. The network outage happens when the arp_notify sysctl is not set to 1 on the guest.
2. After setting all arp_notify sysctls to 1, there is no packet loss during multiple live ping-pong migrations (about 10 times).

Should we have a technical note to address this (set arp_notify to 1)?

Regards!

(In reply to comment #50)
> Should we have a technical note to address this (set arp_notify to 1)?

I guess... I'm adding a proposal.

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
In order to ensure uninterrupted network connectivity after live migration, the "net.ipv4.conf.INTERFACE.arp_notify" sysctl should be set to 1 in a Red Hat Enterprise Linux 6.1 Xen guest using the paravirtualized (xen-netfront) network driver.

I can reproduce this bug on kernel-2.6.32-131.0.15.el6. I did the migration between two RHEL5.7 (kernel-xen-2.6.18-284.el5xen) hosts. During and after the migration I cannot ping the guest from outside.

Then I updated the guest kernel to 2.6.32-194.el6, and did the test again. During the migration, the guest stopped responding for a little while. About 5 ping packets were lost. But after that the guest is live again. The ping continued to get responses from the guest.

I'm moving this to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1530.html |
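As a small illustration of the technical note, the following Python sketch generates the sysctl settings and the matching procfs paths for a list of interface names. The helper names are hypothetical, and the interface names used in the example ("all", "default", "eth0") are assumptions based on the "all arp_notify sysctls" wording and the eth0 captures in this report; which entries a given guest actually needs is not specified here.

```python
def arp_notify_sysctl_lines(interfaces):
    """Return sysctl.conf-style lines that set arp_notify to 1."""
    return [f"net.ipv4.conf.{name}.arp_notify = 1" for name in interfaces]


def arp_notify_proc_path(interface):
    """Return the procfs file that backs the sysctl for one interface."""
    return f"/proc/sys/net/ipv4/conf/{interface}/arp_notify"


# Assumed interface names; adjust to the guest's netfront device.
for line in arp_notify_sysctl_lines(["all", "default", "eth0"]):
    print(line)
print(arp_notify_proc_path("eth0"))
```

The printed lines are in the form accepted by /etc/sysctl.conf; writing "1" to the printed procfs path as root would change the setting for the running guest only.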