360741 – (Roldyx) ICMP crash when live migrating

Summary: ICMP crash when live migrating

Keywords:
Status:	CLOSED DUPLICATE of bug 426861
Alias:	Roldyx
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	xen
Sub Component:
Version:	5.1
Hardware:	x86_64
OS:	Linux
Priority:	low
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Rik van Riel
QA Contact:	Virtualization Bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	492190

Reported:	2007-10-31 19:01 UTC by Rodrigo roldan
Modified:	2009-12-14 21:22 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-04-10 17:13:29 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
xend.log Dom0 (3.00 KB, application/octet-stream) 2007-11-05 18:47 UTC, Rodrigo roldan	no flags

Description Rodrigo roldan 2007-10-31 19:01:13 UTC

Description of problem:
ICMP (ping) stops working after migrating a VM.

Version-Release number of selected component (if applicable):
Kernel version on Dom0 and DomU: Linux xen7 2.6.18-53.el5xen
How reproducible:


Steps to Reproduce:
1. In the VM (Squid): root@squid# ping localhost --> Reply from...
2. On xen1 (Dom0): root@xen1# xm migrate Squid xen2
3. On xen2 (Dom0): root@xen2# xm list
Squid        1      511     1 r-----   2451.1
4. In the VM (Squid): root@squid# ping localhost
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=0 ttl=64 time=0.000 ms
  
Actual results:
# ping localhost
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=0 ttl=64 time=0.000 ms

--- localhost ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.000/0.000/0.000/0.000 ms, pipe 2
-------- END -----
# strace ping localhost

mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf7fe5000
read(4, "# Do not remove the following li"..., 4096) = 174
close(4)                                = 0
munmap(0xf7fe5000, 4096)                = 0
write(1, "64 bytes from localhost (127.0.0"..., 69) = 69
64 bytes from localhost (127.0.0.1): icmp_seq=0 ttl=64 time=0.000 ms
recvmsg(3, 0xffafab10, MSG_DONTWAIT)    = -1 EAGAIN (Resource temporarily unavailable)
gettimeofday({1193854525, 449804}, NULL) = 0

Expected results:
Reply from...

Additional info:
I can connect to the VM via ssh, but ICMP doesn't work.
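
The symptom described above (one reply, then nothing) can be checked mechanically after a migration. What follows is a minimal Python sketch, not part of the original report: it parses the summary line that ping prints under "Actual results" and flags a run in which fewer replies arrived than were requested. The function names, the probe count of 5 and the 30-second timeout are assumptions chosen for illustration.

```python
import re
import subprocess

# Matches the ping summary line, e.g.
# "1 packets transmitted, 1 received, 0% packet loss, time 0ms"
SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) received")


def parse_ping_summary(output):
    """Return (transmitted, received) from ping output, or None if absent."""
    match = SUMMARY_RE.search(output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def ping_looks_healthy(output, expected):
    """True only if ping both sent and received the expected packet count."""
    return parse_ping_summary(output) == (expected, expected)


def run_ping(host="localhost", count=5, timeout_s=30):
    """Run ping and return its stdout, or "" if it hangs past timeout_s."""
    try:
        result = subprocess.run(
            ["ping", "-c", str(count), host],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return ""
    return result.stdout


if __name__ == "__main__":
    out = run_ping()
    print("healthy" if ping_looks_healthy(out, 5) else "ping stalled or lost replies")
```

Whether the subprocess timeout itself fires reliably on an affected guest has not been tested here, so treat a hang of the script as a possible symptom as well.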

Comment 1 Rodrigo roldan 2007-11-05 18:47:34 UTC

Created attachment 248531
xend.log Dom0

Comment 2 Rodrigo roldan 2007-11-12 15:06:33 UTC

I found the cause, and it's an unusual one: with a bad mdadm configuration, we
get connection issues.

The solution: repair /etc/mdadm.conf or wait for a source code update.

Comment 3 Alain RICHARD 2008-07-09 12:54:47 UTC

Under CentOS 5.2 (RHEL 5.2), I get the same behavior: once a domU is migrated from one server to 
another, ping stops after just the first packet.

Looking more deeply into the issue, I found that it does indeed hang on "gettimeofday" with an 
EAGAIN error.

In fact, the problem is not with ping but with some part of the clock code. While the problem is present, 
the date command always returns the same time:

[root@auto127 ~]# date
mer jui  9 14:33:42 CEST 2008
[root@auto127 ~]# date
mer jui  9 14:33:42 CEST 2008
[root@auto127 ~]# date
mer jui  9 14:33:42 CEST 2008
[root@auto127 ~]# date
mer jui  9 14:33:42 CEST 2008
[root@auto127 ~]# date
mer jui  9 14:33:42 CEST 2008
[root@auto127 ~]# date
mer jui  9 14:33:42 CEST 2008
[root@auto127 ~]# date
mer jui  9 14:33:42 CEST 2008
[root@auto127 ~]# date
mer jui  9 14:33:42 CEST 2008
[root@auto127 ~]# date
mer jui  9 14:33:42 CEST 2008
[root@auto127 ~]# date
mer jui  9 14:33:42 CEST 2008
[root@auto127 ~]# date
mer jui  9 14:33:42 CEST 2008
[root@auto127 ~]# 

After some time (between 5 and 15 minutes), date works again, and so does ping.

During the hang, the value in /proc/uptime keeps increasing normally.

Our setup is:

dom0 servers under CentOS 5.2, 2.6.18-92.1.1.el5xen, clock synchronised with ntpd.
domU under CentOS 5.2, 2.6.18-92.1.1.el5xen, no ntpd daemon.

/proc/sys/xen/independent_wallclock is 0 in dom0 and domU.

This is 100% reproducible with our setup.
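
The observation above, that date stays frozen while /proc/uptime keeps advancing, suggests a simple check to run inside the domU right after a migration. The following is a minimal Python sketch under stated assumptions: a Linux guest exposing /proc/uptime, with the function names, the 2-second sampling interval and the 0.5-second tolerance all chosen for illustration. It has not been run on an affected guest.

```python
import time


def read_uptime(path="/proc/uptime"):
    """Return system uptime in seconds (first field of /proc/uptime)."""
    with open(path) as f:
        return float(f.read().split()[0])


def wall_clock_stalled(wall_delta, uptime_delta, tolerance=0.5):
    """True if uptime advanced but the wall clock lagged it by more than tolerance."""
    return uptime_delta > tolerance and (uptime_delta - wall_delta) > tolerance


def sample(interval_s=2.0):
    """Take two samples interval_s apart; return (wall_delta, uptime_delta)."""
    wall0, up0 = time.time(), read_uptime()
    time.sleep(interval_s)
    wall1, up1 = time.time(), read_uptime()
    return wall1 - wall0, up1 - up0


if __name__ == "__main__":
    wall_delta, uptime_delta = sample()
    state = "STALLED" if wall_clock_stalled(wall_delta, uptime_delta) else "ok"
    print(f"wall +{wall_delta:.2f}s, uptime +{uptime_delta:.2f}s: {state}")
```

Comparing deltas rather than absolute values keeps the check independent of timezone and of whatever offset the guest clock had before the migration.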

Comment 4 Chris Lalancette 2008-07-09 14:51:47 UTC

Ah, OK.  Note that there are still some possible lingering problems with live
migrate ARP stuff, but if your clock is stopping, then it is almost certainly BZ
426861 which we've been tracking.

Chris Lalancette

Comment 5 Bill Burns 2008-07-09 15:02:56 UTC

To the original reporter: does time stop when this happens, as it does for the
other reporter?

Comment 6 Rik van Riel 2009-04-10 17:13:29 UTC

It's been half a year since this bug has seen activity. I believe it is a duplicate of bug 426861 and will close it as such.  Feel free to reopen if the bug continues to manifest itself after upgrading to RHEL 5.3.

*** This bug has been marked as a duplicate of bug 426861 ***
