Bug 360741 - (Roldyx) ICMP crash when live migrating
Status: CLOSED DUPLICATE of bug 426861
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen
Version: 5.1
Hardware: x86_64 Linux
Priority: low
Severity: high
Assigned To: Rik van Riel
QA Contact: Virtualization Bugs
Depends On:
Blocks: 492190
Reported: 2007-10-31 15:01 EDT by Rodrigo roldan
Modified: 2009-12-14 16:22 EST (History)
Doc Type: Bug Fix
Last Closed: 2009-04-10 13:13:29 EDT

Attachments
xend.log Dom0 (3.00 KB, application/octet-stream)
2007-11-05 13:47 EST, Rodrigo roldan
Description Rodrigo roldan 2007-10-31 15:01:13 EDT
Description of problem:
ICMP (ping) stops working after migrating a VM.

Version-Release number of selected component (if applicable):
Kernel version on Dom0 and DomU: Linux xen7 2.6.18-53.el5xen
How reproducible:


Steps to Reproduce:
1. In the VM: root@squid# ping localhost --> Reply from...
2. On xen1 (Dom0): root@xen1# xm migrate Squid xen2
3. On xen2 (Dom0): root@xen2# xm list
Squid        1      511     1 r-----   2451.1
4. In the VM (Squid): root@squid# ping localhost
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=0 ttl=64 time=0.000 ms

Actual results:
#ping localhost
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=0 ttl=64 time=0.000 ms

--- localhost ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.000/0.000/0.000/0.000 ms, pipe 2
-------- END -----
#strace ping localhost

mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf7fe5000
read(4, "# Do not remove the following li"..., 4096) = 174
close(4)                                = 0
munmap(0xf7fe5000, 4096)                = 0
write(1, "64 bytes from localhost (127.0.0"..., 6964 bytes from localhost (127.0.0.1): icmp_seq=0 ttl=64 time=0.000 ms
) = 69
recvmsg(3, 0xffafab10, MSG_DONTWAIT)    = -1 EAGAIN (Resource temporarily unavailable)
gettimeofday({1193854525, 449804}, NULL) = 0

Expected results:
Reply from...

Additional info:
I can connect to the VM via ssh, but ICMP doesn't work.
Comment 1 Rodrigo roldan 2007-11-05 13:47:34 EST
Created attachment 248531
xend.log Dom0
Comment 2 Rodrigo roldan 2007-11-12 10:06:33 EST
I found the problem, and it is an unusual one: with a bad mdadm configuration,
we get connection issues.

The solution: repair /etc/mdadm.conf or wait for a source code update.
Comment 3 Alain RICHARD 2008-07-09 08:54:47 EDT
Under CentOS 5.2 (RHEL 5.2), I get the same behavior: once a domU is migrated from one server to 
another, ping stops after just the first packet.

Looking more closely at the issue, I found that it actually hangs on "gettimeofday" with an 
EAGAIN error.

In fact, the problem is not with ping but with some part of the clock code. While the problem is present, 
the date command always returns the same time:

[root@auto127 ~]# date
mer jui  9 14:33:42 CEST 2008
[root@auto127 ~]# date
mer jui  9 14:33:42 CEST 2008
[root@auto127 ~]# date
mer jui  9 14:33:42 CEST 2008
[root@auto127 ~]# date
mer jui  9 14:33:42 CEST 2008
[root@auto127 ~]# date
mer jui  9 14:33:42 CEST 2008
[root@auto127 ~]# date
mer jui  9 14:33:42 CEST 2008
[root@auto127 ~]# date
mer jui  9 14:33:42 CEST 2008
[root@auto127 ~]# date
mer jui  9 14:33:42 CEST 2008
[root@auto127 ~]# date
mer jui  9 14:33:42 CEST 2008
[root@auto127 ~]# date
mer jui  9 14:33:42 CEST 2008
[root@auto127 ~]# date
mer jui  9 14:33:42 CEST 2008
[root@auto127 ~]# 

After some time (5 to 15 minutes), date works again, and so does ping.
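
To illustrate why a frozen wall clock would let the first ping through and then stall, here is a toy model in Python. It is only a sketch, not ping's actual source: it assumes the sender paces probes by comparing wall-clock readings, and the names send_paced, ticking_clock and frozen_clock are made up for this example.

```python
def send_paced(clock, interval=1.0, count=3, max_polls=1000):
    """Send up to `count` probes, `interval` seconds apart according to `clock`.

    `clock` is any callable returning the current time in seconds.
    Gives up after `max_polls` clock reads and returns the probes sent.
    """
    sent = 0
    next_send = clock()  # the first probe is due immediately
    for _ in range(max_polls):
        now = clock()
        if now >= next_send:
            sent += 1
            next_send = now + interval
            if sent == count:
                break
    return sent


def ticking_clock(step=0.5):
    """Hypothetical healthy clock: advances by `step` seconds per read."""
    state = [0.0]

    def read():
        state[0] += step
        return state[0]

    return read


def frozen_clock(value=0.0):
    """Hypothetical stuck clock: every read returns the same time."""
    return lambda: value
```

With ticking_clock() all three probes go out; with frozen_clock() only the first one does, because the "next send" time is never reached. That matches a single reply followed by silence.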

During the hang, the value shown by cat /proc/uptime keeps increasing normally.
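
The two observations above (date stuck, uptime advancing) can be combined into a quick check. This is a minimal Python sketch under my own assumptions: the 0.5 ratio threshold and the function names are arbitrary, and it assumes time.sleep() still returns while the wall clock is stuck, which I have not verified on an affected guest.

```python
import time


def read_uptime(path="/proc/uptime"):
    """Return the first field of /proc/uptime (seconds since boot)."""
    with open(path) as f:
        return float(f.read().split()[0])


def is_wall_clock_stalled(wall_delta, uptime_delta, min_ratio=0.5):
    """True if the wall clock advanced much less than uptime did.

    min_ratio is an arbitrary threshold chosen for this sketch.
    """
    if uptime_delta <= 0:
        return False  # no reference progress, so nothing can be concluded
    return wall_delta < uptime_delta * min_ratio


def sample(pause=2.0):
    """Measure how far the wall clock and uptime each move over `pause` seconds."""
    wall_start, up_start = time.time(), read_uptime()
    time.sleep(pause)  # assumption: sleep still works during the hang
    return time.time() - wall_start, read_uptime() - up_start
```

Inside the domU, is_wall_clock_stalled(*sample()) should come back True during the hang and False once date starts moving again.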

Our setup is:

dom0 servers under CentOS 5.2, 2.6.18-92.1.1.el5xen, clock synchronised with ntpd.
domU under CentOS 5.2, 2.6.18-92.1.1.el5xen, no ntpd daemon.

/proc/sys/xen/independent_wallclock is 0 in dom0 and domU.
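
For anyone comparing setups, a small Python sketch for reading that setting. The path is the one quoted above; the helper names are made up. As far as I understand it, 0 means the guest follows the wall clock provided by Xen, and 1 means the guest keeps its own.

```python
def parse_flag(text):
    """Interpret the file contents: '1' means independent, anything else not."""
    return text.strip() == "1"


def independent_wallclock_enabled(path="/proc/sys/xen/independent_wallclock"):
    """Return True/False for the setting, or None if the file is not readable
    (for example on a non-Xen kernel)."""
    try:
        with open(path) as f:
            return parse_flag(f.read())
    except OSError:
        return None
```

On the machines described here, this would return False in both dom0 and domU.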

This is 100% reproducible in our setup.


Comment 4 Chris Lalancette 2008-07-09 10:51:47 EDT
Ah, OK.  Note that there are still some possible lingering problems with live
migrate ARP stuff, but if your clock is stopping, then it is almost certainly BZ
426861 which we've been tracking.

Chris Lalancette
Comment 5 Bill Burns 2008-07-09 11:02:56 EDT
To the original reporter: does time stop when this happens, as it does for the
other reporter?
Comment 6 Rik van Riel 2009-04-10 13:13:29 EDT
It's been half a year since this bug has seen activity. I believe it is a duplicate of bug 426861 and will close it as such.  Feel free to reopen if the bug continues to manifest itself after upgrading to RHEL 5.3.

*** This bug has been marked as a duplicate of bug 426861 ***
