Description of problem:
ICMP crashes after a VM is migrated.

Version-Release number of selected component (if applicable):
Kernel version, Dom0 and DomU: Linux xen7 2.6.18-53.el5xen

How reproducible:

Steps to Reproduce:
1. In the VM:
   root@squid#ping localhost
   --> Reply from...
2. In xen1 (Dom0):
   root@xen1#xm migrate Squid xen2
3. In xen2 (Dom0):
   root@xen2#xm list
   Squid 1 511 1 r----- 2451.1
4. In the VM (Squid):
   root@squid#ping localhost
   PING localhost (127.0.0.1) 56(84) bytes of data.
   64 bytes from localhost (127.0.0.1): icmp_seq=0 ttl=64 time=0.000 ms

Actual results:
#ping localhost
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=0 ttl=64 time=0.000 ms

--- localhost ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.000/0.000/0.000/0.000 ms, pipe 2
-------- END -----

#strace ping localhost
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf7fe5000
read(4, "# Do not remove the following li"..., 4096) = 174
close(4) = 0
munmap(0xf7fe5000, 4096) = 0
write(1, "64 bytes from localhost (127.0.0"..., 6964 bytes from localhost (127.0.0.1): icmp_seq=0 ttl=64 time=0.000 ms
) = 69
recvmsg(3, 0xffafab10, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
gettimeofday({1193854525, 449804}, NULL) = 0

Expected results:
Reply from...

Additional info:
I can connect to the VM via ssh, but ICMP does not work.
Created attachment 248531 [details] xend.log Dom0
I found the problem, and it is an unusual one: with a bad mdadm configuration, we get connection issues. The workaround is to repair /etc/mdadm.conf or wait for a source code update.
Under CentOS 5.2 (RHEL 5.2), I get the same behavior: once a domU is migrated from one server to another, ping stops after just the first packet.

Looking more deeply into the issue, I found that it does indeed hang at "gettimeofday", just after the EAGAIN error (which, in the strace above, is returned by recvmsg). In fact the problem is not with ping, but with some part of the clock code. While the problem is present, the date command always returns the same time:

[root@auto127 ~]# date
mer jui 9 14:33:42 CEST 2008
[root@auto127 ~]# date
mer jui 9 14:33:42 CEST 2008
[root@auto127 ~]# date
mer jui 9 14:33:42 CEST 2008
[root@auto127 ~]# date
mer jui 9 14:33:42 CEST 2008
[root@auto127 ~]# date
mer jui 9 14:33:42 CEST 2008
[root@auto127 ~]# date
mer jui 9 14:33:42 CEST 2008
[root@auto127 ~]# date
mer jui 9 14:33:42 CEST 2008
[root@auto127 ~]# date
mer jui 9 14:33:42 CEST 2008
[root@auto127 ~]# date
mer jui 9 14:33:42 CEST 2008
[root@auto127 ~]# date
mer jui 9 14:33:42 CEST 2008
[root@auto127 ~]# date
mer jui 9 14:33:42 CEST 2008
[root@auto127 ~]#

After some time (between 5 and 15 minutes), date works again and so does ping. During the hang, the value in /proc/uptime keeps increasing normally.

Our setup is:
dom0 servers: CentOS 5.2, 2.6.18-92.1.1.el5xen, clock synchronised with ntpd.
domU: CentOS 5.2, 2.6.18-92.1.1.el5xen, no ntpd daemon.
/proc/sys/xen/independent_wallclock is 0 in both dom0 and domU.

This is 100% reproducible under our setup.
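For anyone who wants to check for the same symptom without running date by hand, the comparison described above (wall-clock time frozen while /proc/uptime keeps increasing) can be sketched in Python. This is only an illustration, not something taken from the affected systems: the function names, the 2-second threshold, and the sampling interval are all assumptions, and it presumes that time.sleep() still returns inside the affected guest, which has not been verified here.

```python
import time


def read_uptime(path="/proc/uptime"):
    """Return seconds since boot, taken from the first field of /proc/uptime."""
    with open(path) as f:
        return float(f.read().split()[0])


def wall_clock_stalled(samples, min_uptime_delta=2.0):
    """Decide whether the wall clock looks frozen.

    samples is a list of (wall_seconds, uptime_seconds) tuples in the order
    they were taken. The clock is considered stalled when uptime advanced by
    at least min_uptime_delta (an arbitrary, assumed threshold) between the
    first and last sample while the wall-clock value did not advance at all.
    """
    if len(samples) < 2:
        return False
    wall_first, uptime_first = samples[0]
    wall_last, uptime_last = samples[-1]
    uptime_advanced = (uptime_last - uptime_first) >= min_uptime_delta
    wall_frozen = (wall_last - wall_first) <= 0.0
    return uptime_advanced and wall_frozen


def collect(count=5, interval=1.0):
    """Take count samples of (wall clock, uptime), interval seconds apart.

    Assumption: time.sleep() still works in the affected guest. If it does
    not, the samples can be gathered by hand and passed to
    wall_clock_stalled() directly.
    """
    samples = []
    for i in range(count):
        samples.append((time.time(), read_uptime()))
        if i < count - 1:
            time.sleep(interval)
    return samples
```

Inside the migrated domU, calling wall_clock_stalled(collect()) would be expected to return True while the hang lasts and False once date starts moving again.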
Ah, OK. Note that there are still some possible lingering problems with live migrate ARP stuff, but if your clock is stopping, then it is almost certainly BZ 426861 which we've been tracking. Chris Lalancette
To the original reporter, does time stop when this happens as it does to the other reporter?
It's been half a year since this bug has seen activity. I believe it is a duplicate of bug 426861 and will close it as such. Feel free to reopen if the bug continues to manifest itself after upgrading to RHEL 5.3. *** This bug has been marked as a duplicate of bug 426861 ***