+++ This bug was initially created as a clone of Bug #653262 +++

Description of problem:
For a RHEL5 PV guest (kernel 2.6.18-231.el5xen), I try to ssh to it after I balloon it up. The network is lost when I ssh to it: I can neither ssh into it nor ping it, although I can still get the console of the guest. This problem only happens when auto-ballooning is turned off (which is the default value) and there is not enough free memory. In other situations, such as when auto-ballooning is on or there is enough free memory for the guest to balloon up, the issue is not triggered.

Version-Release number of selected component (if applicable):
Host:
xen-devel-3.0.3-117.el5
xen-libs-3.0.3-117.el5
kernel-xen-devel-2.6.18-231.el5
xen-3.0.3-117.el5
xen-debuginfo-3.0.3-117.el5
kernel-xen-2.6.18-231.el5
Guest:
2.6.18-231.el5xen

How reproducible:
Always

Steps to Reproduce:
1. Make sure auto-ballooning is turned off in xend.
# grep "balloon-dom0" /etc/xen/xend-config.sxp
(auto-balloon-dom0 no)
2. Create a RHEL5 PV guest with memory=512 and maxmem=1024
3. Make sure free memory is not enough for ballooning:
# xm info | grep free
free_memory : 1
# xm li vm1
Name ID Mem(MiB) VCPUs State Time(s)
vm1 1 511 4 -b---- 8.9
# xm li vm1 -l | grep mem
(memory 512)
(shadow_memory 0)
(maxmem 1024)
4. Try to balloon up the guest
# xm mem-set vm1 900
5. Ping the guest after ballooning
# ping 10.66.93.117
PING 10.66.93.117 (10.66.93.117) 56(84) bytes of data.
64 bytes from 10.66.93.117: icmp_seq=1 ttl=64 time=0.252 ms
64 bytes from 10.66.93.117: icmp_seq=2 ttl=64 time=0.050 ms
64 bytes from 10.66.93.117: icmp_seq=3 ttl=64 time=0.053 ms
--- 10.66.93.117 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.050/0.118/0.252/0.094 ms
6. ssh to the guest after that

Actual results:
At step 6, you cannot ssh to the guest, after which the network of the guest is lost. You cannot ping it any more, though the guest is still alive and you can get its console.

Expected results:
Everything works well after ballooning.

Additional info:
No such issues are triggered when I downgrade the PV guest to the -194 kernel-xen packages, so I would consider this bug a regression.

--- Additional comment from yuzhang on 2010-11-15 00:37:00 EST ---

Created attachment 460474 [details]
config file to create the guest

--- Additional comment from yuzhang on 2010-11-15 00:39:02 EST ---

Created attachment 460475 [details]
xm dmesg log

--- Additional comment from yuzhang on 2010-11-15 00:43:06 EST ---

Created attachment 460476 [details]
xend.log

--- Additional comment from drjones on 2010-11-15 06:58:20 EST ---

Most likely culprit

7c14912 [virt] xen: don't give up ballooning under mem pressure

If a -221 kernel works (this patch is in -222) then that would give my accusation more weight.

--- Additional comment from lersek on 2010-11-15 10:30:39 EST ---

I could reproduce the problem with host -231, guest -222 -- I was able to ssh into the guest, and started to type "uname -r" to verify I'm running -222. I didn't get past "una", and then "ping" stopped working too.

I checked with -221, and the problem is gone. I think Andrew is right. I'll try to revert the patch.

======================

The bug is present on RHEL6 too, but is not a regression there.
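The precondition from step 3 of the reproduction steps (the requested balloon target needs more memory than the host has free) can be sketched as a small check. This is only an illustration, not part of xend or the fix: the function names are hypothetical, and it assumes that the free_memory value printed by "xm info" and the Mem(MiB) column of "xm li" are both in MiB.

```python
import re


def free_memory_mib(xm_info_output: str) -> int:
    """Extract the free_memory value from the text printed by `xm info`.

    Assumption: the value is in MiB, as in the transcript in this report.
    """
    match = re.search(r"^free_memory\s*:\s*(\d+)\s*$", xm_info_output, re.MULTILINE)
    if match is None:
        raise ValueError("no free_memory line found in xm info output")
    return int(match.group(1))


def balloon_up_exceeds_free(current_mib: int, target_mib: int, free_mib: int) -> bool:
    """Return True when growing from current_mib to target_mib needs more
    memory than the host currently has free (the condition in step 3)."""
    return (target_mib - current_mib) > free_mib


if __name__ == "__main__":
    # Values taken from the reproduction steps: 511 MiB now, target 900, 1 MiB free.
    free = free_memory_mib("free_memory : 1\n")
    print(balloon_up_exceeds_free(511, 900, free))
```

With the numbers from this report (511 MiB current, target 900, 1 MiB free) the check reports that the target cannot be satisfied from free memory, which is the situation in which the network loss was observed.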
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
RHEL6 doesn't do flipping (see bug 653505 and bug 653262 for an explanation of how flipping causes the bug in RHEL4 and RHEL5). However, dom0 tries to balloon up even for copying receivers and fails in the same way as explained in the above mentioned bugs. So, this is a backend bug.
There is a patch in upstream c/s 14355.
c/s 14355: http://xenbits.xensource.com/xen-unstable.hg?rev/68282f4b3e0f (Second hunk only.)
Created attachment 461068 [details] no need to balloon up for copying receivers
in kernel-2.6.18-233.el5

You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.
I'm putting this in VERIFIED according to comment 20 of bug 653262 and comment 6 of bug 653505. Additionally: merely updating the host kernel was found not to be sufficient. Host and guest kernels must both be updated in order to solve this issue.
Yes, that's correct. HVM guests do copying by default so they do not require an upgrade. For PV guests, if you do not upgrade the guest you need rx_copy=1 on the kernel command line of the guest.
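The rule in the comment above can be written down as a small decision helper, assuming the host kernel has already been updated. This is a sketch only: the function names and the "hvm"/"pv" labels are hypothetical, and the parameter is matched exactly as it is spelled in the comment ("rx_copy=1"); whether a given kernel expects a different spelling or a module prefix is not stated in this report.

```python
def needs_rx_copy_workaround(guest_type: str, guest_kernel_updated: bool) -> bool:
    """Return True when a guest needs rx_copy=1 on its kernel command line.

    HVM guests copy by default, so they never need it. A PV guest needs it
    only when its kernel has not been upgraded.
    """
    if guest_type == "hvm":
        return False
    if guest_type == "pv":
        return not guest_kernel_updated
    raise ValueError("guest_type must be 'hvm' or 'pv'")


def has_rx_copy(kernel_cmdline: str) -> bool:
    """Check whether a kernel command line string contains the rx_copy=1 token."""
    return "rx_copy=1" in kernel_cmdline.split()


if __name__ == "__main__":
    # Hypothetical command line of a PV guest that was not upgraded.
    cmdline = "ro root=/dev/VolGroup00/LogVol00 rx_copy=1"
    if needs_rx_copy_workaround("pv", guest_kernel_updated=False):
        print(has_rx_copy(cmdline))
```

Inside a running guest, the string passed to has_rx_copy could be read from /proc/cmdline.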
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-0017.html
*** Bug 648763 has been marked as a duplicate of this bug. ***