Bug 653501 - netback tries to balloon up even if front-end doesn't do flipping
Summary: netback tries to balloon up even if front-end doesn't do flipping
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.6
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Laszlo Ersek
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Duplicates: 648763
Depends On:
Blocks: 514489 653262 653505
 
Reported: 2010-11-15 16:00 UTC by Paolo Bonzini
Modified: 2018-11-14 18:49 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 653262
Environment:
Last Closed: 2011-01-13 22:01:09 UTC
Target Upstream Version:
Embargoed:


Attachments
no need to balloon up for copying receivers (518 bytes, patch)
2010-11-17 13:45 UTC, Laszlo Ersek


Links
System ID: Red Hat Product Errata RHSA-2011:0017
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.6 kernel security and bug fix update
Last Updated: 2011-01-13 10:37:42 UTC

Description Paolo Bonzini 2010-11-15 16:00:09 UTC
+++ This bug was initially created as a clone of Bug #653262 +++

Description of problem:
On a RHEL5 PV guest (kernel 2.6.18-231.el5xen), the network is lost when I ssh to the guest after ballooning it up. I can neither ssh into it nor ping it, although I can still get the guest's console. The problem only happens when auto-ballooning is turned off (which is the default value) and there is not enough free memory. In other situations, such as when auto-ballooning is on or there is enough free memory for the guest to balloon up, the issue is not triggered.

Version-Release number of selected component (if applicable):
Host:  
xen-devel-3.0.3-117.el5
xen-libs-3.0.3-117.el5
kernel-xen-devel-2.6.18-231.el5
xen-3.0.3-117.el5
xen-debuginfo-3.0.3-117.el5
kernel-xen-2.6.18-231.el5

Guest: 
2.6.18-231.el5xen

How reproducible:
Always

Steps to Reproduce:
1. Make sure auto-ballooning is turned off in xend.
# grep "balloon-dom0" /etc/xen/xend-config.sxp
 (auto-balloon-dom0 no)

2. Create a RHEL5 PV guest with memory=512 and maxmem=1024
3. Make sure free memory is not enough for ballooning:
# xm info | grep free
free_memory            : 1

# xm li vm1
Name                                      ID Mem(MiB) VCPUs State   Time(s)
vm1                                        1      511     4 -b----      8.9

# xm li vm1 -l | grep mem
    (memory 512)
    (shadow_memory 0)
    (maxmem 1024)

4. Try to balloon up guest
# xm mem-set vm1 900

5. Ping the guest after ballooning
# ping 10.66.93.117
PING 10.66.93.117 (10.66.93.117) 56(84) bytes of data.
64 bytes from 10.66.93.117: icmp_seq=1 ttl=64 time=0.252 ms
64 bytes from 10.66.93.117: icmp_seq=2 ttl=64 time=0.050 ms
64 bytes from 10.66.93.117: icmp_seq=3 ttl=64 time=0.053 ms

--- 10.66.93.117 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.050/0.118/0.252/0.094 ms

6. ssh to the guest after that
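
The precondition in steps 3 and 4 (the balloon target needs more memory than the host has free) can be checked with a small sketch like the one below. It is only an illustration: the function names are hypothetical, and it assumes that the `free_memory` value from `xm info` and the `Mem(MiB)` column from `xm li` are both in MiB. The sample values are the ones shown in the steps above.

```python
import re


def parse_free_memory_mib(xm_info_output: str) -> int:
    """Extract the free_memory value from `xm info` text (assumed to be MiB)."""
    match = re.search(r"^free_memory\s*:\s*(\d+)\s*$", xm_info_output, re.MULTILINE)
    if match is None:
        raise ValueError("no free_memory line found in xm info output")
    return int(match.group(1))


def balloon_target_is_unsatisfiable(current_mib: int, target_mib: int, free_mib: int) -> bool:
    """True when the guest would need more extra memory than the host has free."""
    return target_mib - current_mib > free_mib


# Values taken from the reproduction steps: 1 MiB free, guest at 511 MiB,
# `xm mem-set vm1 900` as the balloon target.
sample_xm_info = "free_memory            : 1\n"
free_mib = parse_free_memory_mib(sample_xm_info)
print(balloon_target_is_unsatisfiable(511, 900, free_mib))
```

With the values from this report the check reports that the target cannot be satisfied, which is the situation in which the network loss was observed.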

Actual results:
At step 6, you cannot ssh to the guest, and afterwards the guest's network is lost. You can no longer ping it, though the guest is still alive and you can still get its console.

Expected results:
Everything works well after ballooning.


Additional info:
No such issues are triggered when I downgrade the PV guest to the -194 kernel-xen packages, so I consider this bug a regression.

--- Additional comment from yuzhang on 2010-11-15 00:37:00 EST ---

Created attachment 460474
config file to create the guest

--- Additional comment from yuzhang on 2010-11-15 00:39:02 EST ---

Created attachment 460475
xm dmesg log

--- Additional comment from yuzhang on 2010-11-15 00:43:06 EST ---

Created attachment 460476
xend.log

--- Additional comment from drjones on 2010-11-15 06:58:20 EST ---

Most likely culprit

7c14912 [virt] xen: don't give up ballooning under mem pressure

If a -221 kernel works (this patch is in -222), that would give my accusation more weight.

--- Additional comment from lersek on 2010-11-15 10:30:39 EST ---

I could reproduce the problem with host -231, guest -222: I was able to ssh into the guest, and started to type "uname -r" to verify I was running -222. I didn't get past "una", and then "ping" stopped working too. I checked with -221, and the problem is gone. I think Andrew is right. I'll try to revert the patch.

======================

The bug is present on RHEL6 too, but is not a regression there.

Comment 1 RHEL Program Management 2010-11-15 20:49:39 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 2 Paolo Bonzini 2010-11-16 17:14:56 UTC
RHEL6 doesn't do flipping (see bug 653505 and bug 653262 for an explanation of how flipping causes the bug in RHEL4 and RHEL5).  However, dom0 tries to balloon up even for copying receivers and fails in the same way as explained in the above mentioned bugs.

So, this is a backend bug.

Comment 3 Paolo Bonzini 2010-11-16 17:19:50 UTC
There is a patch in upstream c/s 14355.

Comment 5 Laszlo Ersek 2010-11-16 18:19:58 UTC
c/s 14355:

http://xenbits.xensource.com/xen-unstable.hg?rev/68282f4b3e0f

(Second hunk only.)

Comment 7 RHEL Program Management 2010-11-17 06:30:30 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 8 Laszlo Ersek 2010-11-17 13:45:13 UTC
Created attachment 461068
no need to balloon up for copying receivers

Comment 10 Jarod Wilson 2010-11-23 17:06:18 UTC
in kernel-2.6.18-233.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 12 Jinxin Zheng 2010-12-22 08:18:38 UTC
I'm putting this in VERIFIED according to comment 20 of bug 653262 and comment 6 of bug 653505.

Additionally: merely updating the host kernel was found not to work. Both the host and guest kernels must be updated in order to solve this issue.

Comment 13 Paolo Bonzini 2010-12-22 09:47:57 UTC
Yes, that's correct.  HVM guests do copying by default so they do not require an upgrade.  For PV guests, if you do not upgrade the guest you need rx_copy=1 on the kernel command line of the guest.
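
The rule above can be written down as a small check. This is only a sketch: the function names and the `guest_kernel_fixed` flag are hypothetical, the guest command line is treated as plain whitespace-separated tokens, and the host is assumed to already run the fixed kernel.

```python
def has_rx_copy(cmdline: str) -> bool:
    """True when rx_copy=1 appears as a token on the guest kernel command line."""
    return "rx_copy=1" in cmdline.split()


def guest_needs_action(guest_type: str, guest_kernel_fixed: bool, cmdline: str) -> bool:
    """Decide whether a guest still needs an upgrade or the rx_copy=1 workaround.

    Assumes the host kernel is already updated. HVM guests copy by default,
    so only PV guests with an old kernel and no rx_copy=1 need attention.
    """
    if guest_type.lower() == "hvm":
        return False
    return not (guest_kernel_fixed or has_rx_copy(cmdline))


# A PV guest on an old kernel without the workaround still needs action;
# the same guest booted with rx_copy=1 does not.
print(guest_needs_action("pv", False, "ro"))
print(guest_needs_action("pv", False, "ro rx_copy=1"))
```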

Comment 15 errata-xmlrpc 2011-01-13 22:01:09 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html

Comment 16 Laszlo Ersek 2011-02-03 08:49:18 UTC
*** Bug 648763 has been marked as a duplicate of this bug. ***

