Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1173257

Summary: RHEV VM with hugepages panics after migration due to memory balloon device
Product: Red Hat Enterprise Virtualization Manager
Reporter: Jake Hunsaker <jhunsaker>
Component: vdsm
Assignee: Martin Sivák <msivak>
Status: CLOSED ERRATA
QA Contact: Artyom <alukiano>
Severity: high
Docs Contact:
Priority: high
Version: 3.4.2
CC: bazulay, dfediuck, ecohen, iheim, jhunsaker, lpeer, lsurette, sherold, yeylon
Target Milestone: ---
Target Release: 3.5.0
Hardware: x86_64
OS: Linux
Whiteboard: sla
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, it took a couple of seconds to collect the memory and balloon information of a virtual machine that had just finished migration. During that window MOM received zeros and subsequently tried to set the balloon size to zero. The guest operating system then returned all the memory it could and crashed with a kernel panic as soon as the kernel needed to allocate a buffer. Now, VDSM does not report any ballooning information (not even zero) until it has collected the necessary data, so migrating ballooned virtual machines works properly.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-02-11 21:13:23 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
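
The behaviour change described in the Doc Text can be illustrated with a minimal Python sketch. This is not the actual VDSM or MOM code; the function names and the dictionary keys below are hypothetical and chosen only for illustration. The point is the difference between reporting a zero and reporting nothing at all while the first sample is still being collected.

```python
from typing import Optional


def balloon_stats(collected: Optional[dict]) -> dict:
    """Return the balloon fields to report for a VM (hypothetical helper).

    `collected` is None until the first real sample has arrived,
    for example right after a migration has finished.
    """
    if collected is None:
        # Report nothing at all: an explicit zero would look like a real value.
        return {}
    return {
        "balloon_cur": collected["balloon_cur"],
        "balloon_target": collected["balloon_target"],
    }


def next_balloon_target(stats: dict) -> Optional[int]:
    """Consumer side (hypothetical): skip any adjustment when data is absent."""
    if "balloon_cur" not in stats:
        return None  # no data yet, leave the balloon alone
    return stats["balloon_target"]
```

With this shape, a consumer that sees an empty report leaves the balloon untouched instead of treating missing data as a request to shrink the guest to zero.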

Description Jake Hunsaker 2014-12-11 18:48:53 UTC
Description of problem:

A RHEV VM has a kernel panic after running out of memory.

This VM runs RHEL 6.5, has 8GB of total memory, and has about 5GB of huge pages configured for Oracle databases.
It seems that when the VM finishes migrating, the balloon driver kicks in and attempts to take back a large amount of memory, but is a little too ambitious about it.

When the migration completes, this error sometimes appears in the Event Log:
"The Balloon device on VM <VM> on host <hypervisor> is inflated but the device cannot be controlled (guest agent is down)."

Version-Release number of selected component (if applicable):
rhevm-3.4.2-0.2
vdsm-4.14.13-3.bz1152587 (hotfix for BZ 1152587)
qemu-kvm-rhev-0.12.1.2-2.415.el6_5.14

How reproducible:
Customer reports he can reproduce this on demand

Steps to Reproduce:
1. Configure VM with hugepages
2. Migrate VM

Actual results:

The VM panics after the balloon driver kicks in

Expected results:

VM should not panic

Additional info:

I was not sure if this should be filed under RHEV or RHEL for KVM, but vdsm made sense to me since afaik vdsm is what kicks off the use of the balloon device. Please correct me if this is wrong.

Comment 1 Doron Fediuck 2014-12-14 08:51:07 UTC
Can you please check how much memory is guaranteed for this VM?
Also, can you please provide all relevant logs including mom log files?

Comment 2 Martin Sivák 2014-12-15 09:51:14 UTC
This is a potential duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1123274

Comment 3 Doron Fediuck 2014-12-15 15:26:25 UTC
Please test based on the fix of bug 1123274.

Comment 4 Jake Hunsaker 2014-12-16 14:16:59 UTC
Guaranteed memory was just below the amount reserved for hugepages in the guest. Looks like this may in fact be a duplicate of BZ 1123274 - waiting on final result from customer after adjusting the guaranteed memory setting.
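
The relationship described in this comment can be written down as a small check. This is a rough sketch that assumes memory reserved for hugepages cannot be handed back through the balloon, so the guaranteed memory has to cover the hugepage reservation plus whatever the kernel and applications need. The function name is hypothetical, and the 4096 MB guaranteed value in the example is an illustrative placeholder, not the customer's actual setting.

```python
def balloon_headroom_mb(guaranteed_mb: int, hugepages_mb: int) -> int:
    """Memory (in MB) left for the rest of the guest if the balloon inflates
    all the way down to the guaranteed size. A negative or zero result means
    the guest can be squeezed below its hugepage reservation."""
    return guaranteed_mb - hugepages_mb


# About 5 GB of hugepages as in the report; guaranteed value is hypothetical.
headroom = balloon_headroom_mb(guaranteed_mb=4096, hugepages_mb=5 * 1024)
at_risk = headroom <= 0
```

A configuration where this value is not comfortably positive matches the situation reported here.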

Comment 6 Artyom 2014-12-21 15:00:11 UTC
Verified on rhevm-3.5.0-0.26.el6ev.noarch
1) Configure a VM with 4096 MB of memory and 1024 MB of guaranteed memory.
2) Configure 2 GB of huge pages (2048 kB * 1024) in the guest OS.
3) Enable ballooning at both the cluster and VM level.
4) Migrate the VM a number of times.
The VM did not crash and runs fine.
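
To confirm the hugepage setup from step 2 inside the guest, the reservation can be computed from the `HugePages_Total` and `Hugepagesize` fields of /proc/meminfo. The following is a minimal sketch; the function name is hypothetical and the sample text only mimics the two relevant lines for the configuration used in this verification.

```python
import re


def hugepage_reservation_kb(meminfo_text: str) -> int:
    """Return HugePages_Total * Hugepagesize in kB from /proc/meminfo text."""
    total = int(re.search(r"^HugePages_Total:\s+(\d+)", meminfo_text, re.M).group(1))
    size_kb = int(re.search(r"^Hugepagesize:\s+(\d+) kB", meminfo_text, re.M).group(1))
    return total * size_kb


# Sample lines for 1024 pages of 2048 kB each, as configured in step 2.
sample = "HugePages_Total:    1024\nHugepagesize:       2048 kB\n"
reserved_mb = hugepage_reservation_kb(sample) // 1024
```

On a real guest the same function can be fed the contents of /proc/meminfo; for the setup above it yields 2048 MB, which is larger than the 1024 MB of guaranteed memory.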

Comment 8 errata-xmlrpc 2015-02-11 21:13:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0159.html