Bug 1120197

Summary: The Balloon driver on VM ... on host ... is requested but unavailable.
Product: Red Hat Enterprise Virtualization Manager Reporter: Petr Spacek <pspacek>
Component: ovirt-engine-webadmin-portal Assignee: Doron Fediuck <dfediuck>
Status: CLOSED ERRATA QA Contact: Lukas Svaty <lsvaty>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.3.0 CC: akotov, dfediuck, dossow, ecohen, erik-fedora, iheim, jcoscia, jhunsaker, mavital, pch, plarsen, rbalakri, Rhev-m-bugs, sherold, yeylon
Target Milestone: ---   
Target Release: 3.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: sla
Fixed In Version: org.ovirt.engine-root-3.5.0-23 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-02-11 18:06:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: SLA RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Petr Spacek 2014-07-16 11:42:45 UTC
Description of problem:
Almost all VMs on one hypervisor in our internal RHEV cluster are generating the message "The Balloon driver on VM ... on host ... is requested but unavailable."


Version-Release number of selected component (if applicable):
# hypervisor: RHEL 6.5.z
mom-0.4.0-1.el6ev.noarch
vdsm-4.14.7-3.el6ev.x86_64

# RHEV-M:
Version 3.3.3-0.52.el6ev 

# Guest: Fedora 20
ovirt-guest-agent-common-1.0.9-1.fc20.noarch
kernel-3.13.10-200.fc20.x86_64


How reproducible: It has been happening for two days now.

Steps to Reproduce: Unknown. It happens on our internal cluster.


Actual results:
Error message 'The Balloon driver on VM ... on host ... is requested but unavailable.' is shown.

Expected results:
Well, the error message should not be there.


Additional info:
Information from one of affected guests:
vm-233 ~]# lsmod | grep virtio
virtio_console         23843  1 
virtio_net             28024  0 
virtio_balloon         13530  0 
virtio_blk             17972  3 
virtio_pci             17677  0 
virtio_ring            19975  5 virtio_blk,virtio_net,virtio_pci,virtio_balloon,virtio_console
virtio                 14172  5 virtio_blk,virtio_net,virtio_pci,virtio_balloon,virtio_console

vm-233 ~]# service ovirt-guest-agent status
Redirecting to /bin/systemctl status  ovirt-guest-agent.service
ovirt-guest-agent.service - oVirt Guest Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-guest-agent.service; enabled)
   Active: active (running) since Út 2014-07-01 10:47:47 CEST; 2 weeks 1 days ago
 Main PID: 456 (python)
   CGroup: /system.slice/ovirt-guest-agent.service
           └─456 /usr/bin/python /usr/share/ovirt-guest-agent/ovirt-guest-agent.py

I'm attaching logs from affected hypervisor. List of compressed files follows:
log/getAllVmStats.log
log/getVdsCapabilities.log
log/getVdsStats.log
log/mom.log
log/mom.log.1
log/mom.log.2
log/vdsm.log
log/vdsm.log.1
log/vdsm.log.10
log/vdsm.log.11
log/vdsm.log.12
log/vdsm.log.2
log/vdsm.log.3
log/vdsm.log.4
log/vdsm.log.5
log/vdsm.log.6
log/vdsm.log.7
log/vdsm.log.8
log/vdsm.log.9
log/webadmin.log
log/webadmin-log.pdf

Comment 3 Petr Spacek 2014-08-22 09:29:49 UTC
As far as I can see, all affected VMs have the "Memory Balloon Device Enabled" checkbox enabled. Interestingly, it has not happened in the last two weeks for some reason. Maybe VDSM or the hypervisors were restarted in the meantime...

Comment 4 Jiri Moskovcak 2014-08-27 14:46:23 UTC
From the logs it seems that the balloon works fine (the memory changes) and the problem is in the check in the engine code:

if (isBalloonDeviceActiveOnVm(vmInternalData)
        && (Objects.equals(balloonInfo.getCurrentMemory(), balloonInfo.getBalloonMaxMemory())
                || !Objects.equals(balloonInfo.getCurrentMemory(), balloonInfo.getBalloonTargetMemory()))) {
    vmBalloonDriverIsRequestedAndUnavailable(vmId);
}


getCurrentMemory() and getBalloonTargetMemory() return *almost* the same number (probably because of memory alignment or a rounding error). The balloon works, but the check still raises the warning because it requires the two numbers to be exactly equal.

We can add an allowed difference to the check so that it is not so strict but still verifies that the balloon works (that the memory changes).
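
A minimal sketch of that idea follows. It is not the actual engine patch: the class name, the method names, the tolerance value, and the sample numbers are all hypothetical and chosen only for illustration, and the unit of the memory values is assumed to be the same for all three numbers. Only the second clause (current vs. target) is relaxed; the comparison against the maximum is left exactly as in the snippet above.

```java
import java.util.Objects;

// Hypothetical illustration only; not taken from the ovirt-engine sources.
class BalloonCheckSketch {

    // Strict variant, mirroring the condition quoted above:
    // warn when current == max, or when current != target.
    static boolean strictCheckWarns(Long current, Long max, Long target) {
        return Objects.equals(current, max) || !Objects.equals(current, target);
    }

    // True when both values are present and differ by at most `tolerance`.
    // If either value is missing, fall back to plain equality.
    static boolean closeEnough(Long a, Long b, long tolerance) {
        if (a == null || b == null) {
            return Objects.equals(a, b);
        }
        return Math.abs(a - b) <= tolerance;
    }

    // Relaxed variant: the current-vs-target comparison accepts a small difference.
    static boolean tolerantCheckWarns(Long current, Long max, Long target, long tolerance) {
        return Objects.equals(current, max) || !closeEnough(current, target, tolerance);
    }

    public static void main(String[] args) {
        // Made-up sample values: an odd target and a slightly different current value.
        Long current = 1048572L;
        Long max = 2097152L;
        Long target = 1048575L;
        long tolerance = 1024L; // assumed value, same unit as the memory numbers

        System.out.println("strict warns:   " + strictCheckWarns(current, max, target));
        System.out.println("tolerant warns: " + tolerantCheckWarns(current, max, target, tolerance));
    }
}
```

With these sample values the strict check warns and the tolerant one does not. How large the tolerance should be depends on how far the reported value can drift from the target, which the logs here do not establish.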

Comment 5 Lukas Svaty 2014-12-03 14:21:09 UTC
Can you provide any verification steps please how to simulate this?

Comment 6 Jiri Moskovcak 2014-12-04 09:17:36 UTC
(In reply to Lukas Svaty from comment #5)
> Can you provide any verification steps please how to simulate this?

- Just try setting an odd value (not divisible by 2) as the balloon target. The balloon should get a different (but close enough) amount of memory, and there should be no warning.

Comment 7 Lukas Svaty 2014-12-22 11:50:28 UTC
Tested multiple times on av13.4; it seems to be working.
If this bug reappears, feel free to reopen it.

Comment 8 Peter Larsen 2015-01-19 21:25:30 UTC
I can confirm having the exact same problem with RHEVH 3.4

Comment 11 errata-xmlrpc 2015-02-11 18:06:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0158.html