Bug 1120197 - The Balloon driver on VM ... on host ... is requested but unavailable.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine-webadmin-portal
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.5.0
Assignee: Doron Fediuck
QA Contact: Lukas Svaty
URL:
Whiteboard: sla
Depends On:
Blocks:
 
Reported: 2014-07-16 11:42 UTC by Petr Spacek
Modified: 2019-04-28 09:51 UTC
CC: 15 users

Fixed In Version: org.ovirt.engine-root-3.5.0-23
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-02-11 18:06:07 UTC
oVirt Team: SLA


Attachments


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2015:0158 normal SHIPPED_LIVE Important: Red Hat Enterprise Virtualization Manager 3.5.0 2015-02-11 22:38:50 UTC
oVirt gerrit 33921 master MERGED engine: handle memory allignment when checking the balloon health Never
oVirt gerrit 35585 ovirt-engine-3.5 MERGED engine: handle memory allignment when checking the balloon health Never
Red Hat Knowledge Base (Article) 1162653 None None None Never

Description Petr Spacek 2014-07-16 11:42:45 UTC
Description of problem:
Almost all VMs on one hypervisor in our internal RHEV cluster are generating the message "The Balloon driver on VM ... on host ... is requested but unavailable."


Version-Release number of selected component (if applicable):
# hypervisor: RHEL 6.5.z
mom-0.4.0-1.el6ev.noarch
vdsm-4.14.7-3.el6ev.x86_64

# RHEV-M:
Version 3.3.3-0.52.el6ev 

# Guest: Fedora 20
ovirt-guest-agent-common-1.0.9-1.fc20.noarch
kernel-3.13.10-200.fc20.x86_64


How reproducible: It has been happening for two days now.

Steps to Reproduce: Unknown; it happens on our internal cluster.


Actual results:
Error message 'The Balloon driver on VM ... on host ... is requested but unavailable.' is shown.

Expected results:
The error message should not appear, since the balloon driver is present and working.


Additional info:
Information from one of affected guests:
vm-233 ~]# lsmod | grep virtio
virtio_console         23843  1 
virtio_net             28024  0 
virtio_balloon         13530  0 
virtio_blk             17972  3 
virtio_pci             17677  0 
virtio_ring            19975  5 virtio_blk,virtio_net,virtio_pci,virtio_balloon,virtio_console
virtio                 14172  5 virtio_blk,virtio_net,virtio_pci,virtio_balloon,virtio_console

vm-233 ~]# service ovirt-guest-agent status
Redirecting to /bin/systemctl status  ovirt-guest-agent.service
ovirt-guest-agent.service - oVirt Guest Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-guest-agent.service; enabled)
   Active: active (running) since Út 2014-07-01 10:47:47 CEST; 2 weeks 1 days ago
 Main PID: 456 (python)
   CGroup: /system.slice/ovirt-guest-agent.service
           └─456 /usr/bin/python /usr/share/ovirt-guest-agent/ovirt-guest-agent.py

I'm attaching logs from affected hypervisor. List of compressed files follows:
log/getAllVmStats.log
log/getVdsCapabilities.log
log/getVdsStats.log
log/mom.log
log/mom.log.1
log/mom.log.2
log/vdsm.log
log/vdsm.log.1
log/vdsm.log.10
log/vdsm.log.11
log/vdsm.log.12
log/vdsm.log.2
log/vdsm.log.3
log/vdsm.log.4
log/vdsm.log.5
log/vdsm.log.6
log/vdsm.log.7
log/vdsm.log.8
log/vdsm.log.9
log/webadmin.log
log/webadmin-log.pdf

Comment 3 Petr Spacek 2014-08-22 09:29:49 UTC
As far as I can see, all affected VMs have the "Memory Balloon Device Enabled" checkbox enabled. The interesting thing is that it hasn't happened in the last two weeks for some reason. Maybe VDSM or the hypervisors were restarted in the meantime...

Comment 4 Jiri Moskovcak 2014-08-27 14:46:23 UTC
From the logs it seems that the balloon works fine (the memory changes) and the problem is in the check in the engine code:

if (isBalloonDeviceActiveOnVm(vmInternalData)
        && (Objects.equals(balloonInfo.getCurrentMemory(), balloonInfo.getBalloonMaxMemory())
            || !Objects.equals(balloonInfo.getCurrentMemory(), balloonInfo.getBalloonTargetMemory()))) {
    vmBalloonDriverIsRequestedAndUnavailable(vmId);
}


getCurrentMemory() and getBalloonTargetMemory() return *almost* the same number (probably because of some memory alignment or rounding error), so the balloon works, but the condition fails because it requires the numbers to be exactly the same.

We can add an allowed difference to the check so it's not so strict, but still verifies that the balloon works (i.e., that memory changes).
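A tolerance-based comparison along the lines of Comment 4's suggestion could look roughly like this. This is only a sketch, not the actual engine patch: the class name, method name, and the tolerance constant are all illustrative, and the real fix (gerrit 33921/35585) may differ.

```java
// Sketch of a tolerance-based balloon health check.
// TOLERANCE_BYTES and the names below are illustrative, not from the engine code.
public class BalloonCheck {
    // Allow a few MiB of drift caused by memory alignment/rounding in the guest.
    static final long TOLERANCE_BYTES = 4L * 1024 * 1024;

    // Returns true when current memory is "close enough" to the balloon target,
    // instead of demanding exact equality as the original check did.
    static boolean reachedTarget(long currentMemoryBytes, long targetMemoryBytes) {
        return Math.abs(currentMemoryBytes - targetMemoryBytes) <= TOLERANCE_BYTES;
    }
}
```

With such a check, a 1 MiB alignment difference between current and target memory would no longer trigger the "requested but unavailable" warning, while a balloon that is genuinely stuck far from its target would still be flagged.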

Comment 5 Lukas Svaty 2014-12-03 14:21:09 UTC
Can you provide any verification steps please how to simulate this?

Comment 6 Jiri Moskovcak 2014-12-04 09:17:36 UTC
(In reply to Lukas Svaty from comment #5)
> Can you provide any verification steps please how to simulate this?

- just try to set an odd value (not divisible by 2) as the balloon target; the balloon should get a different (but close enough) amount of memory, and there should be no warning
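The reason an odd target exposes the bug can be sketched as follows. This is a simplified illustration, assuming the guest rounds the balloon target down to some alignment unit (the actual unit depends on the hypervisor and guest); the class and constant names are made up.

```java
// Illustration of why an aligned balloon value fails a strict equality check.
// ALIGN_KIB is a hypothetical alignment unit (here 1 MiB expressed in KiB).
public class AlignmentDemo {
    static final long ALIGN_KIB = 1024;

    // Round a requested target (in KiB) down to the alignment unit,
    // as a guest balloon driver might effectively do.
    static long align(long requestedKib) {
        return (requestedKib / ALIGN_KIB) * ALIGN_KIB;
    }
}
```

An odd request such as 1048577 KiB comes back as 1048576 KiB: close enough that the balloon is clearly working, yet unequal, so the original exact-match check in the engine raised the warning.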

Comment 7 Lukas Svaty 2014-12-22 11:50:28 UTC
Tested multiple times on av13.4; it seems to be working.
If this bug reappears, feel free to reopen it.

Comment 8 Peter Larsen 2015-01-19 21:25:30 UTC
I can confirm having the exact same problem with RHEVH 3.4

Comment 11 errata-xmlrpc 2015-02-11 18:06:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0158.html

