Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1104723

Summary: Host crashes and recovers after memory is consumed
Product: Red Hat Enterprise Virtualization Manager Reporter: Lukas Svaty <lsvaty>
Component: ovirt-engine Assignee: Martin Sivák <msivak>
Status: CLOSED NOTABUG QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: 3.4.0 CC: acathrow, dfediuck, gchaplik, gklein, iheim, lpeer, lsvaty, msivak, Rhev-m-bugs, yeylon
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: All   
Whiteboard: sla
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-06-25 12:11:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: SLA RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
vdsm log of actions none
engine log of actions none

Description Lukas Svaty 2014-06-04 14:40:34 UTC
Description of problem:
After allocating a large amount of memory on the host, the host crashes and then recovers successfully.
[~/logs] $ cat vdsm.log | grep "ERROR"
Thread-13::ERROR::2014-06-04 16:24:01,775::sdc::137::Storage.StorageDomainCache::(_findDomain) looking for unfetched domain 5a24b606-1e50-442e-8960-3768f4374379
Thread-13::ERROR::2014-06-04 16:24:01,775::sdc::154::Storage.StorageDomainCache::(_findUnfetchedDomain) looking for domain 5a24b606-1e50-442e-8960-3768f4374379
[~/logs] $ cat engine.log | grep "ERROR"
2014-06-04 16:24:00,196 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsVDSCommand] (DefaultQuartzScheduler_Worker-11) [28727ffa] Command GetStatsVDSCommand(HostName = 10.34.62.204, HostId = a95011a1-ae55-4e47-ac11-26180458e77f, vds=Host[10.34.62.204,a95011a1-ae55-4e47-ac11-26180458e77f]) execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2014-06-04 16:24:00,199 ERROR [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-11) [28727ffa] vds::refreshVdsStats Failed getVdsStats,  vds = a95011a1-ae55-4e47-ac11-26180458e77f : 10.34.62.204, error = VDSNetworkException: java.net.ConnectException: Connection refused

Version-Release number of selected component (if applicable):
av9.4

How reproducible:
100%

Steps to Reproduce:
1. enable balloon, disable KSM on cluster/VM
2. Run a VM with Memory 2GB / Guaranteed memory 1GB
3. consume memory on host to 90%
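
Step 3 does not say how much memory has to be allocated to reach 90%. A minimal sketch of that arithmetic follows, assuming the host exposes `MemTotal` and `MemAvailable` in `/proc/meminfo` (older kernels may lack `MemAvailable`). The helper names `parse_meminfo` and `kb_to_allocate` are hypothetical and are not part of vdsm or the engine.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Key: value" shape
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def kb_to_allocate(total_kb, available_kb, target_percent=90):
    """Return how many kB must still be consumed to reach target_percent usage.

    Usage is approximated as total minus available (an assumption; the
    engine may compute host memory usage differently).
    """
    used_kb = total_kb - available_kb
    target_used_kb = total_kb * target_percent // 100
    return max(0, target_used_kb - used_kb)
```

On a host this could be fed with the contents of `/proc/meminfo`, for example `info = parse_meminfo(open("/proc/meminfo").read())` followed by `kb_to_allocate(info["MemTotal"], info["MemAvailable"])`.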

Actual results:
Host crashes and recovers from it successfully.
Events (in some cases only the first one appears, for example in the logs):
State was set to Up for host 10.34.62.204.
Host 10.34.62.204 is initializing. Message: Recovering from crash or Initializing

Expected results:
Host should keep working as long as it has available memory.

Additional info:
Attaching vdsm.log and engine.log of actions.
In the end, the event
"The Balloon driver on VM balloon-1 on host 10.34.62.204 is requested but unavailable."
appeared. I checked the VM, and the guest-agent was still running.

Comment 1 Lukas Svaty 2014-06-04 14:41:00 UTC
Created attachment 902209 [details]
vdsm log of actions

Comment 2 Lukas Svaty 2014-06-04 14:41:20 UTC
Created attachment 902210 [details]
engine log of actions

Comment 3 Lukas Svaty 2014-06-25 12:11:38 UTC
Consulted with msivak. The problem was that 99% of the host's CPU was consumed by an external process, so the response time was too long. As this is not a bug in our environment but in our tests -> CLOSED - NOTABUG
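
To rule out this kind of false positive before rerunning the test, host CPU load can be checked first. The sketch below is one possible way to do it, assuming a Linux host where two samples of the aggregate `cpu` line from `/proc/stat` are taken a short interval apart. The function name `cpu_busy_percent` is hypothetical and is not part of vdsm or the engine.

```python
def cpu_busy_percent(stat_line_before, stat_line_after):
    """Return the CPU busy percentage between two aggregate 'cpu' lines.

    Each argument is the first line of /proc/stat, for example
    'cpu  100 0 100 800 0 0 0 0'. Idle time is counted as idle + iowait.
    """
    def split_times(line):
        fields = line.split()
        if not fields or fields[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        # Use only user..steal; guest time is already included in user/nice.
        times = [int(value) for value in fields[1:9]]
        idle = times[3] + (times[4] if len(times) > 4 else 0)
        return sum(times), idle

    total_before, idle_before = split_times(stat_line_before)
    total_after, idle_after = split_times(stat_line_after)
    total_delta = total_after - total_before
    if total_delta <= 0:
        return 0.0  # no time elapsed between the two samples
    busy_delta = total_delta - (idle_after - idle_before)
    return 100.0 * busy_delta / total_delta
```

A test could read the first line of `/proc/stat`, sleep for a second, read it again, and skip or fail early when the result is close to 100, since the engine's stats polling is then likely to fail for reasons unrelated to memory.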