Bug 1252012

Summary: balloon enabled at cluster level cause NPE in VMs monitoring
Product: [Retired] oVirt Reporter: Michal Skrivanek <michal.skrivanek>
Component: ovirt-engine-core Assignee: Michal Skrivanek <michal.skrivanek>
Status: CLOSED CURRENTRELEASE QA Contact: Shira Maximov <mshira>
Severity: high Docs Contact:
Priority: high    
Version: 3.6 CC: amureini, bugs, ecohen, gklein, lsurette, mgoldboi, mshira, rbalakri, yeylon
Target Milestone: ---   
Target Release: 3.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: virt
Fixed In Version: 3.6.0-10 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-11-04 11:18:46 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Virt RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1230208    
Attachments:
Description Flags
hosts logs none
rhevm logs none

Description Michal Skrivanek 2015-08-10 13:32:48 UTC
When ballooning is enabled, proceedBalloonCheck() in VmsMonitoring accesses the vdsmVm entry. If vdsm has stopped reporting the VM (e.g. because it shut down or migrated away), there is no vdsm data for it, and the access throws an NPE.
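
To illustrate the failure mode, here is a minimal sketch of a null guard around the balloon check. It is not the actual engine code or the actual fix: `BalloonCheckSketch`, `ReportedVm`, the field names, and the method signature are all hypothetical stand-ins, and it only assumes that a missing vdsm entry shows up as `null`.

```java
// Hypothetical stand-ins for illustration; these are not the oVirt engine classes.
class BalloonCheckSketch {

    /** Hypothetical balloon data for one VM, as last reported by the host. */
    static final class ReportedVm {
        final long balloonCurrentKb;
        final long balloonTargetKb;

        ReportedVm(long balloonCurrentKb, long balloonTargetKb) {
            this.balloonCurrentKb = balloonCurrentKb;
            this.balloonTargetKb = balloonTargetKb;
        }
    }

    /**
     * Returns true if the balloon check ran, false if it was skipped.
     * vdsmVm is assumed to be null when the host no longer reports the VM.
     */
    static boolean proceedBalloonCheck(boolean balloonEnabled, ReportedVm vdsmVm) {
        if (!balloonEnabled) {
            return false;
        }
        if (vdsmVm == null) {
            // Guard: without it, the field reads below would throw an NPE.
            return false;
        }
        long gapKb = Math.abs(vdsmVm.balloonCurrentKb - vdsmVm.balloonTargetKb);
        System.out.println("balloon gap: " + gapKb + " KB");
        return true;
    }

    public static void main(String[] args) {
        // VM still reported by the host: the check runs.
        System.out.println(proceedBalloonCheck(true, new ReportedVm(1024, 2048)));
        // VM no longer reported (shut down, migrated away): the check is skipped.
        System.out.println(proceedBalloonCheck(true, null));
    }
}
```

Skipping the check for an unreported VM lets the rest of the analysis continue, so the VM can still be moved to its real status.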

Comment 1 Michal Skrivanek 2015-08-10 13:33:55 UTC
Monitoring is then broken: the VM status in the engine doesn't correspond to the actual state (e.g. the VM is stuck in Powering Down forever).

The workaround is to disable ballooning in the Edit Cluster dialog.

Comment 2 Omer Frenkel 2015-08-10 13:41:12 UTC
steps to reproduce:
1. make sure balloon is enabled in the cluster
2. run vm
3. stop engine
4. stop the vm using "vdsClient -s 0 destroy <vm_id>" on the host that runs the vm
5. start the engine

actual result:
the status of the vm is not updated, and exceptions appear in engine.log

expected result:
the engine identifies that the vm is down and updates the ui with the correct status

Comment 4 Shira Maximov 2015-08-31 14:27:42 UTC
I was able to reproduce the bug on version 3.6.0-0.12.master.el6.
steps to reproduce:
1. make sure balloon is enabled in the cluster
2. run vm
3. stop engine
4. stop the vm using "vdsClient -s 0 destroy <vm_id>" on the host that runs the vm
5. start the engine

the engine logs:

2015-08-30 13:41:57,635 ERROR [org.ovirt.engine.core.vdsbroker.VmsMonitoring] (ForkJoinPool-1-worker-132) [] Failed during vms monitoring on host host_mixed_2 error is: java.lang.NullPointerException
2015-08-30 13:41:57,635 ERROR [org.ovirt.engine.core.vdsbroker.VmsMonitoring] (ForkJoinPool-1-worker-132) [] Exception:: java.lang.NullPointerException
	at org.ovirt.engine.core.vdsbroker.VmAnalyzer.proceedBalloonCheck(VmAnalyzer.java:359) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.VmAnalyzer.analyze(VmAnalyzer.java:118) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.VmsMonitoring.refreshVmStats(VmsMonitoring.java:215) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.VmsMonitoring.perform(VmsMonitoring.java:147) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.jsonrpc.EventVmStatsRefresher$1.onNext(EventVmStatsRefresher.java:66) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.jsonrpc.EventVmStatsRefresher$1.onNext(EventVmStatsRefresher.java:47) [vdsbroker.jar:]
	at org.ovirt.vdsm.jsonrpc.client.events.EventPublisher$EventCallable.call(EventPublisher.java:114) [vdsm-jsonrpc-java-client.jar:]
	at org.ovirt.vdsm.jsonrpc.client.events.EventPublisher$EventCallable.call(EventPublisher.java:89) [vdsm-jsonrpc-java-client.jar:]
	at java.util.concurrent.ForkJoinTask$AdaptedCallable.exec(ForkJoinTask.java:1288) [rt.jar:1.7.0_85]
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:334) [rt.jar:1.7.0_85]
	at java.util.concurrent.ForkJoinWorkerThread.execTask(ForkJoinWorkerThread.java:604) [rt.jar:1.7.0_85]
	at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:784) [rt.jar:1.7.0_85]
	at java.util.concurrent.ForkJoinPool.work(ForkJoinPool.java:646) [rt.jar:1.7.0_85]
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:398) [rt.jar:1.7.0_85]
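
One plausible reading of this trace is that the exception leaves the per-VM analysis and is only caught at the level of the whole monitoring cycle, so the status update is never applied. The sketch below illustrates that pattern under this assumption. `MonitoringCycleSketch`, `analyze`, and `refresh` are hypothetical names, and the loop is a simplification, not the engine's real control flow.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simplification of one monitoring cycle; not the oVirt engine code.
class MonitoringCycleSketch {

    /** Hypothetical per-VM analysis: throws an NPE when the reported entry is null. */
    static String analyze(String reportedStatus) {
        return reportedStatus.trim();
    }

    /** Runs one cycle and returns the statuses that were updated before any failure. */
    static List<String> refresh(List<String> reportedStatuses) {
        List<String> updated = new ArrayList<>();
        try {
            for (String status : reportedStatuses) {
                updated.add(analyze(status));
            }
        } catch (RuntimeException e) {
            // Caught once for the whole cycle: every entry after the failing one is lost.
            System.err.println("Failed during vms monitoring, error is: " + e);
        }
        return updated;
    }

    public static void main(String[] args) {
        // The second entry stands for a VM the host no longer reports.
        System.out.println(refresh(Arrays.asList("Up", null, "Down")));
    }
}
```

In this sketch only the first entry is updated. The null entry aborts the cycle, and if the same failure recurs on every cycle, it would match a VM status that never changes in the engine.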

Comment 5 Omer Frenkel 2015-09-01 11:23:43 UTC
I cannot reproduce this. Can you please attach the engine.log for this time?
Also, what is your vdsm version?

Comment 6 Shira Maximov 2015-09-08 07:15:11 UTC
Created attachment 1071212 [details]
hosts logs

Comment 7 Shira Maximov 2015-09-08 07:15:39 UTC
Created attachment 1071213 [details]
rhevm logs

Comment 8 Shira Maximov 2015-09-08 07:19:08 UTC
I've attached the logs; you can see the error at 13:42.

the vdsm version:
vdsm-4.17.3-1.el7ev.noarch

Here is also the link to the automation test:
https://rhev-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/3.6_Dev/job/3.6-GE-compute/208/testReport/junit/rhevmtests.sla.mom.mom_test/004-Balloon_REST;test_e_balloon_no_agent/Balloon_REST_test_e_balloon_no_agent/

Comment 9 Shira Maximov 2015-09-10 07:51:46 UTC
I verified this bug on:
Red Hat Enterprise Virtualization Manager Version: 3.6.0-0.13.master.el6

Comment 11 Sandro Bonazzola 2015-11-04 11:18:46 UTC
oVirt 3.6.0 has been released on November 4th, 2015 and should fix this issue.
If problems still persist, please open a new BZ and reference this one.