Bug 1252012 - balloon enabled at cluster level causes NPE in VMs monitoring
Status: CLOSED CURRENTRELEASE
Product: oVirt
Classification: Community
Component: ovirt-engine-core
Version: 3.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.6.0
Assigned To: Michal Skrivanek
QA Contact: Shira Maximov
Whiteboard: virt
Depends On:
Blocks: 1230208
Reported: 2015-08-10 09:32 EDT by Michal Skrivanek
Modified: 2016-02-10 14:50 EST (History)
CC: 9 users

Fixed In Version: 3.6.0-10
Doc Type: Bug Fix
Last Closed: 2015-11-04 06:18:46 EST
Type: Bug
oVirt Team: Virt


Attachments
hosts logs (851.48 KB, application/x-bzip), 2015-09-08 03:15 EDT, Shira Maximov
rhevm logs (334.53 KB, application/x-bzip), 2015-09-08 03:15 EDT, Shira Maximov


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 44635 master MERGED core: add missing null check on proceedBalloonCheck Never
oVirt gerrit 44705 ovirt-engine-3.6 MERGED core: add missing null check on proceedBalloonCheck Never
oVirt gerrit 45109 master MERGED core: add missing null check for balloon info Never
oVirt gerrit 45125 ovirt-engine-3.6 MERGED core: add missing null check for balloon info Never

Description Michal Skrivanek 2015-08-10 09:32:48 EDT
When ballooning is enabled, proceedBalloonCheck() in VmsMonitoring accesses the vdsmVm entry, but when vdsm has stopped reporting the VM (e.g. because it was shut down, migrated away, etc.) this causes an NPE, since no vdsm data is present for that VM.
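
For illustration, a minimal, self-contained sketch of the kind of guard the linked gerrit patches 44635/44705 ("add missing null check on proceedBalloonCheck") describe: return early when vdsm no longer reports the VM. All type names here are stubs, not the real org.ovirt.engine.core.vdsbroker classes.

    // BalloonCheckSketch.java -- illustrative only, not the actual oVirt source.
    public class BalloonCheckSketch {

        /** Stub for the per-VM runtime data reported by vdsm. */
        static class VdsmVm {
            long balloonTarget = 512; // illustrative field
        }

        private final VdsmVm vdsmVm; // null when vdsm stopped reporting the VM

        BalloonCheckSketch(VdsmVm vdsmVm) {
            this.vdsmVm = vdsmVm;
        }

        void proceedBalloonCheck() {
            // The missing guard: when the VM shut down or migrated away,
            // vdsm sends no entry for it, so skip the balloon check instead
            // of dereferencing null (the NPE in this report).
            if (vdsmVm == null) {
                return;
            }
            System.out.println("balloon target: " + vdsmVm.balloonTarget);
        }

        public static void main(String[] args) {
            new BalloonCheckSketch(null).proceedBalloonCheck();         // no NPE
            new BalloonCheckSketch(new VdsmVm()).proceedBalloonCheck(); // prints target
        }
    }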
Comment 1 Michal Skrivanek 2015-08-10 09:33:55 EDT
Monitoring is then broken: the VM status in the engine doesn't correspond to the actual state (e.g. a VM stuck in Powering Down forever).

The workaround is to disable ballooning in the Edit Cluster dialog.
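
For scripted environments, the same workaround could presumably be applied through the engine's REST API. A rough sketch using only the JDK follows; the /api/clusters/{id} endpoint and the ballooning_enabled element are assumptions based on the 3.6-era API, so verify against your engine's API documentation before relying on this.

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    // DisableBallooning.java -- hedged sketch; endpoint and element names
    // are assumptions, and the engine's TLS certificate is assumed to be
    // trusted by the JVM.
    public class DisableBallooning {
        public static void main(String[] args) throws Exception {
            URL url = new URL("https://engine.example.com/api/clusters/CLUSTER_ID");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("PUT");
            conn.setRequestProperty("Content-Type", "application/xml");
            String auth = Base64.getEncoder().encodeToString(
                    "admin@internal:PASSWORD".getBytes(StandardCharsets.UTF_8));
            conn.setRequestProperty("Authorization", "Basic " + auth);
            conn.setDoOutput(true);
            // Assumed payload: turn cluster-level ballooning off.
            String body = "<cluster><ballooning_enabled>false</ballooning_enabled></cluster>";
            try (OutputStream os = conn.getOutputStream()) {
                os.write(body.getBytes(StandardCharsets.UTF_8));
            }
            System.out.println("HTTP " + conn.getResponseCode());
        }
    }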
Comment 2 Omer Frenkel 2015-08-10 09:41:12 EDT
steps to reproduce:
1. make sure ballooning is enabled in the cluster
2. run a VM
3. stop the engine
4. stop the VM using "vdsClient -s 0 destroy <vm_id>" on the host that runs the VM
5. start the engine

actual result:
the status of the VM is not updated, and there are exceptions in engine.log

expected result:
the engine identifies that the VM is down and updates the UI with the correct status
Comment 4 Shira Maximov 2015-08-31 10:27:42 EDT
I was able to reproduce the bug on version 3.6.0-0.12.master.el6.
Steps to reproduce: same as in comment 2.

The engine logs:

2015-08-30 13:41:57,635 ERROR [org.ovirt.engine.core.vdsbroker.VmsMonitoring] (ForkJoinPool-1-worker-132) [] Failed during vms monitoring on host host_mixed_2 error is: java.lang.NullPointerException
2015-08-30 13:41:57,635 ERROR [org.ovirt.engine.core.vdsbroker.VmsMonitoring] (ForkJoinPool-1-worker-132) [] Exception:: java.lang.NullPointerException
	at org.ovirt.engine.core.vdsbroker.VmAnalyzer.proceedBalloonCheck(VmAnalyzer.java:359) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.VmAnalyzer.analyze(VmAnalyzer.java:118) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.VmsMonitoring.refreshVmStats(VmsMonitoring.java:215) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.VmsMonitoring.perform(VmsMonitoring.java:147) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.jsonrpc.EventVmStatsRefresher$1.onNext(EventVmStatsRefresher.java:66) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.jsonrpc.EventVmStatsRefresher$1.onNext(EventVmStatsRefresher.java:47) [vdsbroker.jar:]
	at org.ovirt.vdsm.jsonrpc.client.events.EventPublisher$EventCallable.call(EventPublisher.java:114) [vdsm-jsonrpc-java-client.jar:]
	at org.ovirt.vdsm.jsonrpc.client.events.EventPublisher$EventCallable.call(EventPublisher.java:89) [vdsm-jsonrpc-java-client.jar:]
	at java.util.concurrent.ForkJoinTask$AdaptedCallable.exec(ForkJoinTask.java:1288) [rt.jar:1.7.0_85]
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:334) [rt.jar:1.7.0_85]
	at java.util.concurrent.ForkJoinWorkerThread.execTask(ForkJoinWorkerThread.java:604) [rt.jar:1.7.0_85]
	at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:784) [rt.jar:1.7.0_85]
	at java.util.concurrent.ForkJoinPool.work(ForkJoinPool.java:646) [rt.jar:1.7.0_85]
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:398) [rt.jar:1.7.0_85]
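
The trace pins the dereference to VmAnalyzer.proceedBalloonCheck (VmAnalyzer.java:359) on the event-driven refresh path (EventVmStatsRefresher). Note that there were two patch pairs: after the first guard on the vdsmVm entry, gerrit 45109/45125 added a further null check "for balloon info", which suggests the reported statistics can also arrive without any balloon data. A hypothetical sketch of that second guard; the statistics shape and the "balloon_cur"/"balloon_target" keys are stand-ins, not the real code:

    import java.util.HashMap;
    import java.util.Map;

    // BalloonInfoSketch.java -- illustrative only.
    public class BalloonInfoSketch {

        static boolean balloonWorking(Map<String, Long> balloonInfo) {
            // Second guard: even when vdsm reports the VM itself, the
            // balloon info block may be missing, so check before reading.
            if (balloonInfo == null) {
                return false;
            }
            Long target = balloonInfo.get("balloon_target");
            Long current = balloonInfo.get("balloon_cur");
            return target != null && target.equals(current);
        }

        public static void main(String[] args) {
            System.out.println(balloonWorking(null)); // false, no NPE
            Map<String, Long> info = new HashMap<>();
            info.put("balloon_target", 1024L);
            info.put("balloon_cur", 1024L);
            System.out.println(balloonWorking(info)); // true
        }
    }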
Comment 5 Omer Frenkel 2015-09-01 07:23:43 EDT
I cannot reproduce this; can you please attach the engine.log from that time?
Also, what is your vdsm version?
Comment 6 Shira Maximov 2015-09-08 03:15:11 EDT
Created attachment 1071212 [details]
hosts logs
Comment 7 Shira Maximov 2015-09-08 03:15:39 EDT
Created attachment 1071213 [details]
rhevm logs
Comment 8 Shira Maximov 2015-09-08 03:19:08 EDT
I've attached the logs; you can see the error at 13:42.

the vdsm version:
vdsm-4.17.3-1.el7ev.noarch

You can also see the link to the automation test:
https://rhev-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/3.6_Dev/job/3.6-GE-compute/208/testReport/junit/rhevmtests.sla.mom.mom_test/004-Balloon_REST;test_e_balloon_no_agent/Balloon_REST_test_e_balloon_no_agent/
Comment 9 Shira Maximov 2015-09-10 03:51:46 EDT
I verified this bug on:
Red Hat Enterprise Virtualization Manager Version: 3.6.0-0.13.master.el6
Comment 11 Sandro Bonazzola 2015-11-04 06:18:46 EST
oVirt 3.6.0 was released on November 4th, 2015, and should fix this issue.
If problems still persist, please open a new BZ and reference this one.
