Bug 1252012 - balloon enabled at cluster level causes NPE in VMs monitoring
Summary: balloon enabled at cluster level causes NPE in VMs monitoring
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: oVirt
Classification: Retired
Component: ovirt-engine-core
Version: 3.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.6.0
Assignee: Michal Skrivanek
QA Contact: Shira Maximov
URL:
Whiteboard: virt
Depends On:
Blocks: 1230208
TreeView+ depends on / blocked
 
Reported: 2015-08-10 13:32 UTC by Michal Skrivanek
Modified: 2016-02-10 19:50 UTC
CC List: 9 users

Fixed In Version: 3.6.0-10
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-04 11:18:46 UTC
oVirt Team: Virt
Embargoed:


Attachments
hosts logs (851.48 KB, application/x-bzip)
2015-09-08 07:15 UTC, Shira Maximov
rhevm logs (334.53 KB, application/x-bzip)
2015-09-08 07:15 UTC, Shira Maximov


Links
oVirt gerrit 44635 (master, MERGED): core: add missing null check on proceedBalloonCheck
oVirt gerrit 44705 (ovirt-engine-3.6, MERGED): core: add missing null check on proceedBalloonCheck
oVirt gerrit 45109 (master, MERGED): core: add missing null check for balloon info
oVirt gerrit 45125 (ovirt-engine-3.6, MERGED): core: add missing null check for balloon info

Description Michal Skrivanek 2015-08-10 13:32:48 UTC
When ballooning is enabled, proceedBalloonCheck() in VmsMonitoring accesses the vdsmVm entry, but when VDSM has stopped reporting the VM (e.g. because it was shut down, migrated away, etc.) this causes an NPE, since no VDSM data is present.
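
The shape of the fix (merged as gerrit 44635/44705, linked above) is a guard before the balloon handling. A minimal self-contained sketch with stand-in types, not the actual oVirt code:

// Illustrative sketch only -- not the actual oVirt patch (see gerrit
// 44635/44705 above). The types below are stand-ins, not real oVirt classes.
class VmAnalyzerSketch {
    static class VdsmVm {          // stand-in for the VDSM-reported VM entry
        Object balloonInfo;        // stand-in for the reported balloon data
    }

    private final VdsmVm vdsmVm;   // null when VDSM no longer reports the VM

    VmAnalyzerSketch(VdsmVm vdsmVm) {
        this.vdsmVm = vdsmVm;
    }

    void proceedBalloonCheck() {
        // The VM may have been shut down, destroyed, or migrated away, so
        // the VDSM entry can legitimately be absent here; bail out instead
        // of dereferencing null.
        if (vdsmVm == null) {
            return;
        }
        // ... the original balloon checks continue here ...
    }
}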

Comment 1 Michal Skrivanek 2015-08-10 13:33:55 UTC
Monitoring is then broken: the VM status in the engine no longer corresponds to the actual state (e.g. a VM stuck in Powering Down forever).

The workaround is to disable ballooning in the Edit Cluster dialog.

Comment 2 Omer Frenkel 2015-08-10 13:41:12 UTC
Steps to reproduce:
1. make sure balloon is enabled in the cluster
2. run a VM
3. stop the engine
4. stop the VM using "vdsClient -s 0 destroy <vm_id>" on the host that runs the VM
5. start the engine

Actual result:
the status of the VM is not updated; exceptions appear in engine.log

Expected result:
the engine identifies that the VM is down and updates the UI with the correct status (see the sketch below)
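
A hedged sketch of that expected behavior, using stand-in types rather than the real oVirt classes:

// Illustrative only -- not oVirt code: a VM the engine believes is running
// but that VDSM no longer reports should be reconciled to Down instead of
// the monitoring cycle throwing an NPE.
class VmStatusReconcileSketch {
    enum Status { UP, POWERING_DOWN, DOWN }

    static Status reconcile(Status engineStatus, boolean reportedByVdsm) {
        if (!reportedByVdsm && engineStatus != Status.DOWN) {
            return Status.DOWN; // VDSM dropped the VM: treat it as powered off
        }
        return engineStatus;    // otherwise keep the engine's view
    }
}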

Comment 4 Shira Maximov 2015-08-31 14:27:42 UTC
I was able to reproduce the bug on version 3.6.0-0.12.master.el6.
Steps to reproduce:
1. make sure balloon is enabled in the cluster
2. run a VM
3. stop the engine
4. stop the VM using "vdsClient -s 0 destroy <vm_id>" on the host that runs the VM
5. start the engine

The engine log shows:

2015-08-30 13:41:57,635 ERROR [org.ovirt.engine.core.vdsbroker.VmsMonitoring] (ForkJoinPool-1-worker-132) [] Failed during vms monitoring on host host_mixed_2 error is: java.lang.NullPointerException
2015-08-30 13:41:57,635 ERROR [org.ovirt.engine.core.vdsbroker.VmsMonitoring] (ForkJoinPool-1-worker-132) [] Exception:: java.lang.NullPointerException
	at org.ovirt.engine.core.vdsbroker.VmAnalyzer.proceedBalloonCheck(VmAnalyzer.java:359) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.VmAnalyzer.analyze(VmAnalyzer.java:118) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.VmsMonitoring.refreshVmStats(VmsMonitoring.java:215) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.VmsMonitoring.perform(VmsMonitoring.java:147) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.jsonrpc.EventVmStatsRefresher$1.onNext(EventVmStatsRefresher.java:66) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.jsonrpc.EventVmStatsRefresher$1.onNext(EventVmStatsRefresher.java:47) [vdsbroker.jar:]
	at org.ovirt.vdsm.jsonrpc.client.events.EventPublisher$EventCallable.call(EventPublisher.java:114) [vdsm-jsonrpc-java-client.jar:]
	at org.ovirt.vdsm.jsonrpc.client.events.EventPublisher$EventCallable.call(EventPublisher.java:89) [vdsm-jsonrpc-java-client.jar:]
	at java.util.concurrent.ForkJoinTask$AdaptedCallable.exec(ForkJoinTask.java:1288) [rt.jar:1.7.0_85]
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:334) [rt.jar:1.7.0_85]
	at java.util.concurrent.ForkJoinWorkerThread.execTask(ForkJoinWorkerThread.java:604) [rt.jar:1.7.0_85]
	at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:784) [rt.jar:1.7.0_85]
	at java.util.concurrent.ForkJoinPool.work(ForkJoinPool.java:646) [rt.jar:1.7.0_85]
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:398) [rt.jar:1.7.0_85]
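
The top frame is the proceedBalloonCheck() path tracked by this bug; the second patch pair linked above (gerrit 45109/45125, "core: add missing null check for balloon info") guards the balloon statistics object itself. A hedged sketch with stand-in types and assumed field names, not the literal patch:

// Illustrative only: even when the VDSM entry exists, its balloon statistics
// may be missing, so the balloon info object needs its own null check
// (cf. gerrit 45109/45125 above). Stand-in types, not oVirt classes.
class BalloonInfoSketch {
    static class BalloonInfo { long currentMemory; }
    static class VdsmVm { BalloonInfo balloonInfo; }

    static void proceedBalloonCheck(VdsmVm vdsmVm) {
        if (vdsmVm == null) {
            return;             // covered by the first patch (gerrit 44635)
        }
        BalloonInfo info = vdsmVm.balloonInfo;
        if (info == null) {
            return;             // no balloon stats reported this cycle
        }
        // ... compare info.currentMemory with the configured balloon target ...
    }
}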

Comment 5 Omer Frenkel 2015-09-01 11:23:43 UTC
I cannot reproduce this; can you please attach the engine.log for this time?
Also, what is your VDSM version?

Comment 6 Shira Maximov 2015-09-08 07:15:11 UTC
Created attachment 1071212 [details]
hosts logs

Comment 7 Shira Maximov 2015-09-08 07:15:39 UTC
Created attachment 1071213 [details]
rhevm logs

Comment 8 Shira Maximov 2015-09-08 07:19:08 UTC
I've attached the logs; you can see the error at 13:42.

The VDSM version:
vdsm-4.17.3-1.el7ev.noarch

Also, here is the link to the automation test:
https://rhev-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/3.6_Dev/job/3.6-GE-compute/208/testReport/junit/rhevmtests.sla.mom.mom_test/004-Balloon_REST;test_e_balloon_no_agent/Balloon_REST_test_e_balloon_no_agent/

Comment 9 Shira Maximov 2015-09-10 07:51:46 UTC
I verified this bug on:
Red Hat Enterprise Virtualization Manager version 3.6.0-0.13.master.el6

Comment 11 Sandro Bonazzola 2015-11-04 11:18:46 UTC
oVirt 3.6.0 was released on November 4th, 2015 and should fix this issue.
If problems persist, please open a new BZ and reference this one.

