When ballooning is enabled, proceedBalloonCheck() in VmsMonitoring accesses the vdsmVm entry. If vdsm has stopped reporting the VM (e.g. because it shut down or migrated away), no vdsm data exist for it and the access throws an NPE.
Monitoring is then broken, and the VM status in the engine no longer corresponds to the actual state (e.g. the VM is stuck in Powering Down forever). Workaround: disable ballooning in the Edit Cluster dialog.
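For illustration, the kind of guard that avoids this failure can be sketched as below. This is a minimal sketch only, not the actual engine fix: ReportedVm, BalloonCheckSketch and balloonNeedsAttention are hypothetical names invented for this example, and the balloon comparison itself is an assumption. The point it shows is that a missing vdsm entry is checked for and skipped before any balloon data is read.

```java
// Hypothetical stand-in for the per-VM data reported by vdsm.
// This is NOT the real oVirt engine class; it exists only for this sketch.
final class ReportedVm {
    final long balloonCurrentKb;
    final long balloonTargetKb;

    ReportedVm(long balloonCurrentKb, long balloonTargetKb) {
        this.balloonCurrentKb = balloonCurrentKb;
        this.balloonTargetKb = balloonTargetKb;
    }
}

// Hypothetical helper showing a null guard in front of the balloon check.
final class BalloonCheckSketch {
    /**
     * Returns true only when ballooning is enabled, the host still reports
     * the VM, and the balloon has not reached its target (assumed criterion).
     */
    static boolean balloonNeedsAttention(boolean balloonEnabled, ReportedVm reported) {
        if (!balloonEnabled) {
            return false;
        }
        if (reported == null) {
            // vdsm no longer reports this VM (shut down, migrated away, ...).
            // Skip the balloon check instead of dereferencing null, so the
            // rest of the monitoring cycle can still update the VM status.
            return false;
        }
        return reported.balloonCurrentKb != reported.balloonTargetKb;
    }
}
```

With such a guard, a call like BalloonCheckSketch.balloonNeedsAttention(true, null) returns false instead of throwing, which is the case hit when the VM was destroyed while the engine was down.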
Steps to reproduce:
1. Make sure balloon is enabled in the cluster.
2. Run a VM.
3. Stop the engine.
4. Stop the VM using "vdsClient -s 0 destroy <vm_id>" on the host that runs the VM.
5. Start the engine.

Actual result: the status of the VM is not updated, and there are exceptions in engine.log.
Expected result: the engine identifies that the VM is down and updates the UI with the correct status.
I was able to reproduce the bug on Version: 3.6.0-0.12.master.el6.

Steps to reproduce:
1. Make sure balloon is enabled in the cluster.
2. Run a VM.
3. Stop the engine.
4. Stop the VM using "vdsClient -s 0 destroy <vm_id>" on the host that runs the VM.
5. Start the engine.

The engine logs:
2015-08-30 13:41:57,635 ERROR [org.ovirt.engine.core.vdsbroker.VmsMonitoring] (ForkJoinPool-1-worker-132) [] Failed during vms monitoring on host host_mixed_2 error is: java.lang.NullPointerException
2015-08-30 13:41:57,635 ERROR [org.ovirt.engine.core.vdsbroker.VmsMonitoring] (ForkJoinPool-1-worker-132) [] Exception:: java.lang.NullPointerException
    at org.ovirt.engine.core.vdsbroker.VmAnalyzer.proceedBalloonCheck(VmAnalyzer.java:359) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.VmAnalyzer.analyze(VmAnalyzer.java:118) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.VmsMonitoring.refreshVmStats(VmsMonitoring.java:215) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.VmsMonitoring.perform(VmsMonitoring.java:147) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.jsonrpc.EventVmStatsRefresher$1.onNext(EventVmStatsRefresher.java:66) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.jsonrpc.EventVmStatsRefresher$1.onNext(EventVmStatsRefresher.java:47) [vdsbroker.jar:]
    at org.ovirt.vdsm.jsonrpc.client.events.EventPublisher$EventCallable.call(EventPublisher.java:114) [vdsm-jsonrpc-java-client.jar:]
    at org.ovirt.vdsm.jsonrpc.client.events.EventPublisher$EventCallable.call(EventPublisher.java:89) [vdsm-jsonrpc-java-client.jar:]
    at java.util.concurrent.ForkJoinTask$AdaptedCallable.exec(ForkJoinTask.java:1288) [rt.jar:1.7.0_85]
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:334) [rt.jar:1.7.0_85]
    at java.util.concurrent.ForkJoinWorkerThread.execTask(ForkJoinWorkerThread.java:604) [rt.jar:1.7.0_85]
    at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:784) [rt.jar:1.7.0_85]
    at java.util.concurrent.ForkJoinPool.work(ForkJoinPool.java:646) [rt.jar:1.7.0_85]
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:398) [rt.jar:1.7.0_85]
I cannot reproduce this. Can you please attach the engine.log for this time? Also, what is your vdsm version?
Created attachment 1071212 [details] hosts logs
Created attachment 1071213 [details] rhevm logs
I've attached the logs; you can see the error at 13:42. The vdsm version is vdsm-4.17.3-1.el7ev.noarch. Here is also the link to the automation test: https://rhev-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/3.6_Dev/job/3.6-GE-compute/208/testReport/junit/rhevmtests.sla.mom.mom_test/004-Balloon_REST;test_e_balloon_no_agent/Balloon_REST_test_e_balloon_no_agent/
I verified this bug on Red Hat Enterprise Virtualization Manager Version: 3.6.0-0.13.master.el6
oVirt 3.6.0 has been released on November 4th, 2015 and should fix this issue. If problems still persist, please open a new BZ and reference this one.