Description of problem:
During migration, the scheduler uses the CPU load of a VM. When it is missing, an NPE is raised.

Version-Release number of selected component (if applicable):
ovirt-engine-4.3.2.1-1
The bug is also present in 4.2.
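To illustrate the failure mode described above, here is a minimal sketch of how a missing CPU load can raise an NPE in Java and how a null check avoids it. This is not the oVirt engine code: the classes VmStats and CpuLoadSketch and their methods are hypothetical, and treating a missing load as 0% is an assumption made only for this example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a VM's statistics; not an oVirt engine class.
final class VmStats {
    private final String name;
    // Boxed on purpose: null means the CPU load has not been reported.
    private final Integer cpuLoadPercent;

    VmStats(String name, Integer cpuLoadPercent) {
        this.name = name;
        this.cpuLoadPercent = cpuLoadPercent;
    }

    String getName() {
        return name;
    }

    Integer getCpuLoadPercent() {
        return cpuLoadPercent;
    }
}

final class CpuLoadSketch {
    // Assumption for this sketch: a missing load counts as 0%.
    static int effectiveCpuLoad(VmStats vm) {
        Integer load = vm.getCpuLoadPercent();
        return load != null ? load : 0;
    }

    // Sums the load of all VMs. Writing "total += vm.getCpuLoadPercent()"
    // instead would auto-unbox a null Integer and throw a NullPointerException.
    static int totalCpuLoad(List<VmStats> vms) {
        int total = 0;
        for (VmStats vm : vms) {
            total += effectiveCpuLoad(vm);
        }
        return total;
    }

    public static void main(String[] args) {
        List<VmStats> vms = Arrays.asList(
                new VmStats("vm1", 35),
                new VmStats("vm2", null)); // load not reported
        System.out.println(totalCpuLoad(vms));
    }
}
```

Whether 0% is the right default depends on how the scheduler weighs hosts; the sketch only shows that the missing value has to be handled before it is used in arithmetic.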
Verified on ovirt-engine-4.4.0-0.0.master.20190505144126.git46533ec.el7.noarch with two migration scenarios:
1. Migrate VMs with 0% VM CPU load.
2. Create memory and CPU load on the host while the cluster is under the evenly_distributed scheduling policy, then wait until VMs with 0% CPU load are migrated by the scheduler.
No NPE errors in engine.log.
Verified on the basis of https://bugzilla.redhat.com/show_bug.cgi?id=1696621#c1
*** Bug 1714594 has been marked as a duplicate of this bug. ***
Moving to ASSIGNED, because the issue is still not fixed.
Target release should be set once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.
Created attachment 1574689
MigrateMultipleVmsCommand NPE logs

Adding some logs for the MigrateMultipleVmsCommand NPE.
Hi Andrej, I saw that you closed https://bugzilla.redhat.com/show_bug.cgi?id=1714594 as a duplicate of this bug (1696621). I just wanted to note that in the last run (ovirt-engine-4.3.4.1-0.1.el7.noarch) I see the NPE in different scenarios, for bll.MigrateMultipleVmsCommand, MigrateVmToServerCommand, and MigrateVmCommand. It was not seen in the previous automation runs.
All of these commands call the scheduler, which contains the code where the NPE happens. That code was only added in version 4.3.4.1, which could be why the NPE was not seen previously.
Please note that oVirt gerrit 100544 is not included in 4.3.4. Does this bug need to be retargeted to 4.3.5?
Moving back to POST for better visibility of the issue. If the fix needs to be in 4.3.4, it is missing a cherry-pick to the ovirt-engine-4.3.4.z branch.
The NPE only happens in an edge case. It should be ok to retarget.
A small update just to be sure we don't miss any scenario: also happens for bll.BalanceVmCommand.

2019-06-11 18:19:46,332+03 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (EE-ManagedThreadFactory-engineScheduled-Thread-18) [50f4b998] Candidate host 'host_mixed_2' ('c1ddb00e-d338-43b7-bf77-840325ad7402') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'VmToHostsAffinityGroups' (correlation id: 50f4b998)
2019-06-11 18:19:46,337+03 ERROR [org.ovirt.engine.core.bll.BalanceVmCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-18) [50f4b998] Command 'org.ovirt.engine.core.bll.BalanceVmCommand' failed: null
2019-06-11 18:19:46,337+03 ERROR [org.ovirt.engine.core.bll.BalanceVmCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-18) [50f4b998] Exception: java.lang.NullPointerException
	at org.ovirt.engine.core.bll.scheduling.SchedulingManager.addPendingResources(SchedulingManager.java:495) [bll.jar:]
	at org.ovirt.engine.core.bll.scheduling.SchedulingManager.schedule(SchedulingManager.java:407) [bll.jar:]
	at org.ovirt.engine.core.bll.scheduling.SchedulingManager.access$100(SchedulingManager.java:97) [bll.jar:]
Verified by running all the automation tiers, which include many migration and balancing cases, on ovirt-engine-4.3.5.1-0.1.el7.noarch.
This bugzilla is included in oVirt 4.3.5 release, published on July 30th 2019. Since the problem described in this bug report should be resolved in oVirt 4.3.5 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.