Bug 1696621 - NPE when migrating a VM with missing CPU load
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Backend.Core
Version: 4.3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.3.5
Target Release: 4.3.5
Assignee: Andrej Krejcir
QA Contact: Polina
URL:
Whiteboard:
Duplicates: 1714594
Depends On:
Blocks:
Reported: 2019-04-05 09:41 UTC by Andrej Krejcir
Modified: 2019-07-30 14:08 UTC (History)

Fixed In Version: ovirt-engine-4.3.5
Clone Of:
Environment:
Last Closed: 2019-07-30 14:08:15 UTC
oVirt Team: Virt
Embargoed:
pm-rhel: ovirt-4.3+


Attachments
MigrateMultipleVmsCommand NPE logs (526.36 KB, application/gzip), 2019-05-29 10:19 UTC, Polina


Links
System | ID | Private | Priority | Status | Summary | Last Updated
oVirt gerrit | 99190 | 0 | None | MERGED | scheduler: Fix NPE when a running VM has null CPU load. | 2020-05-14 13:25:53 UTC
oVirt gerrit | 99509 | 0 | None | MERGED | scheduler: Fix NPE when a running VM has null CPU load. | 2020-05-14 13:25:53 UTC
oVirt gerrit | 100398 | 0 | None | MERGED | scheduler: Fix NPE caused by missing CPU load | 2020-05-14 13:25:53 UTC
oVirt gerrit | 100544 | 0 | None | MERGED | scheduler: Fix NPE caused by missing CPU load | 2020-05-14 13:25:53 UTC

Description Andrej Krejcir 2019-04-05 09:41:52 UTC
Description of problem:

During migration, the scheduler uses the CPU load of a VM. When the CPU load is missing (null), an NPE is thrown.
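
A minimal sketch of how a missing CPU load can lead to an NPE in Java, and one way to guard against it. The class and method names below (CpuLoadSketch, VmStats, unsafeLoad, safeLoad) are hypothetical and are not taken from ovirt-engine. Both the assumption that the load is held as a nullable Integer and the choice of 0 as the default are illustrative only; the merged patches may handle the case differently.

```java
import java.util.Optional;

final class CpuLoadSketch {

    /** Hypothetical stand-in for a VM's statistics; not the engine's real class. */
    static final class VmStats {
        // null models a VM whose CPU load has not been reported
        private final Integer cpuUsagePercent;

        VmStats(Integer cpuUsagePercent) {
            this.cpuUsagePercent = cpuUsagePercent;
        }

        Integer getCpuUsagePercent() {
            return cpuUsagePercent;
        }
    }

    /** Unsafe: auto-unboxing a null Integer throws NullPointerException. */
    static int unsafeLoad(VmStats stats) {
        return stats.getCpuUsagePercent(); // implicit intValue() call
    }

    /** Null-safe: treats a missing load as 0 percent (an assumed default). */
    static int safeLoad(VmStats stats) {
        return Optional.ofNullable(stats.getCpuUsagePercent()).orElse(0);
    }

    public static void main(String[] args) {
        VmStats missing = new VmStats(null);
        System.out.println("safe load: " + safeLoad(missing));
        try {
            unsafeLoad(missing);
        } catch (NullPointerException e) {
            System.out.println("unsafe load threw NullPointerException");
        }
    }
}
```

Whether a missing load should count as zero or exclude the VM from the calculation is a policy decision for the scheduler; the sketch only shows the null check itself.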

Version-Release number of selected component (if applicable):

ovirt-engine-4.3.2.1-1

The bug is also present in 4.2.

Comment 1 Polina 2019-05-12 07:58:44 UTC
Verified on ovirt-engine-4.4.0-0.0.master.20190505144126.git46533ec.el7.noarch
with two migration scenarios:
1. Migrate VMs with 0% CPU load.
2. Create memory and CPU load on the host while the cluster uses the evenly_distributed scheduling policy, then wait until the VMs with 0% CPU load are migrated by the scheduler.
No NPE in engine.log.

Comment 2 Polina 2019-05-20 10:06:09 UTC
Verified based on https://bugzilla.redhat.com/show_bug.cgi?id=1696621#c1

Comment 3 Andrej Krejcir 2019-05-29 09:07:38 UTC
*** Bug 1714594 has been marked as a duplicate of this bug. ***

Comment 4 Andrej Krejcir 2019-05-29 09:08:27 UTC
Moving to ASSIGNED, because the issue is still not fixed.

Comment 5 RHEL Program Management 2019-05-29 09:08:29 UTC
Target release should be set once a package build is known to fix an issue. Since this bug is not in MODIFIED state, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 6 Polina 2019-05-29 10:19:53 UTC
Created attachment 1574689
MigrateMultipleVmsCommand NPE logs

Adding some logs for the MigrateMultipleVmsCommand NPE.

Comment 7 Polina 2019-05-29 10:32:12 UTC
Hi Andrej,

I saw that you closed https://bugzilla.redhat.com/show_bug.cgi?id=1714594 as a duplicate of this bug (1696621).
I just wanted to note that in the latest run (ovirt-engine-4.3.4.1-0.1.el7.noarch) I see the NPE in different scenarios, for bll.MigrateMultipleVmsCommand, MigrateVmToServerCommand, and MigrateVmCommand.
It was not seen in the previous automation runs.

Comment 8 Andrej Krejcir 2019-05-29 14:36:38 UTC
All of these commands call the scheduler and reach the code where the NPE happens. That code was only added in version 4.3.4.1, which may be why the NPE was not seen previously.

Comment 9 Sandro Bonazzola 2019-06-06 07:54:49 UTC
Please note that oVirt gerrit 100544 is not included in 4.3.4. Does this bug need to be retargeted to 4.3.5?

Comment 10 Sandro Bonazzola 2019-06-06 07:56:06 UTC
Moving back to POST for better visibility of the issue. If the fix needs to be in 4.3.4, it is missing a cherry-pick to the ovirt-engine-4.3.4.z branch.

Comment 11 RHEL Program Management 2019-06-06 07:56:09 UTC
Target release should be set once a package build is known to fix an issue. Since this bug is not in MODIFIED state, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 12 Andrej Krejcir 2019-06-06 08:47:13 UTC
The NPE only happens in an edge case. It should be ok to retarget.

Comment 13 Polina 2019-06-12 09:31:07 UTC
A small update, just to be sure we don't miss any scenario: the NPE also happens for bll.BalanceVmCommand.

2019-06-11 18:19:46,332+03 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (EE-ManagedThreadFactory-engineScheduled-Thread-18) [50f4b998] Candidate host 'host_mixed_2' ('c1ddb00e-d338-43b7-bf77-840325ad7402') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'VmToHostsAffinityGroups' (correlation id: 50f4b998)
2019-06-11 18:19:46,337+03 ERROR [org.ovirt.engine.core.bll.BalanceVmCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-18) [50f4b998] Command 'org.ovirt.engine.core.bll.BalanceVmCommand' failed: null
2019-06-11 18:19:46,337+03 ERROR [org.ovirt.engine.core.bll.BalanceVmCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-18) [50f4b998] Exception: java.lang.NullPointerException
        at org.ovirt.engine.core.bll.scheduling.SchedulingManager.addPendingResources(SchedulingManager.java:495) [bll.jar:]
        at org.ovirt.engine.core.bll.scheduling.SchedulingManager.schedule(SchedulingManager.java:407) [bll.jar:]
        at org.ovirt.engine.core.bll.scheduling.SchedulingManager.access$100(SchedulingManager.java:97) [bll.jar:]
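
The trace above points at the step where the scheduler accounts for pending resources. As an illustration only, the following sketch accumulates CPU load over several VMs while skipping entries that have no reported value. PendingCpuSketch and totalPendingLoad are hypothetical names, and the real addPendingResources method is not reproduced here; its actual logic is not shown in this report.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

final class PendingCpuSketch {

    /**
     * Sums the reported CPU loads, ignoring VMs whose load is null.
     * Skipping nulls (rather than failing) is an assumed policy for this sketch.
     */
    static int totalPendingLoad(List<Integer> vmLoads) {
        return vmLoads.stream()
                .filter(Objects::nonNull)      // drop VMs with a missing CPU load
                .mapToInt(Integer::intValue)   // safe: no null left to unbox
                .sum();
    }

    public static void main(String[] args) {
        // Arrays.asList is used because it permits null elements
        List<Integer> loads = Arrays.asList(10, null, 25);
        System.out.println("total pending load: " + totalPendingLoad(loads));
    }
}
```

Filtering before unboxing keeps a single VM with missing statistics from failing the whole scheduling call.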

Comment 14 Polina 2019-06-27 13:28:33 UTC
Verified by running all the automation tiers, which include many migration and balancing cases, on ovirt-engine-4.3.5.1-0.1.el7.noarch.

Comment 15 Sandro Bonazzola 2019-07-30 14:08:15 UTC
This bug fix is included in the oVirt 4.3.5 release, published on July 30th 2019.

Since the problem described in this bug report should be resolved in the oVirt 4.3.5 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

