Bug 1537534 - UpdateVmCommand fails with NullPointerException after failed Hot-plug of a VM lease due to VM power off
Summary: UpdateVmCommand fails with NullPointerException after failed Hot-plug of a VM...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.2.1
Hardware: x86_64
OS: Unspecified
high
high
Target Milestone: ovirt-4.2.2
: 4.2.2.2
Assignee: Eyal Shenitzky
QA Contact: Natalie Gavrielov
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-01-23 13:15 UTC by Elad
Modified: 2018-03-29 11:00 UTC (History)
7 users (show)

Fixed In Version: ovirt-engine-4.2.2
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-03-29 11:00:12 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.2+
ylavi: blocker+


Attachments (Terms of Use)
logs (2.13 MB, application/x-gzip)
2018-01-23 13:15 UTC, Elad
no flags Details
operation canceled screenshot (190.30 KB, image/png)
2018-01-24 09:16 UTC, Eyal Shenitzky
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 87218 0 master MERGED core: validate vm running before setDestroyOnReboot 2021-02-19 09:41:02 UTC

Description Elad 2018-01-23 13:15:30 UTC
Created attachment 1384871 [details]
logs

Description of problem:
VM properties Update fails with NPE after a failure to hot plug a HA lease to the VM due to VM power off.


Version-Release number of selected component (if applicable):


How reproducible:
Reproduced after testing the fix of bug 1522832

Steps to Reproduce:
Same steps from bug 1522832 with an addition of update VM attempt at the end:
1. Create a VM with disk and set the resume behavior to Kill
2. Run the VM
3. Add a lease to the VM and power-off the VM immediately after
4. Try to edit the VM properties (from Webadmin, just open edit VM and click OK)

Actual results:
Update VM properties fails with:

2018-01-23 14:56:43,130+02 INFO  [org.ovirt.engine.core.bll.UpdateVmCommand] (default task-24) [a955a2b8-ae46-4998-8d7e-699f42fe3398] Running command: UpdateVmCommand internal: false. Entities affected :  ID: 28
2c9aa3-af9d-4212-8b78-4b3f7deb8628 Type: VMAction group EDIT_VM_PROPERTIES with role type USER
2018-01-23 14:56:43,160+02 WARN  [org.ovirt.engine.core.utils.ovf.OvfVmWriter] (default task-24) [a955a2b8-ae46-4998-8d7e-699f42fe3398] There are no users with permissions on VM 282c9aa3-af9d-4212-8b78-4b3f7deb8
628 to write
2018-01-23 14:56:43,172+02 ERROR [org.ovirt.engine.core.vdsbroker.ResourceManager] (default task-24) [a955a2b8-ae46-4998-8d7e-699f42fe3398] createCommand failed: null
2018-01-23 14:56:43,172+02 ERROR [org.ovirt.engine.core.vdsbroker.ResourceManager] (default task-24) [a955a2b8-ae46-4998-8d7e-699f42fe3398] Exception: java.lang.reflect.InvocationTargetException

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_161]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_161]
        at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_161]
Caused by: java.lang.NullPointerException
        at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936) [rt.jar:1.8.0_161]

2018-01-23 14:56:43,211+02 ERROR [org.ovirt.engine.core.bll.UpdateVmCommand] (default task-24) [a955a2b8-ae46-4998-8d7e-699f42fe3398] Command 'org.ovirt.engine.core.bll.UpdateVmCommand' failed: null
2018-01-23 14:56:43,211+02 ERROR [org.ovirt.engine.core.bll.UpdateVmCommand] (default task-24) [a955a2b8-ae46-4998-8d7e-699f42fe3398] Exception: java.lang.RuntimeException
        at org.ovirt.engine.core.vdsbroker.ResourceManager.createCommand(ResourceManager.java:344) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:384) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.ResourceManager$Proxy$_$$_WeldSubclass.runVdsCommand$$super(Unknown Source) [vdsbroker.jar:]


Expected results:
Update VM should succeed

Additional info:
logs

Comment 1 Elad 2018-01-23 13:21:48 UTC
Sorry, the fix of bug 1522832 was to rhevm-4.1.9, there doesn't seem to be actual relation between the bug and the fix of bug 1522832, except for the steps to reproduce.


Version-Release number of selected component (if applicable):
rhvm-4.2.1.2-0.1.el7.noarch
vdsm-4.20.14-1.el7ev.x86_64

Comment 2 Yaniv Kaul 2018-01-24 07:50:28 UTC
Why is this high severity? What are the odds and why would you add a lease and immediately power it off?

Comment 3 Elad 2018-01-24 09:12:35 UTC
The severity is high because once the bug occurs, there is no way to update the VM anymore

Comment 4 Eyal Shenitzky 2018-01-24 09:15:33 UTC
Elad,

Something seems unclear to me.
After you edit the VM and add a lease, when you try to edit the VM for the second time (if I understand correctly - before the first edit for add a lease was done) and pressed 'OK' you should receive an error - 'operation canceled' because there is a lock on the VM until the lease will be added (even if you shut down the VM).

I failed to reproduce this issue, I tried to edit the VM once again right after the VM shutdown - received 'operation canceled' error

When I try to edit the VM after the lease was added but the hot-plug failed - all works fine.

I attached a screenshot with the dialog box that should pop when you try to edit the VM while the previous VM edit wasn't done yet.

Comment 5 Eyal Shenitzky 2018-01-24 09:16:20 UTC
Created attachment 1385345 [details]
operation canceled screenshot

Comment 6 Elad 2018-01-24 09:27:20 UTC
(In reply to Eyal Shenitzky from comment #4)
Eyal, just wait for the lease creation task to finish and then try to update the VM.

Comment 7 Elad 2018-01-24 09:44:31 UTC
Steps to reproduce correction:
In step #1, The VM has to be created with default HA configuration (non HA with AUTO_RESUME as resume behavior)

Comment 8 Eyal Shenitzky 2018-01-24 10:41:00 UTC
Ok,

So after the fix in the steps to reproduce, it seems that when we edit the VM and  added a lease, the resume behavior change by default to 'KILL'.
The change in the resume behavior cannot take place only after a reboot of the VM.

When the VM shutdown while it was in the middle of the update and there is a lock on it, ProcessDownVmCommand try to initiate UpdateVmCommand to apply Next Run Config of the VM, but it failed to acquire a lock on the VM.

2018-01-24 12:28:35,500+02 WARN  [org.ovirt.engine.core.bll.ProcessDownVmCommand] (EE-ManagedThreadFactory-engine-Thread-2321) [3afc43b1] Could not acquire lock for UpdateVmCommand to apply Next Run Config of VM '1747e6fb-270e-4882-8175-031805dda4c0'

So in the next time we try to update the VM an exception is thrown when trying to run SetDestroyOnReboot (vm.getRunOnVds() = null).

Comment 9 Eyal Shenitzky 2018-01-24 10:44:05 UTC
Arik, can you please take a look?

Comment 10 Natalie Gavrielov 2018-02-27 15:41:38 UTC
Verified using: 
rhvm-4.2.2.1-0.1.el7.noarch
vdsm-4.20.19-1.el7ev.x86_64

Tested scenario described in comment 0 with the correction in comment 7.

Result: 
UI: Operation Canceled 
Error while executing action: 

test-vm:
Cannot edit VM. VM is being updated.

engine.log: No NPE found.

Comment 11 Sandro Bonazzola 2018-03-29 11:00:12 UTC
This bugzilla is included in oVirt 4.2.2 release, published on March 28th 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.


Note You need to log in before you can comment on or make changes to this bug.