Created attachment 1384871 [details] logs Description of problem: VM properties Update fails with NPE after a failure to hot plug a HA lease to the VM due to VM power off. Version-Release number of selected component (if applicable): How reproducible: Reproduced after testing the fix of bug 1522832 Steps to Reproduce: Same steps from bug 1522832 with an addition of update VM attempt at the end: 1. Create a VM with disk and set the resume behavior to Kill 2. Run the VM 3. Add a lease to the VM and power-off the VM immediately after 4. Try to edit the VM properties (from Webadmin, just open edit VM and click OK) Actual results: Update VM properties fails with: 2018-01-23 14:56:43,130+02 INFO [org.ovirt.engine.core.bll.UpdateVmCommand] (default task-24) [a955a2b8-ae46-4998-8d7e-699f42fe3398] Running command: UpdateVmCommand internal: false. Entities affected : ID: 28 2c9aa3-af9d-4212-8b78-4b3f7deb8628 Type: VMAction group EDIT_VM_PROPERTIES with role type USER 2018-01-23 14:56:43,160+02 WARN [org.ovirt.engine.core.utils.ovf.OvfVmWriter] (default task-24) [a955a2b8-ae46-4998-8d7e-699f42fe3398] There are no users with permissions on VM 282c9aa3-af9d-4212-8b78-4b3f7deb8 628 to write 2018-01-23 14:56:43,172+02 ERROR [org.ovirt.engine.core.vdsbroker.ResourceManager] (default task-24) [a955a2b8-ae46-4998-8d7e-699f42fe3398] createCommand failed: null 2018-01-23 14:56:43,172+02 ERROR [org.ovirt.engine.core.vdsbroker.ResourceManager] (default task-24) [a955a2b8-ae46-4998-8d7e-699f42fe3398] Exception: java.lang.reflect.InvocationTargetException at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_161] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_161] at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_161] Caused by: java.lang.NullPointerException at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936) [rt.jar:1.8.0_161] 2018-01-23 14:56:43,211+02 ERROR [org.ovirt.engine.core.bll.UpdateVmCommand] (default task-24) [a955a2b8-ae46-4998-8d7e-699f42fe3398] Command 'org.ovirt.engine.core.bll.UpdateVmCommand' failed: null 2018-01-23 14:56:43,211+02 ERROR [org.ovirt.engine.core.bll.UpdateVmCommand] (default task-24) [a955a2b8-ae46-4998-8d7e-699f42fe3398] Exception: java.lang.RuntimeException at org.ovirt.engine.core.vdsbroker.ResourceManager.createCommand(ResourceManager.java:344) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:384) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.ResourceManager$Proxy$_$$_WeldSubclass.runVdsCommand$$super(Unknown Source) [vdsbroker.jar:] Expected results: Update VM should succeed Additional info: logs
Sorry, the fix of bug 1522832 was to rhevm-4.1.9, there doesn't seem to be actual relation between the bug and the fix of bug 1522832, except for the steps to reproduce. Version-Release number of selected component (if applicable): rhvm-4.2.1.2-0.1.el7.noarch vdsm-4.20.14-1.el7ev.x86_64
Why is this high severity? What are the odds and why would you add a lease and immediately power it off?
The severity is high because once the bug occurs, there is no way to update the VM anymore
Elad, Something seems unclear to me. After you edit the VM and add a lease, when you try to edit the VM for the second time (if I understand correctly - before the first edit for add a lease was done) and pressed 'OK' you should receive an error - 'operation canceled' because there is a lock on the VM until the lease will be added (even if you shut down the VM). I failed to reproduce this issue, I tried to edit the VM once again right after the VM shutdown - received 'operation canceled' error When I try to edit the VM after the lease was added but the hot-plug failed - all works fine. I attached a screenshot with the dialog box that should pop when you try to edit the VM while the previous VM edit wasn't done yet.
Created attachment 1385345 [details] operation canceled screenshot
(In reply to Eyal Shenitzky from comment #4) Eyal, just wait for the lease creation task to finish and then try to update the VM.
Steps to reproduce correction: In step #1, The VM has to be created with default HA configuration (non HA with AUTO_RESUME as resume behavior)
Ok, So after the fix in the steps to reproduce, it seems that when we edit the VM and added a lease, the resume behavior change by default to 'KILL'. The change in the resume behavior cannot take place only after a reboot of the VM. When the VM shutdown while it was in the middle of the update and there is a lock on it, ProcessDownVmCommand try to initiate UpdateVmCommand to apply Next Run Config of the VM, but it failed to acquire a lock on the VM. 2018-01-24 12:28:35,500+02 WARN [org.ovirt.engine.core.bll.ProcessDownVmCommand] (EE-ManagedThreadFactory-engine-Thread-2321) [3afc43b1] Could not acquire lock for UpdateVmCommand to apply Next Run Config of VM '1747e6fb-270e-4882-8175-031805dda4c0' So in the next time we try to update the VM an exception is thrown when trying to run SetDestroyOnReboot (vm.getRunOnVds() = null).
Arik, can you please take a look?
Verified using: rhvm-4.2.2.1-0.1.el7.noarch vdsm-4.20.19-1.el7ev.x86_64 Tested scenario described in comment 0 with the correction in comment 7. Result: UI: Operation Canceled Error while executing action: test-vm: Cannot edit VM. VM is being updated. engine.log: No NPE found.
This bugzilla is included in oVirt 4.2.2 release, published on March 28th 2018. Since the problem described in this bug report should be resolved in oVirt 4.2.2 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.