Description of problem:
After a reboot of RHV-M, the Admin Portal fails to come up. The browser shows: "500 - Internal Server Error"

Version-Release number of selected component (if applicable):
4.4.10

Steps to Reproduce:
1. Have an entry in command_entities like this:

~~~
engine=> select command_id,status,command_params_class,root_command_id,created_at from command_entities where status ='EXECUTION_FAILED';
              command_id              |      status      |                      command_params_class                       |           root_command_id            |          created_at
--------------------------------------+------------------+-----------------------------------------------------------------+--------------------------------------+-------------------------------
 f1f989c7-98bb-48eb-b133-06e7118f26b5 | EXECUTION_FAILED | org.ovirt.engine.core.common.action.TransferDiskImageParameters | f1f989c7-98bb-48eb-b133-06e7118f26b5 | 2022-03-03 02:54:22.368456+00
 5286e268-051b-440e-90a3-f8fd46f16fe5 | EXECUTION_FAILED | org.ovirt.engine.core.common.action.TransferDiskImageParameters | 5286e268-051b-440e-90a3-f8fd46f16fe5 | 2022-03-03 02:54:30.334549+00
(2 rows)
~~~

2. Reboot RHV-M

Actual results:
Admin Portal doesn't load

Expected results:
Admin Portal should load

Additional info:
server.log:
~~~
2022-03-17 15:54:15,194+01 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) WFLYCTL0013: Operation ("deploy") failed - address: ([("deployment" => "engine.ear")]) - failure description: {"WFLYCTL0080: Failed services" => {"jboss.deployment.subunit.\"engine.ear\".\"bll.jar\".component.InitBackendServicesOnStartupBean.START" => "java.lang.IllegalStateException: WFLYEE0042: Failed to construct component instance Caused by: java.lang.IllegalStateException: WFLYEE0042: Failed to construct component instance Caused by: javax.ejb.EJBException: java.lang.NullPointerException Caused by: java.lang.NullPointerException"}}
...
Caused by: java.lang.NullPointerException
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand.getSharedLocks(TransferDiskImageCommand.java:394)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.buildLock(CommandBase.java:1893)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.acquireLockInternal(CommandBase.java:1855)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.acquireLockAsyncTask(CommandBase.java:1846)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.reacquireLocks(CommandBase.java:1834)
~~~
It seems we should check whether getDiskImage() returns null in getSharedLocks, as we already do in getExclusiveLocks, but it would be interesting to see how we ended up in this situation.
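A minimal sketch of the null check proposed above, in Java. It is a simplified, hypothetical model rather than the actual TransferDiskImageCommand code: the DiskImage fields, the TransferLockSketch class and the string lock keys ("DISK", "STORAGE") are assumptions made for illustration. The only point is that getSharedLocks gets the same guard as getExclusiveLocks, so a command reloaded on startup whose disk image cannot be resolved yields no locks instead of a NullPointerException.

```java
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in for the engine's disk image entity; the field names
// are assumptions, not the real oVirt DiskImage API.
final class DiskImage {
    final UUID diskId;
    final UUID storageDomainId;

    DiskImage(UUID diskId, UUID storageDomainId) {
        this.diskId = diskId;
        this.storageDomainId = storageDomainId;
    }
}

// Simplified model of a transfer command reloaded on engine startup. The disk
// image may be null, e.g. for a persisted snapshot transfer without a disk id.
final class TransferLockSketch {
    private final DiskImage diskImage;

    TransferLockSketch(DiskImage diskImage) {
        this.diskImage = diskImage;
    }

    DiskImage getDiskImage() {
        return diskImage;
    }

    // Exclusive locks: guarded against a missing image.
    Map<String, String> getExclusiveLocks() {
        DiskImage image = getDiskImage();
        if (image == null) {
            return Map.of();
        }
        return Map.of(image.diskId.toString(), "DISK");
    }

    // Shared locks: the same guard added, so reacquiring locks no longer
    // dereferences a null image.
    Map<String, String> getSharedLocks() {
        DiskImage image = getDiskImage();
        if (image == null) {
            return Map.of();
        }
        return Map.of(image.storageDomainId.toString(), "STORAGE");
    }
}
```

The guard only prevents the startup crash; it does not explain why the image is missing, which is the open question in the comment above.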
I think this is a similar issue to bug 2043984.
Amar, were the failed transfers made on a snapshot rather than a disk? If so, the patches for bug 2043984 should handle this, since the disk ID should always be set.
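The point that the disk ID should always be set can be illustrated with a similarly hypothetical sketch. It is not the actual patch for bug 2043984: TransferParams, DiskIdResolver and the in-memory image-to-disk map (standing in for a database lookup) are invented names. The idea is to fill in a missing disk ID from the snapshot's image before the command is persisted, so that a command reloaded after a restart can still resolve its disk.

```java
import java.util.Map;
import java.util.UUID;

// Hypothetical, simplified transfer parameters: a snapshot transfer may
// arrive with only the image (volume) id and no disk id.
final class TransferParams {
    UUID diskId;
    final UUID imageId;

    TransferParams(UUID diskId, UUID imageId) {
        this.diskId = diskId;
        this.imageId = imageId;
    }
}

final class DiskIdResolver {
    // Assumed stand-in for a database lookup from image id to owning disk id.
    private final Map<UUID, UUID> imageToDisk;

    DiskIdResolver(Map<UUID, UUID> imageToDisk) {
        this.imageToDisk = imageToDisk;
    }

    // Fill in the disk id only when the caller did not provide one, so a
    // later reload of the persisted command can always find its disk.
    void ensureDiskId(TransferParams params) {
        if (params.diskId == null && params.imageId != null) {
            params.diskId = imageToDisk.get(params.imageId);
        }
    }
}
```

An explicitly supplied disk ID is left untouched; only snapshot transfers that arrive without one are filled in.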
(In reply to Benny Zlotnik from comment #3)
> I think this is a similar issue to bug 2043984
>
> Amar, are the failed transfers were made on a snapshot rather than a disk?
> If yes, the patches for 2043984 should handle this as the disk id should be
> always set

It was checked, and indeed the transfers were made on a snapshot.
Benny, can you please write a few words on this in the Doc Text field?
(In reply to Arik from comment #6)
> (In reply to Benny Zlotnik from comment #3)
> > I think this is a similar issue to bug 2043984
> >
> > Amar, are the failed transfers were made on a snapshot rather than a disk?
> > If yes, the patches for 2043984 should handle this as the disk id should be
> > always set
>
> It was checked and indeed the transfers were made on a snapshot
> Benny, can you please write few words on this in Doc Text field?

Added.
The fix for bug 2043984 resolves this as well.
1. Start downloading a snapshot [1].
2. While the transfer is still in progress, run systemctl restart ovirt-engine.

[1] https://github.com/oVirt/python-ovirt-engine-sdk4/blob/2976aa52a6a7a5133ee56e1e8700648b2fcd4a36/examples/download_disk_snapshot.py
Verified. The Admin Portal loads successfully after reproducing the same flow.

Versions:
engine-4.5.1.1-0.14.el8ev
vdsm-4.50.1.2-1.el8ev.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: RHV Manager (ovirt-engine) [ovirt-4.5.1] security, bug fix and update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5555