Bug 2068270

Summary:	RHV-M Admin Portal gives "500 - Internal Server Error" with command_entities in EXECUTION_FAILED status
Product:	Red Hat Enterprise Virtualization Manager	Reporter:	amashah
Component:	ovirt-engine	Assignee:	Benny Zlotnik <bzlotnik>
Status:	CLOSED ERRATA	QA Contact:	Ilia Markelov <imarkelo>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	4.4.10	CC:	ahadas, apinnick, bzlotnik, emarcus, lleistne, sfishbai
Target Milestone:	ovirt-4.5.1	Keywords:	ZStream
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	ovirt-engine-4.5.0.1	Doc Type:	Bug Fix
Doc Text:	Previously, when downloading snapshots, the disk_id was not set, which caused resumption of the transfer operation to fail because locking requires the disk_id to be set. In this release, the disk_id is always set so that the transfer operation recovers after restart.	Story Points:	---
Clone Of:		Environment:
Last Closed:	2022-07-14 12:54:31 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	Storage	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description amashah 2022-03-24 19:09:22 UTC

Description of problem:
After a reboot of RHV-M, the Admin Portal fails to come up. The browser shows:
"500 - Internal Server Error"

Version-Release number of selected component (if applicable):
4.4.10


Steps to Reproduce:
1. Have an entry in command_entities like this:

engine=> select command_id,status,command_params_class,root_command_id,created_at from command_entities where status ='EXECUTION_FAILED';
              command_id              |      status      |                      command_params_class                       |           root_command_id            |          created_at           
--------------------------------------+------------------+-----------------------------------------------------------------+--------------------------------------+-------------------------------
 f1f989c7-98bb-48eb-b133-06e7118f26b5 | EXECUTION_FAILED | org.ovirt.engine.core.common.action.TransferDiskImageParameters | f1f989c7-98bb-48eb-b133-06e7118f26b5 | 2022-03-03 02:54:22.368456+00
 5286e268-051b-440e-90a3-f8fd46f16fe5 | EXECUTION_FAILED | org.ovirt.engine.core.common.action.TransferDiskImageParameters | 5286e268-051b-440e-90a3-f8fd46f16fe5 | 2022-03-03 02:54:30.334549+00
(2 rows)

2. Reboot RHV-M


Actual results:
Admin Portal doesn't load

Expected results:
Admin Portal should load

Additional info:

server.log:

~~~
2022-03-17 15:54:15,194+01 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) WFLYCTL0013: Operation ("deploy") failed - address: ([("deployment" => "engine.ear")]) - failure description: {"WFLYCTL0080: Failed services" => {"jboss.deployment.subunit.\"engine.ear\".\"bll.jar\".component.InitBackendServicesOnStartupBean.START" => "java.lang.IllegalStateException: WFLYEE0042: Failed to construct component instance
    Caused by: java.lang.IllegalStateException: WFLYEE0042: Failed to construct component instance
    Caused by: javax.ejb.EJBException: java.lang.NullPointerException
    Caused by: java.lang.NullPointerException"}}

...

Caused by: java.lang.NullPointerException
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand.getSharedLocks(TransferDiskImageCommand.java:394)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.buildLock(CommandBase.java:1893)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.acquireLockInternal(CommandBase.java:1855)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.acquireLockAsyncTask(CommandBase.java:1846)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.reacquireLocks(CommandBase.java:1834)
~~~

Comment 2 Arik 2022-03-28 11:10:35 UTC

It seems we should check whether getDiskImage() returns null in getSharedLocks, as we do in getExclusiveLocks, but it would be interesting to see how we ended up in this situation.
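
A minimal sketch of the kind of null guard suggested here, assuming a simplified model of the command. The class SharedLocksSketch, the nested DiskImage type, the map layout, and the "DISK" lock label are all hypothetical stand-ins and are not the real ovirt-engine classes or signatures; returning an empty map when the image is missing is also an assumption, not a statement of what getExclusiveLocks actually does.

```java
import java.util.Collections;
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in for the transfer command; NOT the real ovirt-engine code.
class SharedLocksSketch {

    /** Hypothetical disk image holder; the real DiskImage class is far richer. */
    static final class DiskImage {
        private final UUID id;

        DiskImage(UUID id) {
            this.id = id;
        }

        UUID getId() {
            return id;
        }
    }

    /**
     * Builds a shared-lock map keyed by disk id. If the disk image could not
     * be resolved (for example, because no disk id was stored with the
     * command), return an empty map instead of dereferencing null.
     */
    static Map<String, String> getSharedLocks(DiskImage diskImage) {
        if (diskImage == null) {
            // Assumption: taking no shared lock is acceptable in this case.
            return Collections.emptyMap();
        }
        return Collections.singletonMap(diskImage.getId().toString(), "DISK");
    }

    public static void main(String[] args) {
        // Missing image: no NullPointerException, just an empty lock map.
        System.out.println(getSharedLocks(null).isEmpty());

        // Resolved image: one lock entry keyed by the disk id.
        DiskImage image = new DiskImage(UUID.randomUUID());
        System.out.println(getSharedLocks(image).size());
    }
}
```

A guard like this would only keep lock reacquisition from throwing at engine startup; it would not explain why the disk image was unresolved in the first place.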

Comment 3 Benny Zlotnik 2022-03-28 14:05:08 UTC

I think this is a similar issue to bug 2043984

Amar, were the failed transfers made on a snapshot rather than a disk?
If so, the patches for 2043984 should handle this, as the disk id should always be set

Comment 6 Arik 2022-05-10 21:18:03 UTC

(In reply to Benny Zlotnik from comment #3)
> I think this is a similar issue to bug 2043984
> 
> Amar, were the failed transfers made on a snapshot rather than a disk?
> If so, the patches for 2043984 should handle this, as the disk id should
> always be set

It was checked, and the transfers were indeed made on a snapshot.
Benny, can you please write a few words on this in the Doc Text field?

Comment 7 Benny Zlotnik 2022-05-11 08:01:04 UTC

(In reply to Arik from comment #6)
> (In reply to Benny Zlotnik from comment #3)
> > I think this is a similar issue to bug 2043984
> > 
> > Amar, were the failed transfers made on a snapshot rather than a disk?
> > If so, the patches for 2043984 should handle this, as the disk id should
> > always be set
> 
> It was checked, and the transfers were indeed made on a snapshot.
> Benny, can you please write a few words on this in the Doc Text field?

Added

The fix for bug 2043984 resolves this as well

Comment 12 Benny Zlotnik 2022-06-16 13:51:40 UTC

1. Start downloading a snapshot[1]
2. While the transfer is still in progress, run systemctl restart ovirt-engine



[1] https://github.com/oVirt/python-ovirt-engine-sdk4/blob/2976aa52a6a7a5133ee56e1e8700648b2fcd4a36/examples/download_disk_snapshot.py

Comment 13 Ilia Markelov 2022-06-19 23:11:41 UTC

Verified.

The Admin portal loads successfully after reproducing the same flow.

Versions:
engine-4.5.1.1-0.14.el8ev
vdsm-4.50.1.2-1.el8ev.x86_64

Comment 17 errata-xmlrpc 2022-07-14 12:54:31 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: RHV Manager (ovirt-engine) [ovirt-4.5.1] security, bug fix and update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5555