Bug 2068270 - RHV-M Admin Portal gives "500 - Internal Server Error" with command_entities in EXECUTION_FAILED status
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.5.1
Target Release: ---
Assignee: Benny Zlotnik
QA Contact: Ilia Markelov
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-03-24 19:09 UTC by amashah
Modified: 2022-08-29 07:00 UTC (History)

Fixed In Version: ovirt-engine-4.5.0.1
Doc Type: Bug Fix
Doc Text:
Previously, when downloading snapshots, the disk_id was not set, which caused resumption of the transfer operation to fail because locking requires the disk_id to be set. In this release, the disk_id is always set so that the transfer operation recovers after restart.
Clone Of:
Environment:
Last Closed: 2022-07-14 12:54:31 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | RHV-45440 | 0 | None | None | None | 2022-03-24 19:22:18 UTC
Red Hat Knowledge Base (Solution) | 6841721 | 0 | None | None | None | 2022-03-24 19:18:30 UTC
Red Hat Product Errata | RHSA-2022:5555 | 0 | None | None | None | 2022-07-14 12:55:15 UTC

Description amashah 2022-03-24 19:09:22 UTC
Description of problem:
After a reboot of RHV-M, the Admin Portal fails to come up. The browser shows:
"500 - Internal Server Error"

Version-Release number of selected component (if applicable):
4.4.10


Steps to Reproduce:
1. Have an entry in command_entities like this:

engine=> select command_id,status,command_params_class,root_command_id,created_at from command_entities where status ='EXECUTION_FAILED';
              command_id              |      status      |                      command_params_class                       |           root_command_id            |          created_at           
--------------------------------------+------------------+-----------------------------------------------------------------+--------------------------------------+-------------------------------
 f1f989c7-98bb-48eb-b133-06e7118f26b5 | EXECUTION_FAILED | org.ovirt.engine.core.common.action.TransferDiskImageParameters | f1f989c7-98bb-48eb-b133-06e7118f26b5 | 2022-03-03 02:54:22.368456+00
 5286e268-051b-440e-90a3-f8fd46f16fe5 | EXECUTION_FAILED | org.ovirt.engine.core.common.action.TransferDiskImageParameters | 5286e268-051b-440e-90a3-f8fd46f16fe5 | 2022-03-03 02:54:30.334549+00
(2 rows)

2. Reboot RHV-M


Actual results:
Admin Portal doesn't load

Expected results:
Admin Portal should load

Additional info:

server.log:

~~~
2022-03-17 15:54:15,194+01 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) WFLYCTL0013: Operation ("deploy") failed - address: ([("deployment" => "engine.ear")]) - failure description: {"WFLYCTL0080: Failed services" => {"jboss.deployment.subunit.\"engine.ear\".\"bll.jar\".component.InitBackendServicesOnStartupBean.START" => "java.lang.IllegalStateException: WFLYEE0042: Failed to construct component instance
    Caused by: java.lang.IllegalStateException: WFLYEE0042: Failed to construct component instance
    Caused by: javax.ejb.EJBException: java.lang.NullPointerException
    Caused by: java.lang.NullPointerException"}}

...

Caused by: java.lang.NullPointerException
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand.getSharedLocks(TransferDiskImageCommand.java:394)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.buildLock(CommandBase.java:1893)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.acquireLockInternal(CommandBase.java:1855)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.acquireLockAsyncTask(CommandBase.java:1846)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.reacquireLocks(CommandBase.java:1834)
~~~
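
For triage, the root cause in a trace like the one above can be pulled out mechanically. The following is a minimal sketch, assuming Python 3 and only the standard library. `innermost_cause`, the two regular expressions, and `SAMPLE` are hypothetical names written for this illustration and are not part of oVirt or WildFly tooling; `SAMPLE` is copied from the server.log excerpt in this report.

```python
import re

# Hypothetical helper patterns (illustration only, not oVirt tooling).
# A stack frame line: "at [module//]package.Class.method(File.java:123)"
FRAME_RE = re.compile(
    r"^\s*at\s+(?:\S+//)?(?P<method>[\w.$]+)\((?P<file>[\w$]+\.java):(?P<line>\d+)\)"
)
# A cause line: "Caused by: fully.qualified.ExceptionName[: message]"
CAUSE_RE = re.compile(r"^\s*Caused by:\s+(?P<exc>[\w.$]+)")


def innermost_cause(log_text):
    """Return (exception, method, file, line) for the last 'Caused by'
    entry that is followed by a stack frame, or None if there is none."""
    result = None
    pending = None  # most recent exception name still waiting for a frame
    for line in log_text.splitlines():
        cause = CAUSE_RE.match(line)
        if cause:
            pending = cause.group("exc")
            continue
        frame = FRAME_RE.match(line)
        if frame and pending is not None:
            # Only the first frame after a cause is recorded: it is the
            # place where that exception was thrown.
            result = (
                pending,
                frame.group("method"),
                frame.group("file"),
                int(frame.group("line")),
            )
            pending = None
    return result


# Excerpt copied from the server.log attached to this report.
SAMPLE = """\
Caused by: java.lang.NullPointerException
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand.getSharedLocks(TransferDiskImageCommand.java:394)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.buildLock(CommandBase.java:1893)
"""

if __name__ == "__main__":
    print(innermost_cause(SAMPLE))
```

On the excerpt above this points at `TransferDiskImageCommand.getSharedLocks`, line 394, which is the frame discussed in the comments that follow.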

Comment 2 Arik 2022-03-28 11:10:35 UTC
It seems we should check whether getDiskImage() returns null in getSharedLocks, as we do in getExclusiveLocks, but it would be interesting to see how we ended up in this situation.
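
The guard pattern described in this comment can be illustrated outside the engine code. What follows is a toy Python model, not the Java implementation in ovirt-engine: `DiskImage`, `TransferCommandModel`, the `"DISK"` lock label, and the choice to return an empty lock map when no image is available are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DiskImage:
    """Hypothetical stand-in for the disk image a transfer refers to."""
    disk_id: str


class TransferCommandModel:
    """Toy model of a transfer command restored after an engine restart."""

    def __init__(self, disk_image: Optional[DiskImage]):
        # None models the reported case: the disk id was never set, so the
        # image cannot be resolved when locks are reacquired.
        self._disk_image = disk_image

    def get_disk_image(self) -> Optional[DiskImage]:
        return self._disk_image

    def get_shared_locks_unguarded(self) -> Dict[str, str]:
        # Failing pattern: dereference without checking for a missing image.
        return {self.get_disk_image().disk_id: "DISK"}

    def get_shared_locks(self) -> Dict[str, str]:
        # Guarded pattern: skip the lock when there is no image to lock.
        image = self.get_disk_image()
        if image is None:
            return {}  # assumption: "no locks" is an acceptable fallback
        return {image.disk_id: "DISK"}


if __name__ == "__main__":
    restored = TransferCommandModel(disk_image=None)
    print(restored.get_shared_locks())
    try:
        restored.get_shared_locks_unguarded()
    except AttributeError as exc:
        # Python's counterpart of the NullPointerException in the trace.
        print("unguarded variant failed:", exc)
```

A guard of this kind only keeps startup from failing. It does not explain why the image was missing in the first place, which is the question the rest of this thread addresses.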

Comment 3 Benny Zlotnik 2022-03-28 14:05:08 UTC
I think this is a similar issue to bug 2043984.

Amar, were the failed transfers made on a snapshot rather than a disk?
If yes, the patches for 2043984 should handle this, as the disk id should always be set.

Comment 6 Arik 2022-05-10 21:18:03 UTC
(In reply to Benny Zlotnik from comment #3)
> I think this is a similar issue to bug 2043984.
> 
> Amar, were the failed transfers made on a snapshot rather than a disk?
> If yes, the patches for 2043984 should handle this, as the disk id should
> always be set.

It was checked, and the transfers were indeed made on a snapshot.
Benny, can you please write a few words on this in the Doc Text field?

Comment 7 Benny Zlotnik 2022-05-11 08:01:04 UTC
(In reply to Arik from comment #6)
> (In reply to Benny Zlotnik from comment #3)
> > I think this is a similar issue to bug 2043984.
> > 
> > Amar, were the failed transfers made on a snapshot rather than a disk?
> > If yes, the patches for 2043984 should handle this, as the disk id should
> > always be set.
> 
> It was checked, and the transfers were indeed made on a snapshot.
> Benny, can you please write a few words on this in the Doc Text field?

Added

The fix for bug 2043984 resolves this as well

Comment 12 Benny Zlotnik 2022-06-16 13:51:40 UTC
1. Start downloading a snapshot[1]
2. While the transfer is still in progress, run systemctl restart ovirt-engine



[1] https://github.com/oVirt/python-ovirt-engine-sdk4/blob/2976aa52a6a7a5133ee56e1e8700648b2fcd4a36/examples/download_disk_snapshot.py

Comment 13 Ilia Markelov 2022-06-19 23:11:41 UTC
Verified.

The Admin portal loads successfully after reproducing the same flow.

Versions:
engine-4.5.1.1-0.14.el8ev
vdsm-4.50.1.2-1.el8ev.x86_64

Comment 17 errata-xmlrpc 2022-07-14 12:54:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: RHV Manager (ovirt-engine) [ovirt-4.5.1] security, bug fix and update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5555

