Bug 2068270
| Summary: | RHV-M Admin Portal gives "500 - Internal Server Error" with command_entities in EXECUTION_FAILED status | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | amashah |
| Component: | ovirt-engine | Assignee: | Benny Zlotnik <bzlotnik> |
| Status: | CLOSED ERRATA | QA Contact: | Ilia Markelov <imarkelo> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.4.10 | CC: | ahadas, apinnick, bzlotnik, emarcus, lleistne, sfishbai |
| Target Milestone: | ovirt-4.5.1 | Keywords: | ZStream |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | ovirt-engine-4.5.0.1 | Doc Type: | Bug Fix |
| Doc Text: | Previously, when downloading snapshots, the disk_id was not set, which caused resumption of the transfer operation to fail because locking requires the disk_id to be set. In this release, the disk_id is always set so that the transfer operation recovers after restart. | Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-07-14 12:54:31 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Seems like we should check whether getDiskImage() returns null in getSharedLocks, as we do in getExclusiveLocks, but it would be interesting to see how we ended up in this situation.

I think this is a similar issue to bug 2043984.

Amar, were the failed transfers made on a snapshot rather than a disk?
If yes, the patches for 2043984 should handle this, as the disk id should always be set.

(In reply to Benny Zlotnik from comment #3)
> I think this is a similar issue to bug 2043984.
>
> Amar, were the failed transfers made on a snapshot rather than a disk?
> If yes, the patches for 2043984 should handle this, as the disk id should
> always be set.

It was checked, and indeed the transfers were made on a snapshot.
Benny, can you please write a few words on this in the Doc Text field?

(In reply to Arik from comment #6)
> (In reply to Benny Zlotnik from comment #3)
> > I think this is a similar issue to bug 2043984.
> >
> > Amar, were the failed transfers made on a snapshot rather than a disk?
> > If yes, the patches for 2043984 should handle this, as the disk id should
> > always be set.
>
> It was checked, and indeed the transfers were made on a snapshot.
> Benny, can you please write a few words on this in the Doc Text field?

Added

The fix for bug 2043984 resolves this as well.

1. Start downloading a snapshot [1]
2. While the transfer is still in progress, run systemctl restart ovirt-engine

[1] https://github.com/oVirt/python-ovirt-engine-sdk4/blob/2976aa52a6a7a5133ee56e1e8700648b2fcd4a36/examples/download_disk_snapshot.py

Verified. The Admin Portal loads successfully after reproducing the same flow.

Versions:
engine-4.5.1.1-0.14.el8ev
vdsm-4.50.1.2-1.el8ev.x86_64

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: RHV Manager (ovirt-engine) [ovirt-4.5.1] security, bug fix and update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5555
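The null check proposed in the first comment can be modelled with a short sketch. This is not the engine code (ovirt-engine is written in Java) and it is not the shipped fix, which according to the comments was to always set the disk id (bug 2043984). The names DiskImage, TransferCommand, shared_locks_unguarded, shared_locks_guarded and reacquire_locks are all hypothetical and exist only to illustrate the idea: a persisted transfer with no disk attached should yield no locks instead of aborting the whole lock re-acquisition pass.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DiskImage:
    # Hypothetical stand-in for the engine's disk image object.
    disk_id: str


@dataclass
class TransferCommand:
    # None models a snapshot transfer whose disk id was never recorded.
    disk_image: Optional[DiskImage]

    def shared_locks_unguarded(self) -> Dict[str, str]:
        # Dereferences disk_image unconditionally; fails when it is None.
        return {self.disk_image.disk_id: "DISK"}

    def shared_locks_guarded(self) -> Dict[str, str]:
        # Returns no locks when the disk is unknown instead of failing.
        if self.disk_image is None:
            return {}
        return {self.disk_image.disk_id: "DISK"}


def reacquire_locks(commands: List[TransferCommand], guarded: bool = True) -> Dict[str, str]:
    """Rebuild the lock map for persisted commands, as a startup pass would."""
    locks: Dict[str, str] = {}
    for cmd in commands:
        if guarded:
            locks.update(cmd.shared_locks_guarded())
        else:
            locks.update(cmd.shared_locks_unguarded())
    return locks
```

In this model, a single command without a disk makes the unguarded pass raise, which mirrors how one stale transfer entry was enough to stop the engine from deploying. The guarded pass skips that command and still collects locks for the others.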
Description of problem:
After a reboot of RHV-M, the Admin Portal fails to come up. The browser shows: "500 - Internal Server Error"

Version-Release number of selected component (if applicable):
4.4.10

Steps to Reproduce:
1. Have an entry in command_entities like this:

engine=> select command_id,status,command_params_class,root_command_id,created_at from command_entities where status ='EXECUTION_FAILED';
              command_id              |      status      |                      command_params_class                       |           root_command_id            |          created_at
--------------------------------------+------------------+-----------------------------------------------------------------+--------------------------------------+-------------------------------
 f1f989c7-98bb-48eb-b133-06e7118f26b5 | EXECUTION_FAILED | org.ovirt.engine.core.common.action.TransferDiskImageParameters | f1f989c7-98bb-48eb-b133-06e7118f26b5 | 2022-03-03 02:54:22.368456+00
 5286e268-051b-440e-90a3-f8fd46f16fe5 | EXECUTION_FAILED | org.ovirt.engine.core.common.action.TransferDiskImageParameters | 5286e268-051b-440e-90a3-f8fd46f16fe5 | 2022-03-03 02:54:30.334549+00
(2 rows)

2. Reboot RHV-M

Actual results:
Admin Portal doesn't load

Expected results:
Admin Portal should load

Additional info:
server.log:
~~~
2022-03-17 15:54:15,194+01 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) WFLYCTL0013: Operation ("deploy") failed - address: ([("deployment" => "engine.ear")]) - failure description: {"WFLYCTL0080: Failed services" => {"jboss.deployment.subunit.\"engine.ear\".\"bll.jar\".component.InitBackendServicesOnStartupBean.START" => "java.lang.IllegalStateException: WFLYEE0042: Failed to construct component instance
    Caused by: java.lang.IllegalStateException: WFLYEE0042: Failed to construct component instance
    Caused by: javax.ejb.EJBException: java.lang.NullPointerException
    Caused by: java.lang.NullPointerException"}}
...
Caused by: java.lang.NullPointerException
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand.getSharedLocks(TransferDiskImageCommand.java:394)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.buildLock(CommandBase.java:1893)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.acquireLockInternal(CommandBase.java:1855)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.acquireLockAsyncTask(CommandBase.java:1846)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.reacquireLocks(CommandBase.java:1834)
~~~
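When triaging a server.log excerpt like the one in this report, the frame that follows the last "Caused by:" entry usually points at the failing method. The following is a minimal sketch of extracting it with a regular expression. The function name first_frame_after_root_cause, the FRAME_RE pattern and the SAMPLE string are illustrative assumptions, not part of any oVirt tooling, and the pattern only targets frames shaped like the ones shown in this report.

```python
import re
from typing import Dict, Optional, Union

# Matches a frame such as:
#   at deployment.engine.ear.bll.jar//pkg.Class.method(Class.java:394)
# The "deployment...//" module prefix is optional.
FRAME_RE = re.compile(
    r"\bat\s+(?:deployment\.[\w.]+//)?([\w.$]+)\.([\w$<>]+)\(([\w.]+):(\d+)\)"
)


def first_frame_after_root_cause(log_text: str) -> Optional[Dict[str, Union[str, int]]]:
    """Return the first stack frame after the last 'Caused by:' entry, if any."""
    idx = log_text.rfind("Caused by:")
    if idx == -1:
        return None
    match = FRAME_RE.search(log_text, idx)
    if match is None:
        return None
    cls, method, source_file, line = match.groups()
    return {"class": cls, "method": method, "file": source_file, "line": int(line)}


# Two frames copied from the trace in this report.
SAMPLE = (
    "Caused by: java.lang.NullPointerException "
    "at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image."
    "TransferDiskImageCommand.getSharedLocks(TransferDiskImageCommand.java:394) "
    "at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll."
    "CommandBase.buildLock(CommandBase.java:1893)"
)

if __name__ == "__main__":
    print(first_frame_after_root_cause(SAMPLE))
```

For the sample, the sketch reports TransferDiskImageCommand.getSharedLocks at line 394, the same frame the comments focus on.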