Red Hat Bugzilla – Bug 983443
[engine-backend] engine fails to revert a failed cloneImage task, after that, user cannot do anything on the system
Last modified: 2016-02-10 14:10:00 EST
Created attachment 772089
Description of problem:
When the engine comes back up after crashing while a CreateCloneOfTemplate task had already been sent to vdsm, it fails in SetStoragePoolStatusCommand with:
2013-07-11 11:02:07,247 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-5-thread-9) [771247cc] Error in StoragePoolUpEvent - : javax.ejb.EJBException: JBAS014580: Unexpected Error
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. With one host on a block pool, create a template.
2. Create a VM from the template (cloned).
3. After the engine sends the CreateCloneOfTemplate command to vdsm, stop ovirt-engine and start it again after 5 minutes.
Actual results:
The engine fails in SetStoragePoolStatusCommand, and the task is not cleared from the SPM. The image remains in LOCKED state forever.
Expected results:
When the engine comes up, it should ask vdsm to delete the image.
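The expected behaviour can be illustrated with a minimal sketch. This is not the engine's actual recovery code: FakeVdsm, recover_on_startup and all method names are hypothetical stand-ins invented for illustration, and the real engine is written in Java. The point is only that each interrupted task is reverted independently, so one failed revert does not block the rest of startup.

```python
from dataclasses import dataclass, field


@dataclass
class FakeVdsm:
    """Hypothetical stand-in for the vdsm side; not the real vdsm API."""
    tasks: dict = field(default_factory=dict)   # task_id -> image_id
    images: dict = field(default_factory=dict)  # image_id -> state

    def delete_image(self, image_id):
        self.images.pop(image_id, None)

    def clear_task(self, task_id):
        self.tasks.pop(task_id, None)


def recover_on_startup(vdsm, interrupted_task_ids):
    """Revert every clone task that was in flight when the engine stopped.

    Each task is handled in its own try/except so that one failed revert
    cannot keep the remaining tasks from being processed.
    """
    failed = []
    for task_id in interrupted_task_ids:
        try:
            image_id = vdsm.tasks[task_id]
            vdsm.delete_image(image_id)   # roll back the half-created image
            vdsm.clear_task(task_id)      # release the task on the SPM
        except Exception:
            failed.append(task_id)        # report it, but keep going
    return failed


# One known task with a LOCKED image, plus one task id vdsm does not know.
vdsm = FakeVdsm(tasks={"t1": "img1"}, images={"img1": "LOCKED"})
failed = recover_on_startup(vdsm, ["t1", "unknown"])
```

In this sketch the unknown task ends up in the failed list while the known task and its image are still cleaned up, which is the "should not get stuck" property discussed in the comments.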
Elad said that the engine looks stuck.
Regarding the revert: this should be the storage team's decision.
I'll look first at the other issues (AsyncTaskMgr, SetStoragePoolStatus, etc...).
Even if no rollback is done (and there should be one), the system should not get stuck.
Elad, could you do any other operations on the system?
Did I understand correctly?
After this issue occurs, the engine cannot run any command that opens an async task on vdsm.
It fails with:
2013-07-11 15:34:28,587 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateVDSCommand] (pool-5-thread-48) [6ea12989] FINISH, CreateVDSCommand, log id: 5d03ec41
2013-07-11 15:34:28,587 ERROR [org.ovirt.engine.core.vdsbroker.CreateVmVDSCommand] (pool-5-thread-48) [6ea12989] Error in excuting CreateVmVDSCommand: java.lang.NullPointerException
Restarting the ovirt-engine service does not help.
On the vdsm side, any async task requested by the engine gets stuck and is not cleaned up. Restarting the vdsm service does not help either.
The user cannot do anything on the system.
Moved to MODIFIED by mistake.
Still in review.
The engine handles a CopyImage failure after it comes up from a crash.
Verified on RHEVM3.3-IS10:
Closing - RHEV 3.3 Released