Created attachment 772089
logs

Description of problem:
When the engine comes up after crashing while a CreateCloneOfTemplate task had already been sent to vdsm, it fails in SetStoragePoolStatusCommand with:

2013-07-11 11:02:07,247 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-5-thread-9) [771247cc] Error in StoragePoolUpEvent - : javax.ejb.EJBException: JBAS014580: Unexpected Error

Version-Release number of selected component (if applicable):
rhevm-3.3.0-0.6.master.el6ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. With one host on a block pool, create a template.
2. Create a VM from the template (cloned).
3. After the engine sends the CreateCloneOfTemplate command to vdsm, stop ovirt-engine and start it again after 5 minutes.

Actual results:
The engine fails in SetStoragePoolStatusCommand, and the task is not cleared from the SPM. The image remains in LOCKED state forever.

Expected results:
When it comes up, the engine should ask vdsm to delete the image.

Additional info:
logs
Elad said that the engine looks stuck. Regarding the revert: this should be the storage team's decision. I'll look first at the other issues (AsyncTaskMgr, SetStoragePoolStatus, etc.). Even if no rollback is done (and it should be), the system should not get stuck. Elad, could you do any other operations on the system? Did I understand correctly?
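To make the "should not get stuck" point concrete, here is a minimal Python sketch of the startup behaviour being asked for. It is not engine or vdsm code: `FakeSpm`, `PendingTask`, `TaskStatus`, and `recover_tasks` are hypothetical names invented for illustration, and the policy (delete the image unless the task finished, never let one task abort the rest) is an assumption about the desired behaviour, not the actual fix.

```python
from dataclasses import dataclass
from enum import Enum


class TaskStatus(Enum):
    RUNNING = "running"
    FINISHED = "finished"
    FAILED = "failed"


@dataclass
class PendingTask:
    """A task the engine recorded as sent before it crashed (hypothetical)."""
    task_id: str
    image_id: str


class FakeSpm:
    """In-memory stand-in for the SPM host; not a real vdsm client."""

    def __init__(self, statuses):
        self.statuses = dict(statuses)
        self.deleted_images = []
        self.cleared_tasks = []

    def get_task_status(self, task_id):
        # Raises KeyError for a task the SPM does not know about.
        return self.statuses[task_id]

    def delete_image(self, image_id):
        self.deleted_images.append(image_id)

    def clear_task(self, task_id):
        self.cleared_tasks.append(task_id)


def recover_tasks(pending, spm, image_locks):
    """Reconcile tasks left over from a crash; return ids that raised errors."""
    errors = []
    for task in pending:
        try:
            status = spm.get_task_status(task.task_id)
            if status is TaskStatus.RUNNING:
                continue  # still in progress: keep the lock, poll again later
            if status is not TaskStatus.FINISHED:
                spm.delete_image(task.image_id)  # roll back the partial image
            spm.clear_task(task.task_id)
            image_locks.discard(task.image_id)  # image no longer LOCKED
        except Exception:
            # Deliberately broad: one bad task must not abort the others
            # or the surrounding pool-up handling.
            errors.append(task.task_id)
    return errors
```

The design choice worth noting is the per-task `try`/`except`: a failure while reconciling one task is collected and reported instead of propagating, so the remaining tasks are still processed.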
After this issue, the engine cannot run any command that opens an async task on vdsm. It fails with:

2013-07-11 15:34:28,587 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateVDSCommand] (pool-5-thread-48) [6ea12989] FINISH, CreateVDSCommand, log id: 5d03ec41
2013-07-11 15:34:28,587 ERROR [org.ovirt.engine.core.vdsbroker.CreateVmVDSCommand] (pool-5-thread-48) [6ea12989] Error in excuting CreateVmVDSCommand: java.lang.NullPointerException

Restarting the ovirt-engine service does not help. On the vdsm side, any async task requested by the engine gets stuck and is not cleaned up. Restarting the vdsm service does not help either.
The user cannot do anything on the system.
Moved to MODIFIED by mistake. Still in review.
The engine handles a failure in CopyImage after it comes up from a crash.
Verified on RHEVM3.3-IS10: rhevm-3.3.0-0.15.master.el6ev.noarch
Closing - RHEV 3.3 Released