Bug 983443 - [engine-backend] engine fails to revert a failed cloneImage task, after that, user cannot do anything on the system
Summary: [engine-backend] engine fails to revert a failed cloneImage task, after that,...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.3.0
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.3.0
Assignee: Yair Zaslavsky
QA Contact: Elad
URL:
Whiteboard: infra
Depends On:
Blocks:
Reported: 2013-07-11 08:51 UTC by Elad
Modified: 2016-02-10 19:10 UTC

Fixed In Version: is9
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Attachments
logs (1.74 MB, application/x-gzip)
2013-07-11 08:51 UTC, Elad


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 16821 0 None None None Never

Description Elad 2013-07-11 08:51:29 UTC
Created attachment 772089
logs

Description of problem:
When the engine comes back up after crashing while a CreateCloneOfTemplate task had already been sent to vdsm, it fails in SetStoragePoolStatusCommand with:

2013-07-11 11:02:07,247 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-5-thread-9) [771247cc] Error in StoragePoolUpEvent - : javax.ejb.EJBException: JBAS014580: Unexpected Error


Version-Release number of selected component (if applicable):
rhevm-3.3.0-0.6.master.el6ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. With one host on a block pool, create a template.
2. Create a VM from the template (cloned).
3. After the engine sends the CreateCloneOfTemplate command to vdsm, stop the ovirt-engine service and start it again after 5 minutes.

Actual results:
The engine fails in SetStoragePoolStatusCommand, and the task is not cleared from the SPM. The image remains in LOCKED state forever.

Expected results:
When it comes up, the engine should ask vdsm to delete the image.
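
For illustration only, the expected behaviour could be sketched roughly as below. This is not the engine's code and not the change in gerrit 16821; FakeVdsm, recover_tasks_on_startup, the status strings, and the choice to treat a task unknown to the host as failed are all assumptions made up for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class FakeVdsm:
    """Hypothetical in-memory stand-in for the SPM host; not the real vdsm API."""
    task_status: dict                      # task_id -> "running" | "finished" | "failed"
    images: set = field(default_factory=set)

    def delete_image(self, image_id):
        self.images.discard(image_id)

    def clear_task(self, task_id):
        self.task_status.pop(task_id, None)


def recover_tasks_on_startup(persisted_tasks, image_states, vdsm):
    """Reconcile tasks persisted before a crash with what the SPM reports.

    persisted_tasks maps task_id -> image_id for tasks the engine had sent.
    image_states maps image_id -> state string as seen by the engine.
    Returns the ids of tasks still running, which the engine keeps polling.
    """
    still_running = []
    for task_id, image_id in persisted_tasks.items():
        # Assumption: a task the host no longer knows about is treated as failed.
        status = vdsm.task_status.get(task_id, "failed")
        if status == "running":
            still_running.append(task_id)
            continue
        if status == "finished":
            image_states[image_id] = "OK"      # unlock the image
        else:
            # Revert: ask the host to delete the half-created image
            # and drop it from the engine-side view.
            vdsm.delete_image(image_id)
            image_states.pop(image_id, None)
        vdsm.clear_task(task_id)               # task must not linger on the SPM
    return still_running
```

The point of the sketch is that every persisted task ends in one of three outcomes (still polled, image unlocked, or image deleted), so no image is left in LOCKED state and no finished task stays on the SPM.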

Additional info:
logs

Comment 1 Yair Zaslavsky 2013-07-11 12:24:50 UTC
Elad said that the engine looks stuck.
Regarding the revert: this should be the storage team's decision.
I'll look first at the other issues (AsyncTaskMgr, SetStoragePoolStatus, etc.).
Even if no rollback is done (and it should be), the system should not get stuck.
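
A minimal sketch of the "should not get stuck" idea, assuming the pool-up processing can be modelled as a list of independent handlers. run_pool_up_handlers and the handler names are hypothetical and do not correspond to actual engine classes.

```python
import logging

logger = logging.getLogger("recovery_sketch")


def run_pool_up_handlers(handlers):
    """Run every handler; collect failures instead of letting one abort the rest.

    handlers is a list of (name, zero-argument callable) pairs (hypothetical).
    Returns the names of the handlers that raised.
    """
    failed = []
    for name, handler in handlers:
        try:
            handler()
        except Exception:
            # Log with traceback and carry on, so later handlers still run.
            logger.exception("handler %s failed during pool-up processing", name)
            failed.append(name)
    return failed
```

With this shape, a failure while recovering one task is reported but does not prevent the pool status from being set or other tasks from being processed.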


Elad, could you do any other operations on the system?
Did I understand correctly?

Comment 2 Elad 2013-07-11 12:47:48 UTC
After this issue, the engine cannot run any command that opens an async task on vdsm.
It fails with:
2013-07-11 15:34:28,587 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateVDSCommand] (pool-5-thread-48) [6ea12989] FINISH, CreateVDSCommand, log id: 5d03ec41
2013-07-11 15:34:28,587 ERROR [org.ovirt.engine.core.vdsbroker.CreateVmVDSCommand] (pool-5-thread-48) [6ea12989] Error in excuting CreateVmVDSCommand: java.lang.NullPointerException

Restarting the ovirt-engine service does not help.

On the vdsm side, any async task requested by the engine gets stuck and is not cleaned up. Restarting the vdsm service does not help either.

Comment 3 Elad 2013-07-11 12:48:53 UTC
The user cannot do anything on the system.

Comment 4 Yair Zaslavsky 2013-07-21 14:10:33 UTC
Moved to MODIFIED by mistake.
Still in review.

Comment 5 Elad 2013-08-22 07:29:25 UTC
The engine now handles a failure in CopyImage after it comes up from a crash.

Verified on RHEVM3.3-IS10:
rhevm-3.3.0-0.15.master.el6ev.noarch

Comment 6 Itamar Heim 2014-01-21 22:33:02 UTC
Closing - RHEV 3.3 Released

Comment 7 Itamar Heim 2014-01-21 22:33:08 UTC
Closing - RHEV 3.3 Released

