Bug 983443 - [engine-backend] engine fails to revert a failed cloneImage task, after that, user cannot do anything on the system
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.3.0
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.3.0
Assigned To: Yair Zaslavsky
QA Contact: Elad
Whiteboard: infra
Depends On:
Blocks:
Reported: 2013-07-11 04:51 EDT by Elad
Modified: 2016-02-10 14:10 EST

See Also:
Fixed In Version: is9
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
logs (1.74 MB, application/x-gzip)
2013-07-11 04:51 EDT, Elad


External Trackers
Tracker       ID     Priority  Status  Summary  Last Updated
oVirt gerrit  16821  None      None    None     Never

Description Elad 2013-07-11 04:51:29 EDT
Created attachment 772089
logs

Description of problem:
When the engine comes back up after crashing while a CreateCloneOfTemplate task had already been sent to vdsm, it fails in SetStoragePoolStatusCommand with:

2013-07-11 11:02:07,247 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-5-thread-9) [771247cc] Error in StoragePoolUpEvent - : javax.ejb.EJBException: JBAS014580: Unexpected Error


Version-Release number of selected component (if applicable):
rhevm-3.3.0-0.6.master.el6ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. With one host on a block pool, create a template.
2. Create a VM from the template (cloned).
3. After the engine sends the CreateCloneOfTemplate command to vdsm, stop ovirt-engine and start it again after 5 minutes.

Actual results:
The engine fails in SetStoragePoolStatusCommand, and the task is not cleared from the SPM. The image remains in the LOCKED state indefinitely.

Expected results:
When it comes up, the engine should ask vdsm to delete the image.
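
The expected behaviour can be illustrated with a minimal sketch. This is not the engine's actual code: `FakeVdsm`, `AsyncTask`, `Image`, `recover_tasks`, and the `REMOVED` status are hypothetical names invented for illustration, and the choice to mark an image `ILLEGAL` when the revert fails is an assumption. The point is only that a revert failure for one task is caught, so the image does not stay LOCKED and recovery of the remaining tasks continues.

```python
from dataclasses import dataclass


@dataclass
class Image:
    image_id: str
    status: str = "LOCKED"  # images are locked while a task works on them


@dataclass
class AsyncTask:
    task_id: str
    image: Image
    state: str  # "running", "finished", or "failed" (hypothetical values)


class FakeVdsm:
    """Hypothetical stand-in for the vdsm connection; it only records calls."""

    def __init__(self, fail_on=()):
        self.fail_on = set(fail_on)
        self.deleted = []
        self.cleared = []

    def delete_image(self, image_id):
        if image_id in self.fail_on:
            raise RuntimeError(f"delete failed for {image_id}")
        self.deleted.append(image_id)

    def clear_task(self, task_id):
        self.cleared.append(task_id)


def recover_tasks(tasks, vdsm):
    """Revert failed tasks found at startup.

    Returns the ids of tasks whose revert did not succeed.
    """
    unrecovered = []
    for task in tasks:
        if task.state != "failed":
            continue  # running or finished tasks are left alone
        try:
            vdsm.delete_image(task.image.image_id)
            vdsm.clear_task(task.task_id)
            task.image.status = "REMOVED"  # hypothetical status name
        except Exception:
            # Assumption: release the lock by flagging the image instead of
            # letting the error abort recovery of the other tasks.
            task.image.status = "ILLEGAL"
            unrecovered.append(task.task_id)
    return unrecovered
```

With one failing revert among several tasks, the other failed task is still cleaned up and the running task's image is untouched.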

Additional info:
logs
Comment 1 Yair Zaslavsky 2013-07-11 08:24:50 EDT
Elad said that the engine looks stuck.
Regarding the revert: this should be the storage team's decision.
I'll look first at the other issues (AsyncTaskMgr, SetStoragePoolStatus, etc.).
Even if no rollback is done (and we should do one), the system should not get stuck.


Elad, could you do any other operations on the system?
Did I understand correctly?
Comment 2 Elad 2013-07-11 08:47:48 EDT
After this issue occurs, the engine cannot run any command that opens an async task on vdsm.
It fails with:
2013-07-11 15:34:28,587 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateVDSCommand] (pool-5-thread-48) [6ea12989] FINISH, CreateVDSCommand, log id: 5d03ec41
2013-07-11 15:34:28,587 ERROR [org.ovirt.engine.core.vdsbroker.CreateVmVDSCommand] (pool-5-thread-48) [6ea12989] Error in excuting CreateVmVDSCommand: java.lang.NullPointerException

Restarting the ovirt-engine service does not help.

On the vdsm side, any async task requested by the engine gets stuck and is not cleaned up. Restarting the vdsm service does not help either.
Comment 3 Elad 2013-07-11 08:48:53 EDT
The user cannot do anything on the system.
Comment 4 Yair Zaslavsky 2013-07-21 10:10:33 EDT
Moved to MODIFIED by mistake.
Still in review.
Comment 5 Elad 2013-08-22 03:29:25 EDT
The engine handles a failure in CopyImage after it comes up from a crash.

Verified on RHEVM3.3-IS10:
rhevm-3.3.0-0.15.master.el6ev.noarch
Comment 6 Itamar Heim 2014-01-21 17:33:02 EST
Closing - RHEV 3.3 Released
Comment 7 Itamar Heim 2014-01-21 17:33:08 EST
Closing - RHEV 3.3 Released
