Created attachment 706137 [details]
logs

Description of problem:
When the VDSM DeleteImage command fails, VDSM reports the failure to the engine through task polling (task ended with failure code 100), but the engine ignores it and removes the disks from its database. This creates a sync loss between the engine and the hypervisor: in my case I wondered why my storage still consumed all its space although I had deleted all my VMs and disks.

Version-Release number of selected component (if applicable):
rhevm-backend-3.2.0-10.10.beta1.el6ev.noarch
vdsm-4.10.2-10.0.el6ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Have a VM with one or more disks.
2. Remove the disks.
3. The lvremove command issued by VDSM fails, and VDSM answers the engine with 'code 100' (related to bug #918469).

Actual results:
The engine ignores the 'code 100' answer from VDSM and deletes the disks from its database.

Expected results:
The engine should act on the VDSM 'code 100' message and:
1) not delete the disks from its database
2) notify the user that the VDSM DeleteImage command failed.

Additional info:
See attached logs.
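To make the expected behavior concrete, here is a minimal sketch of a task-polling path that treats a non-zero task status code such as 100 as a failure and keeps the disk record. This is not actual oVirt engine code; the poll-result layout and the disk_db/notifier objects are hypothetical stand-ins.

GENERAL_EXCEPTION = 100  # the "General Exception" code seen in the task status

def handle_delete_image_result(task_status, disk_id, disk_db, notifier):
    """Drop the disk from the database only if the VDSM task succeeded."""
    code = task_status.get("code", GENERAL_EXCEPTION)
    if code != 0:
        # Expected behavior from this report: keep the DB row and tell the user.
        notifier.warn("DeleteImage failed on the host (code %s); keeping disk %s"
                      % (code, disk_id))
        return False
    disk_db.remove(disk_id)
    return True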
One would expect VDSM to return something better than 100 (which, AFAIK, is "General Exception").
The attached vdsm log is irrelevant (I'm guessing it rolled over). Please attach the proper log.

Wrt the General Exception: that is what vdsm returns when an unexpected error is thrown in dispatcher.py. Why that response doesn't contain the message from the unhandled exception I do not know, but that is not storage related.

Wrt the specific error that is not handled in this case, I cannot say without logs.
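For readers unfamiliar with that pattern, the dispatcher-level catch-all described above looks roughly like the sketch below (an illustration only, not the real vdsm dispatcher.py): any unexpected exception is logged locally and collapsed into a generic code-100 response, which is also why the specific message may never reach the engine.

import logging

GENERAL_EXCEPTION = {"code": 100, "message": "General Exception"}

def dispatch(verb, *args, **kwargs):
    """Run a verb; translate any unexpected error into a generic code 100."""
    try:
        return verb(*args, **kwargs)
    except Exception:
        # The specific traceback is only logged locally; the caller (the
        # engine) just sees the generic "General Exception" status.
        logging.exception("Unhandled exception in %s",
                          getattr(verb, "__name__", verb))
        return {"status": GENERAL_EXCEPTION}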
Created attachment 709929 [details] vdsm log
There is no 'lvremove' command in the attached log. Again, probably the wrong log?
Created attachment 714004 [details] vdsm log
Created attachment 714005 [details] engine log
The vdsm.log is still the wrong log; it does not contain the General Exception. There is an lvremove call that failed due to storage issues, but by design that does not cause the operation to fail (as can be seen from the return value of the command). The referenced bug 918469 has already been fixed. If you see this again, please reopen.
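The "by design" behavior described here roughly follows this pattern (a sketch under assumptions, not actual VDSM code; the command layout and helper name are made up): the lvremove failure is logged as a warning, but the surrounding delete operation continues and its return value is unaffected.

import logging
import subprocess

def try_lvremove(vg_name, lv_name):
    """Attempt to remove an LV; log a warning on failure but do not raise."""
    cmd = ["lvremove", "-f", "%s/%s" % (vg_name, lv_name)]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        # By design the caller keeps going; the failure only shows up in the log.
        logging.warning("lvremove failed (rc=%s): %s",
                        proc.returncode, proc.stderr.strip())
    return proc.returncode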
Created attachment 715886 [details]
engine log

Attaching the correct engine log. Please refer to:

2013-03-06 17:02:46,306 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-37) [4f5962cb] Error code GeneralException and error message VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = 'module' object has no attribute 'CMD_LOWPRIO'

The disks are removed from the database but not removed from the hypervisor.
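For reference, the quoted error is Python's standard AttributeError message for a missing module attribute (here CMD_LOWPRIO), raised inside the HSMGetAllTasksStatuses path and wrapped as a GeneralException. A trivial illustration of where such a message comes from (the module name below is a stand-in, not the actual vdsm module):

import types

constants = types.ModuleType("constants")  # a module with no CMD_LOWPRIO defined

try:
    constants.CMD_LOWPRIO
except AttributeError as exc:
    # On Python 2 (used by VDSM at the time) this prints exactly:
    #   'module' object has no attribute 'CMD_LOWPRIO'
    print(exc)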
(In reply to comment #9)
> Created attachment 715886 [details]
> engine log.
>
> attaching the correct engine log.
>
> please refer to:
>
> 2013-03-06 17:02:46,306 ERROR
> [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase]
> (QuartzScheduler_Worker-37) [4f5962cb] Error code GeneralException and error
> message
> VDSGenericException: VDSErrorException: Failed to
> HSMGetAllTasksStatusesVDS, error = 'module' object has no attribute
> 'CMD_LOWPRIO'

As noted above, this problem was fixed in bug 918469.

*** This bug has been marked as a duplicate of bug 918469 ***