Bug 807768

Summary: ovirt-engine-backend: orphan object remains in SD if we remove a VM that has a disk on the SD while it's in maintenance
Product: Red Hat Enterprise Virtualization Manager
Reporter: Dafna Ron <dron>
Component: ovirt-engine
Assignee: Tal Nisan <tnisan>
Status: CLOSED CURRENTRELEASE
QA Contact: Dafna Ron <dron>
Severity: high
Priority: high
Version: 3.1.0
CC: abaron, amureini, bazulay, dyasny, hateya, iheim, lpeer, Rhev-m-bugs, yeylon, ykaul
Target Milestone: ---
Target Release: 3.1.0
Hardware: x86_64
OS: Linux
Whiteboard: storage
oVirt Team: Storage
Doc Type: Bug Fix
Story Points: ---
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
Cloudforms Team: ---
Attachments: logs

Description Dafna Ron 2012-03-28 16:36:32 UTC
Description of problem:

I removed a VM that has 2 disks on 2 different storage domains while one of the domains was in maintenance.
No error was reported by VDSM, so the backend removed the VM and the image from the DB while in actuality the image remains on the SD.
As a result we have an orphaned object in the domain.

This only happens when the VM is of Desktop type; VDSM fails the task when the VM is a Server.

Version-Release number of selected component (if applicable):

vdsm-4.9.6-4.5.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Create two storage domains and attach them to a DC
2. Add a VM (Desktop, not Server) with two disks, one on each domain
3. Put one domain in maintenance and remove the VM (see the reproduction sketch below)
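
A minimal reproduction sketch against the RHEV-M/oVirt 3.1 REST API follows (Python, illustrative only): the host, credentials and UUIDs are placeholders, and the endpoint names, while believed correct for the 3.x API, should be verified against the installed version.

# Hedged reproduction sketch; placeholders throughout, endpoints assumed per the 3.x REST API.
import requests

BASE = "https://rhevm.example.com/api"       # placeholder engine URL
AUTH = ("admin@internal", "password")        # placeholder credentials
HEADERS = {"Content-Type": "application/xml"}
VERIFY = False                               # lab setup with a self-signed cert

DC_ID = "<datacenter-uuid>"                  # placeholder
SD_ID = "<storage-domain-uuid>"              # the domain holding one of the two disks
VM_ID = "<vm-uuid>"                          # the Desktop-type VM with two disks

# Step 3a: put one of the two data domains into maintenance.
requests.post(
    "%s/datacenters/%s/storagedomains/%s/deactivate" % (BASE, DC_ID, SD_ID),
    data="<action/>", headers=HEADERS, auth=AUTH, verify=VERIFY)

# Step 3b: remove the VM while that domain is in maintenance.
# Expected: the engine fails CanDoAction (domain status illegal).
# Buggy behaviour: the VM and image rows are removed from the DB while the
# image LV remains on the deactivated domain (the orphan object).
resp = requests.delete("%s/vms/%s" % (BASE, VM_ID),
                       headers=HEADERS, auth=AUTH, verify=VERIFY)
print(resp.status_code, resp.text)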
  
Actual results:

vdsm does not fail the delete of the image, and the backend completes the delete in the DB, creating an orphan object in the domain

Expected results:

We should fail the task so the backend can roll back.

Additional info: logs will be attached

image ID: 6c1f8d8a-8f88-4372-9546-b7b65921279e

Comment 1 Dafna Ron 2012-03-28 16:38:31 UTC
Created attachment 573395 [details]
logs

Comment 2 Dafna Ron 2012-03-28 16:39:24 UTC
[root@blond-vdsh ~]# lvs
  LV                                   VG                                   Attr     LSize   Pool Origin Data%  Move Log Copy%  Convert
  ids                                  83a46a9e-dac2-4513-bb21-a33ff76a495a -wi----- 128.00m                                           
  inbox                                83a46a9e-dac2-4513-bb21-a33ff76a495a -wi----- 128.00m                                           
  leases                               83a46a9e-dac2-4513-bb21-a33ff76a495a -wi-----   2.00g                                           
  master                               83a46a9e-dac2-4513-bb21-a33ff76a495a -wi-ao--   1.00g                                           
  metadata                             83a46a9e-dac2-4513-bb21-a33ff76a495a -wi----- 512.00m                                           
  outbox                               83a46a9e-dac2-4513-bb21-a33ff76a495a -wi----- 128.00m                                           
  29af568c-e767-4217-9610-79e6c0e70602 91fd7b39-198c-4cb8-889e-837103b3c46c -wi-----   1.00g                                           
  6c1f8d8a-8f88-4372-9546-b7b65921279e 91fd7b39-198c-4cb8-889e-837103b3c46c -wi-----   2.00g                                           
  ids                                  91fd7b39-198c-4cb8-889e-837103b3c46c -wi----- 128.00m                                           
  inbox                                91fd7b39-198c-4cb8-889e-837103b3c46c -wi----- 128.00m                                           
  leases                               91fd7b39-198c-4cb8-889e-837103b3c46c -wi-----   2.00g                                           
  master                               91fd7b39-198c-4cb8-889e-837103b3c46c -wi-----   1.00g                                           
  metadata                             91fd7b39-198c-4cb8-889e-837103b3c46c -wi----- 512.00m                                           
  outbox                               91fd7b39-198c-4cb8-889e-837103b3c46c -wi----- 128.00m                                           
  lv_home                              vg0                                  -wi-ao--  38.86g                                           
  lv_root                              vg0                                  -wi-ao--  19.53g                                           
  lv_swap                              vg0                                  -wi-ao--  15.62g                                           
[root@blond-vdsh ~]# !less

Comment 3 RHEL Program Management 2012-05-05 04:16:13 UTC
Since RHEL 6.3 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 4 Ayal Baron 2012-05-05 20:51:00 UTC
No error message was returned because VDSM was never called, so why is this on VDSM?
VDSM has no notion of a VM, therefore when you delete one disk, VDSM cannot correlate it to another disk on another domain (which may be unavailable).
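
To make this concrete, here is a minimal sketch (Python, purely illustrative; the real engine code is Java and every name below is hypothetical) of the kind of backend-side check that would have to gate RemoveVm:

# Illustrative sketch of the engine-side validation implied by comment 4.
# All names are hypothetical; this is not the actual ovirt-engine code.
from collections import namedtuple

Disk = namedtuple("Disk", "id storage_domain_id")
Domain = namedtuple("Domain", "id status")
Vm = namedtuple("Vm", "id disks")

ACTIVE = "active"

def can_remove_vm(vm, domains_by_id):
    """Refuse RemoveVm unless every domain holding one of the VM's disks is
    active, so the DB delete can never run ahead of the on-storage delete."""
    for disk in vm.disks:
        domain = domains_by_id[disk.storage_domain_id]
        if domain.status != ACTIVE:
            # Corresponds to the CanDoAction failure later shown in comment 8:
            # ACTION_TYPE_FAILED_STORAGE_DOMAIN_STATUS_ILLEGAL
            return False, "storage domain %s is not active" % domain.id
    return True, None

# Example: one domain in maintenance -> the remove must be refused.
domains = {"sd1": Domain("sd1", ACTIVE), "sd2": Domain("sd2", "maintenance")}
vm = Vm("vm1", [Disk("d1", "sd1"), Disk("d2", "sd2")])
print(can_remove_vm(vm, domains))   # (False, 'storage domain sd2 is not active')

Comment 8 below shows the engine eventually enforcing exactly this kind of CanDoAction check for preallocated disks.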

Comment 5 Dafna Ron 2012-05-06 08:17:49 UTC
The image delete is sent to and failed by VDSM, but you are right, it should not be on VDSM since the delete should be validated by the backend.

Thread-906::ERROR::2012-03-28 18:16:21,016::task::853::TaskManager.Task::(_setError) Task=`a8d07180-6c96-47e6-8e96-b67e6f732736`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1257, in deleteImage
    self.validateSdUUID(sdUUID)
  File "/usr/share/vdsm/storage/hsm.py", line 208, in validateSdUUID
    sdCache.produce(sdUUID=sdUUID).validate()
  File "/usr/share/vdsm/storage/sdc.py", line 91, in produce
    dom = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 115, in _findDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('91fd7b39-198c-4cb8-889e-837103b3c46c',)
Thread-906::DEBUG::2012-03-28 18:16:21,017::task::872::TaskManager.Task::(_run) Task=`a8d07180-6c96-47e6-8e96-b67e6f732736`::Task._run: a8d07180-6c96-47e6-8e96-b67e6f732736 ('91fd7b39-198c-4cb8-889e-837103b3c46c', 'aa9c0cc8-94b3-4546-9e35-0fd52d3454fd', '9a9606a1-e6e0-4a81-a8b9-56bdf01eccb0', 'false', 'false') {} failed - stopping task
Thread-906::DEBUG::2012-03-28 18:16:21,017::task::1199::TaskManager.Task::(stop) Task=`a8d07180-6c96-47e6-8e96-b67e6f732736`::stopping in state preparing (force False)
Thread-906::DEBUG::2012-03-28 18:16:21,018::task::978::TaskManager.Task::(_decref) Task=`a8d07180-6c96-47e6-8e96-b67e6f732736`::ref 1 aborting True
Thread-906::INFO::2012-03-28 18:16:21,019::task::1157::TaskManager.Task::(prepare) Task=`a8d07180-6c96-47e6-8e96-b67e6f732736`::aborting: Task is aborted: 'Storage domain does not exist' - code 358
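
For readability, the call chain in the traceback can be paraphrased as the simplified Python sketch below; it is not the actual hsm.py/sdc.py source, it only mirrors the flow shown above:

# Simplified paraphrase of the VDSM flow in the traceback (not the real source).
class StorageDomainDoesNotExist(Exception):
    pass

class StorageDomainCache(object):
    def __init__(self, known_domains):
        self._domains = known_domains              # sdUUID -> domain object

    def produce(self, sdUUID):
        # sdc.py: _findDomain raises when the domain cannot be found, e.g.
        # because it was deactivated (maintenance) and is no longer visible.
        if sdUUID not in self._domains:
            raise StorageDomainDoesNotExist(sdUUID)
        return self._domains[sdUUID]

def deleteImage(sd_cache, sdUUID, spUUID, imgUUID):
    # hsm.py: the SD is validated first, so when the domain is gone the task
    # aborts with "Storage domain does not exist" before any removal happens,
    # and the engine gets an error it can roll back on.
    sd_cache.produce(sdUUID)
    # ... actual image removal would follow here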

Comment 8 Dafna Ron 2012-06-17 14:45:36 UTC
This is partially fixed.
For preallocated disks we get an error from the UI which prevents the command from being sent to VDSM.

When the disks are thin provisioned the command is sent to VDSM.
Currently VDSM is blocking the delete, but it's a race.

Please reproduce with thin provisioned disks.

Thin provisioned - the error comes from VDSM:

2012-06-17 17:33:00,883 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-43) [2bfc812f] Failed in DeleteImageGroupVDS method
2012-06-17 17:33:00,883 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-43) [2bfc812f] Error code StorageDomainDoesNotExist and error message IRSGenericException: IRSErrorException: Failed to DeleteImageGroupVDS, error = Storage domain does not exist: ('d8c4e43a-e956-4c56-b9cb-76dc98e3ab9b',)

preallocated -> blocked with CanDoAction:  

2012-06-17 17:32:20,953 WARN  [org.ovirt.engine.core.bll.RemoveVmCommand] (ajp--0.0.0.0-8009-4) CanDoAction of action RemoveVm failed. Reasons:ACTION_TYPE_FAILED_STORAGE_DOMAIN_STATUS_ILLEGAL,VAR__ACTION__REMOVE,VAR__TYPE__VM
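
To spell out the race in the thin-provision path, a small illustrative Python sketch (hypothetical names) of the time-of-check/time-of-use window between the engine's status check and the DeleteImageGroupVDS call:

# Illustrative only: the engine validates at T0, but the domain can move to
# maintenance before the delete reaches VDSM at T1, so the status check alone
# cannot close the hole; VDSM failing the task (as seen above) is what saves us.
class Domain(object):
    def __init__(self, status):
        self.status = status

def remove_vm(domain, deactivate_during_window):
    if domain.status != "active":                    # check at T0 (CanDoAction)
        raise RuntimeError("CanDoAction: storage domain status illegal")
    deactivate_during_window()                       # the race window
    if domain.status != "active":                    # what VDSM sees at T1
        raise RuntimeError("VDSM: StorageDomainDoesNotExist")
    print("image deleted on SD, VM removed from DB")

sd = Domain("active")

def flip_to_maintenance():
    sd.status = "maintenance"                        # another user/thread acts here

try:
    remove_vm(sd, flip_to_maintenance)
except RuntimeError as err:
    print(err)                                       # VDSM: StorageDomainDoesNotExist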

Comment 9 Tal Nisan 2012-06-20 13:52:43 UTC
Checked again on si6, also with thin provisioned disks; the command is blocked by the backend and not sent to VDSM.

Comment 13 Dafna Ron 2012-07-01 15:28:59 UTC
Verified on si8.