Description of problem:
I removed a VM that has 2 disks on 2 different storage domains while one of the domains was in maintenance. No error was propagated by vdsm, so the backend removed the VM and its image from the DB while in actuality the image remains on the storage domain. As a result we have an orphaned object in the domain. This only happens when the VM is of type Desktop; vdsm fails the task when the VM is a Server.

Version-Release number of selected component (if applicable):
vdsm-4.9.6-4.5.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create two storage domains and attach them to a DC.
2. Add a VM (Desktop, not Server) with a disk on each domain.
3. Put one domain in maintenance and remove the VM.

Actual results:
vdsm does not fail the delete of the image, so the backend completes the delete in the DB, creating an orphaned object in the domain.

Expected results:
We should fail the task so the backend can roll back.

Additional info:
Logs will be attached. Image ID: 6c1f8d8a-8f88-4372-9546-b7b65921279e
Created attachment 573395 [details] logs
[root@blond-vdsh ~]# lvs
  LV                                   VG                                   Attr     LSize   Pool Origin Data%  Move Log Copy%  Convert
  ids                                  83a46a9e-dac2-4513-bb21-a33ff76a495a -wi----- 128.00m
  inbox                                83a46a9e-dac2-4513-bb21-a33ff76a495a -wi----- 128.00m
  leases                               83a46a9e-dac2-4513-bb21-a33ff76a495a -wi-----   2.00g
  master                               83a46a9e-dac2-4513-bb21-a33ff76a495a -wi-ao--   1.00g
  metadata                             83a46a9e-dac2-4513-bb21-a33ff76a495a -wi----- 512.00m
  outbox                               83a46a9e-dac2-4513-bb21-a33ff76a495a -wi----- 128.00m
  29af568c-e767-4217-9610-79e6c0e70602 91fd7b39-198c-4cb8-889e-837103b3c46c -wi-----   1.00g
  6c1f8d8a-8f88-4372-9546-b7b65921279e 91fd7b39-198c-4cb8-889e-837103b3c46c -wi-----   2.00g
  ids                                  91fd7b39-198c-4cb8-889e-837103b3c46c -wi----- 128.00m
  inbox                                91fd7b39-198c-4cb8-889e-837103b3c46c -wi----- 128.00m
  leases                               91fd7b39-198c-4cb8-889e-837103b3c46c -wi-----   2.00g
  master                               91fd7b39-198c-4cb8-889e-837103b3c46c -wi-----   1.00g
  metadata                             91fd7b39-198c-4cb8-889e-837103b3c46c -wi----- 512.00m
  outbox                               91fd7b39-198c-4cb8-889e-837103b3c46c -wi----- 128.00m
  lv_home                              vg0                                  -wi-ao--  38.86g
  lv_root                              vg0                                  -wi-ao--  19.53g
  lv_swap                              vg0                                  -wi-ao--  15.62g
[root@blond-vdsh ~]# !less
Since RHEL 6.3 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.
No error message was returned because vdsm was never called... why is this on vdsm? VDSM has no notion of a VM; therefore, when you delete one disk, vdsm cannot correlate it to another disk on another domain (which may be unavailable).
The image delete is sent and fails in vdsm, but you are right, it should not be handled in vdsm since the delete should be validated by the backend.

Thread-906::ERROR::2012-03-28 18:16:21,016::task::853::TaskManager.Task::(_setError) Task=`a8d07180-6c96-47e6-8e96-b67e6f732736`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1257, in deleteImage
    self.validateSdUUID(sdUUID)
  File "/usr/share/vdsm/storage/hsm.py", line 208, in validateSdUUID
    sdCache.produce(sdUUID=sdUUID).validate()
  File "/usr/share/vdsm/storage/sdc.py", line 91, in produce
    dom = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 115, in _findDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('91fd7b39-198c-4cb8-889e-837103b3c46c',)
Thread-906::DEBUG::2012-03-28 18:16:21,017::task::872::TaskManager.Task::(_run) Task=`a8d07180-6c96-47e6-8e96-b67e6f732736`::Task._run: a8d07180-6c96-47e6-8e96-b67e6f732736 ('91fd7b39-198c-4cb8-889e-837103b3c46c', 'aa9c0cc8-94b3-4546-9e35-0fd52d3454fd', '9a9606a1-e6e0-4a81-a8b9-56bdf01eccb0', 'false', 'false') {} failed - stopping task
Thread-906::DEBUG::2012-03-28 18:16:21,017::task::1199::TaskManager.Task::(stop) Task=`a8d07180-6c96-47e6-8e96-b67e6f732736`::stopping in state preparing (force False)
Thread-906::DEBUG::2012-03-28 18:16:21,018::task::978::TaskManager.Task::(_decref) Task=`a8d07180-6c96-47e6-8e96-b67e6f732736`::ref 1 aborting True
Thread-906::INFO::2012-03-28 18:16:21,019::task::1157::TaskManager.Task::(prepare) Task=`a8d07180-6c96-47e6-8e96-b67e6f732736`::aborting: Task is aborted: 'Storage domain does not exist' - code 358
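To illustrate the vdsm-side behavior in the traceback above, here is a minimal, hypothetical Python sketch (the class and function names `DomainCache`, `delete_image`, and the in-memory dict are my own stand-ins, not the real vdsm `sdc`/`hsm` API): the storage domain is resolved before any image data is touched, so a domain in maintenance raises and the task aborts, giving the engine a chance to roll back.

```python
# Hypothetical sketch of the validate-before-delete pattern seen in the
# traceback. Names are illustrative, not the real vdsm API.

class StorageDomainDoesNotExist(Exception):
    """Raised when a storage domain cannot be produced (e.g. in maintenance)."""


class DomainCache:
    def __init__(self, domains):
        # sdUUID -> domain object; a domain in maintenance is absent
        self._domains = domains

    def produce(self, sd_uuid):
        dom = self._domains.get(sd_uuid)
        if dom is None:
            raise StorageDomainDoesNotExist(sd_uuid)
        return dom


def delete_image(cache, sd_uuid, img_uuid):
    # Validation happens first: if the domain cannot be produced, the
    # task fails here and no image metadata or data is removed.
    cache.produce(sd_uuid)
    # ... actual LV/file removal would follow here ...
    return "deleted %s" % img_uuid
```

With this ordering, an unreachable domain makes the whole task fail loudly instead of silently succeeding, which is the behavior the bug report expects.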
This is partially fixed. With preallocated disks we get an error from the UI which prevents the command from being sent to vdsm. When the disks are thin provisioned, the command is still sent to vdsm. Currently vdsm is blocking the delete, but it's a race. Please reproduce with thin provisioned disks.

Thin provision - error is coming from vdsm:
2012-06-17 17:33:00,883 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-43) [2bfc812f] Failed in DeleteImageGroupVDS method
2012-06-17 17:33:00,883 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-43) [2bfc812f] Error code StorageDomainDoesNotExist and error message IRSGenericException: IRSErrorException: Failed to DeleteImageGroupVDS, error = Storage domain does not exist: ('d8c4e43a-e956-4c56-b9cb-76dc98e3ab9b',)

Preallocated -> blocked with CanDoAction:
2012-06-17 17:32:20,953 WARN [org.ovirt.engine.core.bll.RemoveVmCommand] (ajp--0.0.0.0-8009-4) CanDoAction of action RemoveVm failed. Reasons: ACTION_TYPE_FAILED_STORAGE_DOMAIN_STATUS_ILLEGAL,VAR__ACTION__REMOVE,VAR__TYPE__VM
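The engine-side fix amounts to a CanDoAction-style precondition: before RemoveVm is issued, every storage domain backing one of the VM's disks must be Active, regardless of whether the disks are preallocated or thin provisioned. A minimal sketch of that check, in Python for illustration only (the real engine code is Java; `can_remove_vm` and its arguments are hypothetical names):

```python
# Hypothetical sketch of an engine-side CanDoAction check: refuse to
# start RemoveVm unless every domain holding one of the VM's disks is
# Active. Names are illustrative, not the real ovirt-engine API.

ACTIVE = "Active"


def can_remove_vm(disk_domains, domain_status):
    """disk_domains: iterable of sdUUIDs backing the VM's disks.
    domain_status: mapping of sdUUID -> status string ("Active",
    "Maintenance", ...). Returns (allowed, failure_reason)."""
    illegal = [sd for sd in disk_domains
               if domain_status.get(sd) != ACTIVE]
    if illegal:
        # Mirrors the CanDoAction message seen in the engine log above.
        return (False, "ACTION_TYPE_FAILED_STORAGE_DOMAIN_STATUS_ILLEGAL")
    return (True, None)
```

Checking the status of all backing domains up front closes the thin-provision race described above: the command never reaches vdsm when any domain is in maintenance, so no orphaned image can be left behind.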
Checked again on si6, also with thin provisioning disks, the command is blocked by the backend and not sent to VDSM
verified on si8