Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 875708

Summary: deleteImage operations failing due to busy vols (KeyError: 'de12c33b-fbaa-4c32-a1db-ad41d4c1b389')
Product: Red Hat Enterprise Virtualization Manager Reporter: Dafna Ron <dron>
Component: vdsmAssignee: Eduardo Warszawski <ewarszaw>
Status: CLOSED ERRATA QA Contact: Aharon Canan <acanan>
Severity: high Docs Contact:
Priority: high    
Version: unspecifiedCC: abaron, acanan, amureini, bazulay, cboyle, ewarszaw, iheim, jkt, lpeer, scohen, yeylon
Target Milestone: ---   
Target Release: 3.3.0   
Hardware: x86_64   
OS: Linux   
Whiteboard: storage
Fixed In Version: Doc Type: Bug Fix
Doc Text:
When trying to remove a vm that had a broken symlink, the system displayed a key error from VDMS. The root cause of this was live operations and deleteImage operations sent by the engine, when some (top) LV's were still open and failed to be removed. This was fixed in previous code changes.
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-01-21 16:03:25 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
logs none

Description Dafna Ron 2012-11-12 12:54:00 UTC
Created attachment 643491 [details]
logs

Description of problem:

I tried to remove a vm that has a broken symlink and we get a key error from vdms 


Thread-54199::ERROR::2012-11-12 13:50:28,922::task::853::TaskManager.Task::(_setError) Task=`26cc797a-8411-48ea-afab-b5386d28c4a9`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1319, in deleteImage
    allVols = dom.getAllVolumes()
  File "/usr/share/vdsm/storage/blockSD.py", line 960, in getAllVolumes
    return getAllVolumes(self.sdUUID)
  File "/usr/share/vdsm/storage/blockSD.py", line 169, in getAllVolumes
    and vImg not in res[vPar]['imgs']:
KeyError: 'de12c33b-fbaa-4c32-a1db-ad41d4c1b389'
Thread-54199::DEBUG::2012-11-12 13:50:28,923::task::872::TaskManager.Task::(_run) Task=`26cc797a-8411-48ea-afab-b5386d28c4a9`::Task._run: 26cc797a-8411-48ea-afab-b5386d28c4a9 ('17449ff0-4359-4866-ad19-5c15a7483b95', '9600a42f-95cc-4b56-
9b1f-cf7429d10b5c', 'f654aef8-a1aa-4f19-afe2-190db949a7a8', 'false', 'false') {} failed - stopping task


Version-Release number of selected component (if applicable):

vdsm-4.9.6-42.0.el6_3.x86_64

How reproducible:

Steps to Reproduce:
1. unlink image
2. try to remove vm
3.
  
Actual results:
we fail to remove vm because of broken symlink

Expected results:

we should be able to remove the vm


Additional info: logs

this happened because a delete of volume failed after the symlink was already removed.

Comment 1 RHEL Program Management 2012-12-14 07:52:28 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 2 Ayal Baron 2012-12-26 09:58:54 UTC
Edu, do we now handle such cases?

Comment 3 Ayal Baron 2013-02-17 06:13:42 UTC
(In reply to comment #2)
> Edu, do we now handle such cases?

To be clear, the question is whether the bug still exists or the code changes introduced in this area solve it.

Comment 4 Eduardo Warszawski 2013-02-20 21:49:17 UTC
(In reply to comment #0)
> Description of problem:
> 
> I tried to remove a vm that has a broken symlink and we get a key error from
> vdms 
> 
Asked Dafna, it is not clear what broken symlink is and how it was broken,


> Steps to Reproduce:
> 1. unlink image
> 2. try to remove vm
> 3.
>
Unrelated.
   
> Actual results:
> we fail to remove vm because of broken symlink
> 
No. deleteImage operations fails because there is an orphan volume.
> 
> this happened because a delete of volume failed after the symlink was
> already removed.
The issue is with incomplete image: f654aef8-a1aa-4f19-afe2-190db949a7a8 on SD: 17449ff0-4359-4866-ad19-5c15a7483b95

Comment 5 Eduardo Warszawski 2013-02-20 22:14:56 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > Edu, do we now handle such cases?
> 
> To be clear, the question is whether the bug still exists or the code
> changes introduced in this area solve it.

Unfortunately, bugs still exist.

The root cause is live operations and deleteImage operations sent by engine when some (top)  LV's are still open failing to be removed.


Image f654aef8-a1aa-4f19-afe2-190db949a7a8 was cloned to SD: 17449ff0-4359-4866-ad19-5c15a7483b95 from SD: b25b91ca-ebd7-4d98-8b8c-7dc7aa1cb4f5

This image is self contained, i.e. no temmplate based, and it is composed by two volumes:
de12c33b-fbaa-4c32-a1db-ad41d4c1b389 <- e81494ba-971d-41e7-b2f3-d22ae8f9048f

Attempt to remove src image (SD: b25b91ca) but LV e81494ba failed to be removed since it is busy.
Then an attempt to remove the clone (SD: 17449ff0) is issued but LV e81494ba failed to be removed since it is busy.
These volumez remains orphan since de12c33b was removed from oth SDs.

Further image operations on  will fail since e81494ba is orphan.


###########################################################

Thread-53444::INFO::2012-11-12 13:36:37,427::logUtils::37::dispatcher::(wrapper) Run and protect: cloneImageStructure(spUUID='9600a42f-95cc-4b56-9b1f-cf7429d10b5c', sdUUID='b25b
91ca-ebd7-4d98-8b8c-7dc7aa1cb4f5', imgUUID='f654aef8-a1aa-4f19-afe2-190db949a7a8', dstSdUUID='17449ff0-4359-4866-ad19-5c15a7483b95')


Thread-53521::INFO::2012-11-12 13:37:48,513::logUtils::37::dispatcher::(wrapper) Run and protect: deleteImage(sdUUID='b25b91ca-ebd7-4d98-8b8c-7dc7aa1cb4f5', spUUID='9600a42f-95cc-4b56-9b1f-cf7429d10b5c', imgUUID='f654aef8-a1aa-4f19-afe2-190db949a7a8', postZero='false', force='false')

Thread-53521::DEBUG::2012-11-12 13:37:59,247::__init__::1164::Storage.Misc.excCmd::(_log) FAILED: <err> = '  /dev/mapper/1Dafna-si24-031352651: read failed after 0 of 4096 at 107374116864: Input/output error\n  /dev/mapper/1Dafna-si24-031352651: read failed after 0 of 4096 at 107374174208: Input/output error\n  /dev/mapper/1Dafna-si24-031352651: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1Dafna-si24-031352651 was disabled\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  Unable to deactivate b25b91ca--ebd7--4d98--8b8c--7dc7aa1cb4f5-de12c33b--fbaa--4c32--a1db--ad41d4c1b389 (253:24)\n  Unable to deactivate logical volume "de12c33b-fbaa-4c32-a1db-ad41d4c1b389"\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  Unable to deactivate b25b91ca--ebd7--4d98--8b8c--7dc7aa1cb4f5-e81494ba--971d--41e7--b2f3--d22ae8f9048f (253:25)\n  Unable to deactivate logical volume "e81494ba-971d-41e7-b2f3-d22ae8f9048f"\n'; <rc> = 5


Thread-53536::INFO::2012-11-12 13:37:59,804::logUtils::37::dispatcher::(wrapper) Run and protect: deleteImage(sdUUID='17449ff0-4359-4866-ad19-5c15a7483b95', spUUID='9600a42f-95cc-4b56-9b1f-cf7429d10b5c', imgUUID='f654aef8-a1aa-4f19-afe2-190db949a7a8', postZero='false', force='false')

Thread-53536::DEBUG::2012-11-12 13:38:05,634::__init__::1164::Storage.Misc.excCmd::(_log) FAILED: <err> = '  /dev/mapper/1Dafna-si24-031352651: read failed after 0 of 4096 at 107374116864: Input/output error\n  /dev/mapper/1Dafna-si24-031352651: read failed after 0 of 4096 at 107374174208: Input/output error\n  /dev/mapper/1Dafna-si24-031352651: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1Dafna-si24-031352651 was disabled\n  WARNING: This metadata update is NOT backed up\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  Unable to deactivate 17449ff0--4359--4866--ad19--5c15a7483b95-e81494ba--971d--41e7--b2f3--d22ae8f9048f (253:27)\n  Unable to deactivate logical volume "e81494ba-971d-41e7-b2f3-d22ae8f9048f"\n'; <rc> = 5

# Fail to remove another image.
Thread-53755::INFO::2012-11-12 13:41:54,492::logUtils::37::dispatcher::(wrapper) Run and protect: deleteImage(sdUUID='17449ff0-4359-4866-ad19-5c15a7483b95', spUUID='9600a42f-95c
c-4b56-9b1f-cf7429d10b5c', imgUUID='1e7ec647-608b-4739-b974-5e8d73ea9f8a', postZero='false', force='false')


Thread-53755::ERROR::2012-11-12 13:41:55,184::task::853::TaskManager.Task::(_setError) Task=`51df8ae2-2875-43c1-a2ec-6edcd4a4e7f7`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1319, in deleteImage
    allVols = dom.getAllVolumes()
  File "/usr/share/vdsm/storage/blockSD.py", line 960, in getAllVolumes
    return getAllVolumes(self.sdUUID)
  File "/usr/share/vdsm/storage/blockSD.py", line 169, in getAllVolumes
    and vImg not in res[vPar]['imgs']:
KeyError: 'de12c33b-fbaa-4c32-a1db-ad41d4c1b389'

Comment 6 Ayal Baron 2013-07-08 19:19:49 UTC
Haim, looking at the code I don't think this happens anymore. Even if there is a bug. Can you try to reproduce?

Comment 7 Ayal Baron 2013-07-10 09:40:23 UTC
After discussing with Haim, moving to ON_QA

Comment 8 Aharon Canan 2013-07-17 12:00:41 UTC
verified using is5

verification steps - 
1. create VM with few disks
2. remove one of the VM's symlink (under /rhev/data-center/)
3. remove the VM

VM removed

Comment 9 Charlie 2013-11-28 00:30:27 UTC
This bug is currently attached to errata RHBA-2013:15291. If this change is not to be documented in the text for this errata please either remove it from the errata, set the requires_doc_text flag to 
minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:

https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes 

Thanks in advance.

Comment 10 errata-xmlrpc 2014-01-21 16:03:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0040.html