Bug 672493 - [vdsm] When migrating multiple VMs unsuccessfully, "call destroy" failed to release the image lock on the destination host.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: vdsm
Version: 6.1
Hardware: x86_64
OS: Linux
Priority: low
Severity: high
Target Milestone: rc
Target Release: 6.1
Assignee: Igor Lvovsky
QA Contact: yeylon@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-01-25 10:45 UTC by David Naori
Modified: 2016-04-18 06:37 UTC
CC: 9 users

Fixed In Version: vdsm-4.9-47.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-08-19 15:23:42 UTC
Target Upstream Version:


Attachments
vdsm-log (1.35 MB, application/x-gzip)
2011-01-25 14:13 UTC, David Naori

Description David Naori 2011-01-25 10:45:05 UTC
Description of problem:

When migrating 30 VMs simultaneously, some of them hang in "Migrating From" status on the source host and come back up after approximately 10 minutes.
On the destination, "call destroy" fails to release the image lock.

Vdsm log:

Thread-4737::DEBUG::2011-01-24 16:14:18,079::clientIF::40::vds::(wrapper) [10.35.116.10]::call destroy with ('3e2681e7-dc43-409b-b5df-709f8f3d19c0',) {}
Thread-4737::DEBUG::2011-01-24 16:14:18,080::libvirtvm::931::vds.vmlog.3e2681e7-dc43-409b-b5df-709f8f3d19c0::(destroy) destroy Called
Thread-4737::WARNING::2011-01-24 16:14:18,080::libvirtvm::938::vds.vmlog.3e2681e7-dc43-409b-b5df-709f8f3d19c0::(destroy) <<<<< Destroy are welcome >>>>>
Thread-4737::WARNING::2011-01-24 16:14:18,081::libvirtvm::953::vds.vmlog.3e2681e7-dc43-409b-b5df-709f8f3d19c0::(destroy) Traceback (most recent call last):
  File "/usr/share/vdsm/libvirtvm.py", line 947, in destroy
    self._dom.destroy()
  File "/usr/share/vdsm/libvirtvm.py", line 155, in f
    raise e
libvirtError: Domain not found: no domain with matching uuid '3e2681e7-dc43-409b-b5df-709f8f3d19c0'

Thread-4737::DEBUG::2011-01-24 16:14:18,081::utils::570::Storage.Misc.excCmd::(execCmd) '/usr/bin/sudo -n /sbin/service ksmtuned retune' (cwd None)
Thread-4737::DEBUG::2011-01-24 16:14:18,113::utils::570::Storage.Misc.excCmd::(execCmd) SUCCESS: <err> = ''; <rc> = 0
Thread-4737::DEBUG::2011-01-24 16:14:18,114::libvirtvm::966::vds.vmlog.3e2681e7-dc43-409b-b5df-709f8f3d19c0::(destroy) Total desktops after destroy of 3e2681e7-dc43-409b-b5df-709f8f3d19c0 is 6
Thread-4737::INFO::2011-01-24 16:14:18,114::dispatcher::95::Storage.Dispatcher.Protect::(run) Run and protect: teardownVolume, args: ( sdUUID=5264fc3d-c4cb-4e86-be4b-144a905a5fdc spUUID=4e0cb55f-c43e-4e3c-aa9e-13334e4469b1 imgUUID=b48d068d-3d31-4889-9bd1-9ba4e7d9a04b volUUID=fec97928-2f0e-4873-9b72-30fa228fffd0 rw=True)
Thread-4737::DEBUG::2011-01-24 16:14:18,115::task::491::TaskManager.Task::(_debug) Task 13f5d9d2-6eca-40a4-a798-680db9a5420c: moving from state init -> state preparing
Thread-4737::INFO::2011-01-24 16:14:18,115::sp::1349::Storage.StoragePool::(getDomains) Get storage pool domains: {'5264fc3d-c4cb-4e86-be4b-144a905a5fdc': 'Active', '231f7bb3-0b77-481c-8dc8-9b32271e09fe': 'Active'}
Thread-4737::INFO::2011-01-24 16:14:18,115::fileSD::184::Storage.StorageDomain::(validate) sdUUID=5264fc3d-c4cb-4e86-be4b-144a905a5fdc
Thread-4737::DEBUG::2011-01-24 16:14:18,117::metadata::88::Storage.Metadata::(__validate) metadata.__validate() __cksum(md) = 078ef9aab0d6ddbf7795cebecb390baccc487466
Thread-4737::DEBUG::2011-01-24 16:14:18,118::resourceManager::154::ResourceManager.Request::(__init__) ResName=`Storage.5264fc3d-c4cb-4e86-be4b-144a905a5fdc`ReqID=`2ef10eef-f410-490b-90e0-a997a5fe6dfe`::Request was made in '/usr/share/vdsm/storage/hsm.py' line '1714' at 'public_teardownVolume'
Thread-4737::DEBUG::2011-01-24 16:14:18,118::resourceManager::467::ResourceManager::(registerResource) Trying to register resource 'Storage.5264fc3d-c4cb-4e86-be4b-144a905a5fdc' for lock type 'shared'
Thread-4737::DEBUG::2011-01-24 16:14:18,118::resourceManager::508::ResourceManager::(registerResource) Resource 'Storage.5264fc3d-c4cb-4e86-be4b-144a905a5fdc' is free. Now locking as 'shared' (1 active user)
Thread-4737::DEBUG::2011-01-24 16:14:18,119::resourceManager::191::ResourceManager.Request::(grant) ResName=`Storage.5264fc3d-c4cb-4e86-be4b-144a905a5fdc`ReqID=`2ef10eef-f410-490b-90e0-a997a5fe6dfe`::Granted request
Thread-4737::DEBUG::2011-01-24 16:14:18,119::task::491::TaskManager.Task::(_debug) Task 13f5d9d2-6eca-40a4-a798-680db9a5420c: _resourcesAcquired: Storage.5264fc3d-c4cb-4e86-be4b-144a905a5fdc (shared)
Thread-4737::DEBUG::2011-01-24 16:14:18,120::task::491::TaskManager.Task::(_debug) Task 13f5d9d2-6eca-40a4-a798-680db9a5420c: ref 1 aborting False
Thread-4737::DEBUG::2011-01-24 16:14:18,120::fileVolume::526::Storage.Volume::(validateVolumePath) validate path for fec97928-2f0e-4873-9b72-30fa228fffd0
Thread-4737::ERROR::2011-01-24 16:14:18,121::task::854::TaskManager.Task::(_setError) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 862, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1715, in public_teardownVolume
    SDF.produce(sdUUID=sdUUID).produceVolume(imgUUID=imgUUID, volUUID=volUUID).teardown()
  File "/usr/share/vdsm/storage/fileSD.py", line 138, in produceVolume
    return fileVolume.FileVolume(repoPath, self.sdUUID, imgUUID, volUUID)
  File "/usr/share/vdsm/storage/fileVolume.py", line 32, in __init__
    volume.Volume.__init__(self, repoPath, sdUUID, imgUUID, volUUID)
  File "/usr/share/vdsm/storage/volume.py", line 113, in __init__
    self.validate()
  File "/usr/share/vdsm/storage/volume.py", line 120, in validate
    self.validateVolumePath()
  File "/usr/share/vdsm/storage/fileVolume.py", line 537, in validateVolumePath
    self.validateMetaVolumePath()
  File "/usr/share/vdsm/storage/fileVolume.py", line 546, in validateMetaVolumePath
    raise se.VolumeDoesNotExist(self.volUUID)
VolumeDoesNotExist: Volume does not exist: ('fec97928-2f0e-4873-9b72-30fa228fffd0',)
Thread-4737::DEBUG::2011-01-24 16:14:18,123::task::491::TaskManager.Task::(_debug) Task 13f5d9d2-6eca-40a4-a798-680db9a5420c: Task._run: 13f5d9d2-6eca-40a4-a798-680db9a5420c ('5264fc3d-c4cb-4e86-be4b-144a905a5fdc', '4e0cb55f-c43e-4e3c-aa9e-13334e4469b1', 'b48d068d-3d31-4889-9bd1-9ba4e7d9a04b', 'fec97928-2f0e-4873-9b72-30fa228fffd0', True) {} failed - stopping task
Thread-4737::DEBUG::2011-01-24 16:14:18,123::task::491::TaskManager.Task::(_debug) Task 13f5d9d2-6eca-40a4-a798-680db9a5420c: stopping in state preparing (force False)
Thread-4737::DEBUG::2011-01-24 16:14:18,124::task::491::TaskManager.Task::(_debug) Task 13f5d9d2-6eca-40a4-a798-680db9a5420c: ref 1 aborting True
Thread-4737::INFO::2011-01-24 16:14:18,124::task::1166::TaskManager.Task::(prepare) aborting: Task is aborted: 'Volume does not exist' - code 201
Thread-4737::DEBUG::2011-01-24 16:14:18,124::task::491::TaskManager.Task::(_debug) Task 13f5d9d2-6eca-40a4-a798-680db9a5420c: Prepare: aborted: Volume does not exist
Thread-4737::DEBUG::2011-01-24 16:14:18,125::task::491::TaskManager.Task::(_debug) Task 13f5d9d2-6eca-40a4-a798-680db9a5420c: ref 0 aborting True
Thread-4737::DEBUG::2011-01-24 16:14:18,125::task::491::TaskManager.Task::(_debug) Task 13f5d9d2-6eca-40a4-a798-680db9a5420c: Task._doAbort: force False
Thread-4737::DEBUG::2011-01-24 16:14:18,125::resourceManager::821::irs::(cancelAll) Owner.cancelAll requests {}
Thread-4737::DEBUG::2011-01-24 16:14:18,125::task::491::TaskManager.Task::(_debug) Task 13f5d9d2-6eca-40a4-a798-680db9a5420c: moving from state preparing -> state aborting
Thread-4737::DEBUG::2011-01-24 16:14:18,126::task::491::TaskManager.Task::(_debug) Task 13f5d9d2-6eca-40a4-a798-680db9a5420c: _aborting: recover policy none
Thread-4737::DEBUG::2011-01-24 16:14:18,126::task::491::TaskManager.Task::(_debug) Task 13f5d9d2-6eca-40a4-a798-680db9a5420c: moving from state aborting -> state failed
Thread-4737::DEBUG::2011-01-24 16:14:18,126::resourceManager::786::irs::(releaseAll) Owner.releaseAll requests {} resources {'Storage.5264fc3d-c4cb-4e86-be4b-144a905a5fdc': < ResourceRef 'Storage.5264fc3d-c4cb-4e86-be4b-144a905a5fdc', isValid: 'True' obj: 'None'>}
Thread-4737::DEBUG::2011-01-24 16:14:18,127::resourceManager::821::irs::(cancelAll) Owner.cancelAll requests {}
Thread-4737::DEBUG::2011-01-24 16:14:18,127::resourceManager::517::ResourceManager::(releaseResource) Trying to release resource 'Storage.5264fc3d-c4cb-4e86-be4b-144a905a5fdc'
Thread-4737::DEBUG::2011-01-24 16:14:18,127::resourceManager::532::ResourceManager::(releaseResource) Released resource 'Storage.5264fc3d-c4cb-4e86-be4b-144a905a5fdc' (0 active users)
Thread-4737::DEBUG::2011-01-24 16:14:18,127::resourceManager::537::ResourceManager::(releaseResource) Resource 'Storage.5264fc3d-c4cb-4e86-be4b-144a905a5fdc' is free, finding out if anyone is waiting for it.
Thread-4737::DEBUG::2011-01-24 16:14:18,128::resourceManager::544::ResourceManager::(releaseResource) No one is waiting for resource 'Storage.5264fc3d-c4cb-4e86-be4b-144a905a5fdc', Clearing records.
Thread-4737::ERROR::2011-01-24 16:14:18,128::dispatcher::104::Storage.Dispatcher.Protect::(run) {'status': {'message': "Volume does not exist: ('fec97928-2f0e-4873-9b72-30fa228fffd0',)", 'code': 201}, 'args': [('fec97928-2f0e-4873-9b72-30fa228fffd0',)]}
Thread-4737::DEBUG::2011-01-24 16:14:18,129::clientIF::45::vds::(wrapper) return destroy with {'status': {'message': 'Machine destroyed', 'code': 0}}
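The log above shows the core of the problem: destroy returns "Machine destroyed" (code 0) even though the teardownVolume task aborted with VolumeDoesNotExist, leaving the image lock behind on the destination. A minimal sketch of the defensive pattern a fix would need (this is illustrative only, not vdsm's actual code; ImageLock, teardown_volume and destroy_vm are hypothetical names):

```python
# Sketch of the failure mode: the destroy path must release the image
# lock even when volume teardown fails (e.g. the volume is already gone
# because the migration was rolled back). All names are hypothetical.

class VolumeDoesNotExist(Exception):
    """Mirrors the storage error seen in the traceback above."""

class ImageLock:
    def __init__(self):
        self.held = False
    def acquire(self):
        self.held = True
    def release(self):
        self.held = False

def teardown_volume(volume_exists):
    # Stand-in for the teardownVolume verb from the log.
    if not volume_exists:
        raise VolumeDoesNotExist("volume missing on destination")

def destroy_vm(lock, volume_exists):
    """Destroy path that never leaks the lock, whatever teardown does."""
    lock.acquire()
    try:
        teardown_volume(volume_exists)
    except VolumeDoesNotExist:
        # Volume already gone; treat teardown as a no-op instead of
        # aborting with the lock still held.
        pass
    finally:
        lock.release()  # release unconditionally
    return {"status": {"message": "Machine destroyed", "code": 0}}
```

With this shape, a failed migration's leftover destroy call still frees the lock, so a subsequent migration of the same VM does not block on it.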


Version-Release number of selected component (if applicable):
-vdsm-4.9-43.1
-libvirt-0.8.7-3
-RHEV-M 2.3
-RHEL 6.1 based hosts


Steps to Reproduce:

1) Create 30 VMs from pool.
2) Migrate all of them together from one host to another.

Comment 1 Moran Goldboim 2011-01-25 11:06:41 UTC
David, please attach full logs.

Comment 2 David Naori 2011-01-25 14:13:29 UTC
Created attachment 475173 [details]
vdsm-log

Comment 3 David Naori 2011-02-03 14:48:43 UTC
Verified

