Bug 979753 - [vdsm] LV metadata corruption after failed LSM and snapshots merge
Summary: [vdsm] LV metadata corruption after failed LSM and snapshots merge
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.2.0
Hardware: x86_64
OS: Unspecified
unspecified
high
Target Milestone: ---
: 3.3.0
Assignee: Federico Simoncelli
QA Contact: Leonid Natapov
URL:
Whiteboard: storage
Depends On: 957703
Blocks: 1019461
TreeView+ depends on / blocked
 
Reported: 2013-06-30 08:53 UTC by Elad
Modified: 2018-07-16 12:32 UTC (History)
11 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
After a failed live storage migration, attempting to merge the snapshots caused logical volume metadata corruption. This problem has been fixed, so the corruption no longer occurs, and the disks can be moved after an unsuccessful live migration.
Clone Of:
Environment:
Last Closed: 2014-01-21 16:26:38 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:
abaron: Triaged+


Attachments (Terms of Use)
logs and lv metadata (17.01 MB, application/x-gzip)
2013-06-30 09:14 UTC, Elad
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2014:0040 0 normal SHIPPED_LIVE vdsm bug fix and enhancement update 2014-01-21 20:26:21 UTC

Description Elad 2013-06-30 08:53:52 UTC
Description of problem:
After failed live storage migration, we stopped the VM and merged it's snapshots. After that, disk migration (offline movement) failed with:

d63e6cef-0bcc-41d9-a481-d39bc43bb36a::ERROR::2013-06-20 15:57:07,434::image::332::Storage.Image::(getChain) There is no leaf in the image a79175bf-08e8-4854-8acb-74d21a65366a
d63e6cef-0bcc-41d9-a481-d39bc43bb36a::WARNING::2013-06-20 15:57:07,434::resourceManager::520::ResourceManager::(registerResource) Resource factory failed to create resource 'ae744a61-0e73-480e-a113-14649d0e8e73_im
ageNS.a79175bf-08e8-4854-8acb-74d21a65366a'. Canceling request.
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/resourceManager.py", line 518, in registerResource
    obj = namespaceObj.factory.createResource(name, lockType)
  File "/usr/share/vdsm/storage/resourceFactories.py", line 193, in createResource
    lockType)
  File "/usr/share/vdsm/storage/resourceFactories.py", line 122, in __getResourceCandidatesList
    imgUUID=resourceName)
  File "/usr/share/vdsm/storage/image.py", line 333, in getChain
    raise se.ImageIsNotLegalChain(imgUUID)
ImageIsNotLegalChain: Image is not a legal chain: ('a79175bf-08e8-4854-8acb-74d21a65366a',)
d63e6cef-0bcc-41d9-a481-d39bc43bb36a::DEBUG::2013-06-20 15:57:07,461::resourceManager::186::ResourceManager.Request::(cancel) ResName=`ae744a61-0e73-480e-a113-14649d0e8e73_imageNS.a79175bf-08e8-4854-8acb-74d21a653
66a`ReqID=`75e4d4c2-cc51-43e6-b585-f6b73cada850`::Canceled request
d63e6cef-0bcc-41d9-a481-d39bc43bb36a::WARNING::2013-06-20 15:57:07,462::resourceManager::180::ResourceManager.Request::(cancel) ResName=`ae744a61-0e73-480e-a113-14649d0e8e73_imageNS.a79175bf-08e8-4854-8acb-74d21a6
5366a`ReqID=`75e4d4c2-cc51-43e6-b585-f6b73cada850`::Tried to cancel a processed request
d63e6cef-0bcc-41d9-a481-d39bc43bb36a::ERROR::2013-06-20 15:57:07,463::task::850::TaskManager.Task::(_setError) Task=`d63e6cef-0bcc-41d9-a481-d39bc43bb36a`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 318, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1744, in moveImage
    rmanager.acquireResource(dstImageResourcesNamespace, imgUUID, rm.LockType.exclusive)):
  File "/usr/share/vdsm/storage/resourceManager.py", line 468, in acquireResource
    raise se.ResourceAcqusitionFailed()
ResourceAcqusitionFailed: Could not acquire resource. Probably resource factory threw an exception.: ()




Version-Release number of selected component (if applicable):
vdsm-4.10.2-19.0.el6ev.x86_64
rhevm-3.2.0-11.29.el6ev.noarch


How reproducible:
100%

Steps to Reproduce:
1. on block pool with more than one data domain
2. get a failure in LSM (we got when we tried to move a large number of disks (~30) )
3. shutdown the vm and merge it's snapshots
4. try to move the image (offline movement)

Actual results:

vdsm reports:
ImageIsNotLegalChain: Image is not a legal chain: ('a79175bf-08e8-4854-8acb-74d21a65366a',)


Expected results:
there should not be any corruption in lv metadata

Additional info:
logs and lv metadata

Comment 1 Elad 2013-06-30 09:14:00 UTC
Created attachment 767025 [details]
logs and lv metadata

Comment 2 Ayal Baron 2013-07-10 09:10:57 UTC
Does vdsm restart solve the problem?

Comment 3 Elad 2013-07-17 15:12:28 UTC
(In reply to Ayal Baron from comment #2)
> Does vdsm restart solve the problem?

I didn't try to do so. Those disks are no longer on the system. 
I'll try to reproduce it on another 3.2 env.

Comment 4 Elad 2013-08-27 08:27:29 UTC
(In reply to Elad from comment #3)
> (In reply to Ayal Baron from comment #2)
> > Does vdsm restart solve the problem?
> 
> I didn't try to do so. Those disks are no longer on the system. 
> I'll try to reproduce it on another 3.2 env.

I'm having trouble reproduce it, so it's unclear whether vdsm restart solve this problem or not

Comment 5 Ayal Baron 2013-09-05 20:27:13 UTC
These logs have so many errors, it's hard to know where to begin.  The problem happened here on multiple images but the first one is missing info so tracking flow of live storage migration for image: a79175bf-08e8-4854-8acb-74d21a65366a:

Thread-278447::INFO::2013-06-19 16:04:53,200::logUtils::40::dispatcher::(wrapper) Run and protect: createVolume(sdUUID='ebe26784-f182-430f-9422-ddc162ca064c', spUUID='ba63f0c8-37ca-492b-b28c-55ae09c92a6a', imgUUID='a79175bf-08e8-4854-8acb-74d21a65366a', size='12884901888', volFormat=4, preallocate=2, diskType=2, volUUID='6df1ac7e-1992-4789-b3c3-de7a7f1d984f', desc='', srcImgUUID='a79175bf-08e8-4854-8acb-74d21a65366a', srcVolUUID='51a6795a-e26f-4a5c-a0db-635180539bbf')

7667b392-6998-4cdd-b4ab-c41006d72347::DEBUG::2013-06-19 16:05:34,934::misc::83::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /sbin/lvm lvchange --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ \'a%3514f0c542a000184|36006016012823000787411ed01d9e211|36006016012823000daf89aa8c7d8e211%\', \'r%.*%\' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1 }  backup { retain_min = 50  retain_days = 0 } " --autobackup n --available n ebe26784-f182-430f-9422-ddc162ca064c/51a6795a-e26f-4a5c-a0db-635180539bbf' (cwd None)

7667b392-6998-4cdd-b4ab-c41006d72347::DEBUG::2013-06-19 16:05:40,340::misc::83::Storage.Misc.excCmd::(<lambda>) FAILED: <err> = '  /dev/mapper/36006016012823000daf89aa8c7d8e211: read failed after 0 of 4096 at 2199023190016: Input/output error\n  /dev/mapper/36006016012823000daf89aa8c7d8e211: read failed after 0 of 4096 at 2199023247360: Input/output error\n  /dev/mapper/36006016012823000daf89aa8c7d8e211: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3. Device /dev/mapper/36006016012823000daf89aa8c7d8e211 was disabled\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  device-mapper: remove ioctl on  failed: Device or resource busy\n  Unable to deactivate ebe26784--f182--430f--9422--ddc162ca064c-51a6795a--e26f--4a5c--a0db--635180539bbf (253:59)\n'; <rc> = 5

(lvchange failed for several volumes...)

Thread-278504::DEBUG::2013-06-19 16:06:13,017::BindingXMLRPC::913::vds::(wrapper) client [10.35.64.13]::call vmSnapshot with ('b18d74ad-b087-441c-9e9d-2a1112927a7e', [{'baseVolumeID': '51a6795a-e26f-4a5c-a0db-635180539bbf', 'domainID': 'ebe26784-f182-430f-9422-ddc162ca064c', 'volumeID': '6df1ac7e-1992-4789-b3c3-de7a7f1d984f', 'imageID': 'a79175bf-08e8-4854-8acb-74d21a65366a'}]) {}

Thread-278504::DEBUG::2013-06-19 16:06:24,488::BindingXMLRPC::920::vds::(wrapper) return vmSnapshot with {'status': {'message': 'Done', 'code': 0}, 'quiesce': False}

Thread-278512::INFO::2013-06-19 16:06:24,749::logUtils::40::dispatcher::(wrapper) Run and protect: getVolumeInfo(sdUUID='ebe26784-f182-430f-9422-ddc162ca064c', spUUID='ba63f0c8-37ca-492b-b28c-55ae09c92a6a', imgUUID='a79175bf-08e8-4854-8acb-74d21a65366a', volUUID='6df1ac7e-1992-4789-b3c3-de7a7f1d984f', options=None)

Thread-278512::INFO::2013-06-19 16:06:26,087::logUtils::42::dispatcher::(wrapper) Run and protect: getVolumeInfo, Return response: {'info': {'status': 'OK', 'domain': 'ebe26784-f182-430f-9422-ddc162ca064c', 'voltype': 'LEAF', 'description': '', 'parent': '51a6795a-e26f-4a5c-a0db-635180539bbf', 'format': 'COW', 'image': 'a79175bf-08e8-4854-8acb-74d21a65366a', 'ctime': '1371654371', 'disktype': '2', 'legality': 'LEGAL', 'mtime': '1371654371', 'apparentsize': '1073741824', 'children': [], 'pool': '', 'capacity': '12884901888', 'uuid': '6df1ac7e-1992-4789-b3c3-de7a7f1d984f', 'truesize': '1073741824', 'type': 'SPARSE'}}

Thread-278513::INFO::2013-06-19 16:06:26,392::logUtils::40::dispatcher::(wrapper) Run and protect: cloneImageStructure(spUUID='ba63f0c8-37ca-492b-b28c-55ae09c92a6a', sdUUID='ebe26784-f182-430f-9422-ddc162ca064c', imgUUID='a79175bf-08e8-4854-8acb-74d21a65366a', dstSdUUID='ae744a61-0e73-480e-a113-14649d0e8e73')

Thread-278513::INFO::2013-06-19 16:06:28,560::logUtils::42::dispatcher::(wrapper) Run and protect: cloneImageStructure, Return response: None

(clone task finished successfully)

Thread-278564::DEBUG::2013-06-19 16:07:36,412::BindingXMLRPC::913::vds::(wrapper) client [10.35.64.13]::call vmDiskReplicateStart with ('b18d74ad-b087-441c-9e9d-2a1112927a7e', {'device': 'disk', 'domainID': 'ebe26784-f182-430f-9422-ddc162ca064c', 'volumeID': '6df1ac7e-1992-4789-b3c3-de7a7f1d984f', 'poolID': 'ba63f0c8-37ca-492b-b28c-55ae09c92a6a', 'imageID': 'a79175bf-08e8-4854-8acb-74d21a65366a'}, {'device': 'disk', 'domainID': 'ae744a61-0e73-480e-a113-14649d0e8e73', 'volumeID': '6df1ac7e-1992-4789-b3c3-de7a7f1d984f', 'poolID': 'ba63f0c8-37ca-492b-b28c-55ae09c92a6a', 'imageID': 'a79175bf-08e8-4854-8acb-74d21a65366a'}) {} flowID [db2f067]

Thread-278566::INFO::2013-06-19 16:07:38,258::logUtils::40::dispatcher::(wrapper) Run and protect: syncImageData(spUUID='ba63f0c8-37ca-492b-b28c-55ae09c92a6a', sdUUID='ebe26784-f182-430f-9422-ddc162ca064c', imgUUID='a79175bf-08e8-4854-8acb-74d21a65366a', dstSdUUID='ae744a61-0e73-480e-a113-14649d0e8e73', syncType='INTERNAL')

Thread-278570::DEBUG::2013-06-19 16:07:44,062::BindingXMLRPC::913::vds::(wrapper) client [10.35.64.13]::call vmGetStats with ('b18d74ad-b087-441c-9e9d-2a1112927a7e',) {}flowID [79550d4e]

Thread-278570::DEBUG::2013-06-19 16:07:44,088::BindingXMLRPC::920::vds::(wrapper) return vmGetStats with {'status': {'message': 'Done', 'code': 0}, 'statsList': [{'username': u'None', 'memUsage': '55', 'acpiEnable': 'true', 'pid': '6879', 'displayPort': u'5902', 'session': 'Unknown', 'displaySecurePort': u'5903', 'timeOffset': 3601L, 'memoryStats': {u'swap_out': '0', u'majflt': '0', u'mem_free': '1399628', u'swap_in': '0', u'pageflt': '9', u'mem_total': '2956800', u'mem_unused': '377788'}, 'balloonInfo': {'balloon_max': 3145728, 'balloon_cur': 3145728}, 'pauseCode': 'NOERR', 'disksUsage': [{u'path': u'/', u'total': '11418218496', u'fs': u'ext3', u'used': '3419381760'}, {u'path': u'/proc/bus/usb', u'total': '0', u'fs': u'usbfs', u'used': '0'}, {u'path': u'/boot', u'total': '203097088', u'fs': u'ext3', u'used': '32711680'}, {u'path': u'/var/lib/jenkins/workspace', u'total': '1057951318016', u'fs': u'nfs', u'used': '1000840626176'}], 'network': {u'vnet1': {'macAddr': '00:1a:4a:16:97:11', 'rxDropped': '0', 'rxErrors': '0', 'txDropped': '0', 'txRate': '0.0', 'rxRate': '0.0', 'txErrors': '0', 'state': 'unknown', 'speed': '1000', 'name': u'vnet1'}}, 'displayType': 'qxl', 'guestName': u'jenkins-vm-06.scl.lab.tlv.redhat.com', 'elapsedTime': '-153281', 'vmType': 'kvm', 'cpuSys': '4.78', 'appsList': [u'rhevm-guest-agent-common-1.0.7-11.el6ev', u'kernel-2.6.32-358.el6'], 'guestOs': u'2.6.32-358.el6.x86_64', 'status': 'Paused', 'hash': '188640836589636024', 'vmId': 'b18d74ad-b087-441c-9e9d-2a1112927a7e', 'displayIp': '0', 'lastLogout': 1371807745.481838, 'lastUser': u'None', 'clientIp': '', 'kvmEnable': 'true', 'disks': {u'vda': {'readLatency': '0', 'apparentsize': '1073741824', 'writeLatency': '1446138', 'imageID': 'a79175bf-08e8-4854-8acb-74d21a65366a', 'flushLatency': '253698', 'readRate': '0.00', 'truesize': '1073741824', 'writeRate': '6946.87'}, u'hdc': {'readLatency': '0', 'apparentsize': '0', 'writeLatency': '0', 'flushLatency': '0', 'readRate': '0.00', 'truesize': '0', 'writeRate': '0.00'}}, 'monitorResponse': '0', 'statsAge': '-321435.10', 'cpuUser': '10.40', 'lastLogin': 1371639608.1277881, 'guestIPs': u'10.35.161.17', 'netIfaces': [{u'name': u'eth0', u'inet6': [u'fe80::21a:4aff:fe16:9711'], u'inet': [u'10.35.161.17'], u'hw': u'00:1a:4a:16:97:11'}]}]}

Thread-278748::DEBUG::2013-06-19 16:11:39,438::BindingXMLRPC::913::vds::(wrapper) client [10.35.64.13]::call vmShutdown with ('b18d74ad-b087-441c-9e9d-2a1112927a7e', '30', 'System Administrator has initiated shutdown of this Virtual Machine. Virtual Machine is shutting down.') {} flowID [36fa4a1e]

libvirtEventLoop::ERROR::2013-06-19 16:12:51,320::libvirtvm::2255::vm.Vm::(_onBlockJobEvent) vmId=`b18d74ad-b087-441c-9e9d-2a1112927a7e`::Live merge completed for an unexpected path: /rhev/data-center/ba63f0c8-37ca-492b-b28c-55ae09c92a6a/ebe26784-f182-430f-9422-ddc162ca064c/images/a79175bf-08e8-4854-8acb-74d21a65366a/6df1ac7e-1992-4789-b3c3-de7a7f1d984f

libvirtEventLoop::DEBUG::2013-06-19 16:12:51,321::libvirtvm::3060::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`b18d74ad-b087-441c-9e9d-2a1112927a7e`::event Stopped detail 0 opaque None

Thread-278838::DEBUG::2013-06-19 16:12:52,581::BindingXMLRPC::913::vds::(wrapper) client [10.35.64.13]::call vmGetStats with ('b18d74ad-b087-441c-9e9d-2a1112927a7e',) {} flowID [2eef795c]

Thread-278838::DEBUG::2013-06-19 16:12:52,581::BindingXMLRPC::920::vds::(wrapper) return vmGetStats with {'status': {'message': 'Done', 'code': 0}, 'statsList': [{'status': 'Down', 'hash': '188640836589636024', 'exitMessage': 'User shut down', 'vmId': 'b18d74ad-b087-441c-9e9d-2a1112927a7e', 'timeOffset': 428402L, 'exitCode': 0}]}

Thread-278840::DEBUG::2013-06-19 16:12:52,676::BindingXMLRPC::913::vds::(wrapper) client [10.35.64.13]::call vmDestroy with ('b18d74ad-b087-441c-9e9d-2a1112927a7e',) {} flowID [2eef795c]
Thread-278840::DEBUG::2013-06-19 16:12:52,679::BindingXMLRPC::920::vds::(wrapper) return vmDestroy with {'status': {'message': 'Machine destroyed', 'code': 0}}

2013-06-24 16:13:06,717 WARN  [org.ovirt.engine.core.bll.RemoveSnapshotCommand] (http-/0.0.0.0:8443-5) [5dfb4d76] CanDoAction of action RemoveSnapshot failed. Reasons:ACTION_TYPE_FAILED_VM_IS_DURING_SNAPSHOT,VAR__TYPE__SNAPSHOT,VAR__ACTION__REMOVE

44e0467c-9e18-4262-9f9a-ba232aee7523::DEBUG::2013-06-19 17:40:31,263::threadPool::55::Misc.ThreadPool::(setRunningTask) Number of running tasks: 10

2013-06-24 17:40:23,628 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-55) [2439cb3d] BaseAsyncTask::OnTaskEndSuccess: Task 44e0467c-9e18-4262-9f9a-ba232aee7523 (Parent Command LiveMigrateDisk, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended successfully.

2013-06-24 17:40:23,629 INFO  [org.ovirt.engine.core.bll.EntityAsyncTask] (pool-3-thread-27) [2439cb3d] EntityAsyncTask::EndCommandAction [within thread] context: Attempting to EndAction LiveMigrateDisk, executionIndex: 1

2013-06-24 17:40:23,720 ERROR [org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand] (pool-3-thread-27) [db2f067] Ending command with failure: org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand

2013-06-24 17:40:23,720 ERROR [org.ovirt.engine.core.vdsbroker.ResourceManager] (pool-3-thread-27) [db2f067] Cannot get vdsManager for vdsid=00000000-0000-0000-0000-000000000000

2013-06-24 17:40:23,721 ERROR [org.ovirt.engine.core.vdsbroker.ResourceManager] (pool-3-thread-27) [db2f067] CreateCommand failed: org.ovirt.engine.core.common.errors.VdcBLLException: VdcBLLException: Vds with id: 00000000-0000-0000-0000-000000000000 was not found


Thread-282346::DEBUG::2013-06-19 17:40:33,141::taskManager::93::TaskManager::(getTaskStatus) Entry. taskID: 44e0467c-9e18-4262-9f9a-ba232aee7523
Thread-282346::DEBUG::2013-06-19 17:40:33,142::taskManager::96::TaskManager::(getTaskStatus) Return. Response: {'code': 0, 'message': '1 jobs completed successfully', 'taskState': 'finished', 'taskResult': 'success', 'taskID': '44e0467c-9e18-4262-9f9a-ba232aee7523'}

Thread-316530::DEBUG::2013-06-20 09:20:24,138::task::579::TaskManager.Task::(_updateState) Task=`8e7c5fa7-a848-47d6-9a0e-85ef8399f763`::moving from state init -> state preparing

Thread-316530::INFO::2013-06-20 09:20:24,138::logUtils::40::dispatcher::(wrapper) Run and protect: moveImage(spUUID='ba63f0c8-37ca-492b-b28c-55ae09c92a6a', srcDomUUID='ebe26784-f182-430f-9422-ddc162ca064c', dstDomUUID='ae744a61-0e73-480e-a113-14649d0e8e73', imgUUID='a79175bf-08e8-4854-8acb-74d21a65366a', vmUUID='', op=2, postZero='false', force='false')


8e7c5fa7-a848-47d6-9a0e-85ef8399f763::ERROR::2013-06-20 09:22:33,559::image::532::Storage.Image::(_createTargetImage) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/image.py", line 517, in _createTargetImage
    srcVolUUID=volParams['parent'])
  File "/usr/share/vdsm/storage/blockSD.py", line 610, in createVolume
    volUUID, desc, srcImgUUID, srcVolUUID)
  File "/usr/share/vdsm/storage/volume.py", line 418, in create
    raise se.VolumeAlreadyExists(volUUID)
VolumeAlreadyExists: Volume already exists: ('c0b26357-538a-46fa-b5ba-e491c8ee7c7e',)


Lots of storage errors in the middle including a previous lsm which failed and caused same error, and:

ad1f3ea8-a01b-4bbc-8912-7d115d5008df::DEBUG::2013-06-19 16:09:15,908::misc::83::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /sbin/lvm lvcreate --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ \'a%3514f0c542a000184|36006016012823000787411ed01d9e211|36006016012823000daf89aa8c7d8e211%\', \'r%.*%\' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1 }  backup {  retain_min = 50  retain_days = 0 } " --autobackup n --contiguous n --size 12288m --addtag OVIRT_VOL_INITIALIZING --name 4a1d6bd4-b874-422d-aa64-acf7c28c10b5 ae744a61-0e73-480e-a113-14649d0e8e73' (cwd None)

ad1f3ea8-a01b-4bbc-8912-7d115d5008df::DEBUG::2013-06-19 16:09:17,252::misc::83::Storage.Misc.excCmd::(<lambda>) FAILED: <err> = '  /dev/mapper/36006016012823000daf89aa8c7d8e211: read failed after 0 of 4096 at 2199023190016: Input/output error\n  /dev/mapper/36006016012823000daf89aa8c7d8e211: read failed after 0 of 4096 at 2199023247360: Input/output error\n  /dev/mapper/36006016012823000daf89aa8c7d8e211: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3. Device /dev/mapper/36006016012823000daf89aa8c7d8e211 was disabled\n  WARNING: This metadata update is NOT backed up\n  device-mapper: create ioctl on ae744a61--0e73--480e--a113--14649d0e8e73-4a1d6bd4--b874--422d--aa64--acf7c28c10b5 failed: Device or resource busy\n  Failed to activate new LV.\n  WARNING: This metadata update is NOT backed up\n'; <rc> = 5

Thread-281333::WARNING::2013-06-19 17:13:44,418::blockSD::172::Storage.BlockSD::(getAllVolumes) Found broken image 5ff884cf-dddc-4cb1-9f9b-1803d606f9d6, orphan volume ae744a61-0e73-480e-a113-14649d0e8e73/bcac1899-987d-4b79-9461-c526b01dec37, parent 84de075b-9b0c-4af3-aca5-7bc1b3a82ddd

Searched for: b18d74ad-b087-441c-9e9d-2a1112927a7e|syncImage|Thread-278504:|6df1ac7e-1992-4789-b3c3-de7a7f1d984f|Thread-278447:|7667b392-6998-4cdd-b4ab-c41006d72347|Thread-278512:|Thread-278513:|ba6e3d5a-8a12-45da-aa8f-5742adc8246d:|FAILED|^44e0467c-9e18-4262-9f9a-ba232aee7523|Thread-278570:|I am

Comment 6 Ayal Baron 2013-09-05 20:31:27 UTC
So from the logs above:
1. user called LSM
2. during syncImageData, user shut down the VM
3. when syncImageData finished, engine threw an exception
4. moveImage fails on VolumeAlreadyExists

Still need to track further to see how we reach the 'There is no leaf in the image' error.

Comment 7 Federico Simoncelli 2013-10-10 07:43:57 UTC
2013-06-24 17:40:23,721 ERROR [org.ovirt.engine.core.vdsbroker.ResourceManager] (pool-3-thread-27) [db2f067] CreateCommand failed: org.ovirt.engine.core.common.errors.VdcBLLException: VdcBLLException: Vds with id: 00000000-0000-0000-0000-000000000000 was not found

That is fixed by:

http://gerrit.ovirt.org/#/c/19876/

Comment 8 Federico Simoncelli 2013-10-10 15:29:50 UTC
(In reply to Ayal Baron from comment #6)
> So from the logs above:
> 1. user called LSM
> 2. during syncImageData, user shut down the VM
> 3. when syncImageData finished, engine threw an exception
> 4. moveImage fails on VolumeAlreadyExists

The exception at point 3 should be covered (as per comment 7) by:

http://gerrit.ovirt.org/#/c/19876/

> Still need to track further to see how we reach the 'There is no leaf in the
> image' error.

The 'no leaf in image' error happens before the flow analyzed in comment 6 and it's generated by a different run.

As per cloneImageStructure in sp.py and __getResourceCandidatesList in resourceFactories.py:

 def cloneImageStructure(self, sdUUID, imgUUID, dstSdUUID):
     ...
     # Preparing the ordered resource list to be acquired
     resList = (rmanager.acquireResource(*x) for x in sorted((
         (srcImgResNs, imgUUID, rm.LockType.shared),
         (dstImgResNs, imgUUID, rm.LockType.exclusive),
     )))
     ...

 def __getResourceCandidatesList(self, resourceName, lockType):
     ...
     try:
         chain = image.Image(repoPath).getChain(sdUUID=self.sdUUID,
                                                imgUUID=resourceName)
     except se.ImageDoesNotExistInSD:
         log.debug("Image %s does not exist in domain %s",
                   resourceName, self.sdUUID)
         return []
     ...

The resource manager always tries to find the image using getChain and in this case on the destination there was an incomplete leftover (maybe from a previous run?).

Logs:

Thread-274244::DEBUG::2013-06-19 14:06:55,136::task::579::TaskManager.Task::(_updateState) Task=`9e015034-826d-4d51-93a3-c8caea3c179c`::moving from state init -> state preparing
Thread-274244::INFO::2013-06-19 14:06:55,137::logUtils::40::dispatcher::(wrapper) Run and protect: cloneImageStructure(spUUID='ba63f0c8-37ca-492b-b28c-55ae09c92a6a', sdUUID='ebe26784-f182-430f-9422-ddc162ca064c', imgUUID='c95c5fec-9e3d-4f30-bb75-728a8e772ccf', dstSdUUID='ae744a61-0e73-480e-a113-14649d0e8e73')

...

9e015034-826d-4d51-93a3-c8caea3c179c::DEBUG::2013-06-19 14:06:57,710::resourceManager::486::ResourceManager::(registerResource) Trying to register resource 'ae744a61-0e73-480e-a113-14649d0e8e73_imageNS.c95c5fec-9e3d-4f30-bb75-728a8e772ccf' for lock type 'exclusive'

...

9e015034-826d-4d51-93a3-c8caea3c179c::ERROR::2013-06-19 14:06:58,529::image::332::Storage.Image::(getChain) There is no leaf in the image c95c5fec-9e3d-4f30-bb75-728a8e772ccf
9e015034-826d-4d51-93a3-c8caea3c179c::WARNING::2013-06-19 14:06:58,530::resourceManager::520::ResourceManager::(registerResource) Resource factory failed to create resource 'ae744a61-0e73-480e-a113-14649d0e8e73_imageNS.c95c5fec-9e3d-4f30-bb75-728a8e772ccf'. Canceling request.
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/resourceManager.py", line 518, in registerResource
    obj = namespaceObj.factory.createResource(name, lockType)
  File "/usr/share/vdsm/storage/resourceFactories.py", line 193, in createResource
    lockType)
  File "/usr/share/vdsm/storage/resourceFactories.py", line 122, in __getResourceCandidatesList
    imgUUID=resourceName)
  File "/usr/share/vdsm/storage/image.py", line 333, in getChain
    raise se.ImageIsNotLegalChain(imgUUID)
ImageIsNotLegalChain: Image is not a legal chain: ('c95c5fec-9e3d-4f30-bb75-728a8e772ccf',)

As we see the resource we're trying to acquire here is the destination one (exclusive) on the target storage domain ae744a61-0e73-480e-a113-14649d0e8e73.
In that domain there was still probably a partial leftover of image c95c5fec-9e3d-4f30-bb75-728a8e772ccf from a previous run, and I've been able to find some trace of it in the metadata:

DOMAIN=ae744a61-0e73-480e-a113-14649d0e8e73
VOLTYPE=INTERNAL
CTIME=1371720608
FORMAT=RAW
IMAGE=c95c5fec-9e3d-4f30-bb75-728a8e772ccf
DISKTYPE=2
PUUID=00000000-0000-0000-0000-000000000000
LEGALITY=LEGAL
MTIME=1371722493
POOL_UUID=
SIZE=25165824
TYPE=PREALLOCATED
DESCRIPTION=
EOF

It is possible to reproduce this manually:

# lvs @IU_5cc65754-ab9c-44f3-b088-4e5faa025bc3
  LV                                   VG                                   Attr      LSize
  048e78d7-f15a-44f2-8596-865070bde72b 864df2d0-b022-435f-a9d1-b7ac0bd766bb -wi------ 128.00m
  0625195c-c1fb-4a20-be66-789921bcfe68 864df2d0-b022-435f-a9d1-b7ac0bd766bb -wi------ 128.00m

# vdsClient 0 cloneImageStructure 98da1408-948d-4cab-9a8b-418914be9f07 864df2d0-b022-435f-a9d1-b7ac0bd766bb 5cc65754-ab9c-44f3-b088-4e5faa025bc3 c8c60dca-3ec8-4ea0-8135-d929070055cb

# lvs @IU_5cc65754-ab9c-44f3-b088-4e5faa025bc3
  LV                                   VG                                   Attr      LSize
  048e78d7-f15a-44f2-8596-865070bde72b 864df2d0-b022-435f-a9d1-b7ac0bd766bb -wi------ 128.00m
  0625195c-c1fb-4a20-be66-789921bcfe68 864df2d0-b022-435f-a9d1-b7ac0bd766bb -wi------ 128.00m
  048e78d7-f15a-44f2-8596-865070bde72b c8c60dca-3ec8-4ea0-8135-d929070055cb -wi------ 128.00m
  0625195c-c1fb-4a20-be66-789921bcfe68 c8c60dca-3ec8-4ea0-8135-d929070055cb -wi------ 128.00m

# lvremove c8c60dca-3ec8-4ea0-8135-d929070055cb/0625195c-c1fb-4a20-be66-789921bcfe68
  Logical volume "0625195c-c1fb-4a20-be66-789921bcfe68" successfully removed

# lvs @IU_5cc65754-ab9c-44f3-b088-4e5faa025bc3
  LV                                   VG                                   Attr      LSize
  048e78d7-f15a-44f2-8596-865070bde72b 864df2d0-b022-435f-a9d1-b7ac0bd766bb -wi------ 128.00m
  0625195c-c1fb-4a20-be66-789921bcfe68 864df2d0-b022-435f-a9d1-b7ac0bd766bb -wi------ 128.00m
  048e78d7-f15a-44f2-8596-865070bde72b c8c60dca-3ec8-4ea0-8135-d929070055cb -wi------ 128.00m

# vdsClient 0 cloneImageStructure 98da1408-948d-4cab-9a8b-418914be9f07 864df2d0-b022-435f-a9d1-b7ac0bd766bb 5cc65754-ab9c-44f3-b088-4e5faa025bc3 c8c60dca-3ec8-4ea0-8135-d929070055cb

6f189c2d-6d35-4757-bf9e-71a538f7288c::ERROR::2013-10-10 10:02:37,829::image::203::Storage.Image::(getChain) There is no leaf in the image 5cc65754-ab9c-44f3-b088-4e5faa025bc3
6f189c2d-6d35-4757-bf9e-71a538f7288c::WARNING::2013-10-10 10:02:37,829::resourceManager::590::ResourceManager::(registerResource) Resource factory failed to create resource 'c8c60dca-3ec8-4ea0-8135-d929070055cb_imageNS.5cc65754-ab9c-44f3-b088-4e5faa025bc3'. Canceling request.
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/resourceManager.py", line 586, in registerResource
    obj = namespaceObj.factory.createResource(name, lockType)
  File "/usr/share/vdsm/storage/resourceFactories.py", line 193, in createResource
    lockType)
  File "/usr/share/vdsm/storage/resourceFactories.py", line 122, in __getResourceCandidatesList
    imgUUID=resourceName)
  File "/usr/share/vdsm/storage/image.py", line 204, in getChain
    raise se.ImageIsNotLegalChain(imgUUID)
ImageIsNotLegalChain: Image is not a legal chain: ('5cc65754-ab9c-44f3-b088-4e5faa025bc3',)


Anyway we don't have any indication on how that leftover was still/already present on the destination (no logs) but I think it might have been caused by a previous syncImageData that was keeping the lv open during the deletion.

In my opinion there's nothing new to take care of here.

This is probably a duplicate of both bug 957703 and bug 1000816

Comment 10 Charlie 2013-11-28 00:29:10 UTC
This bug is currently attached to errata RHBA-2013:15291. If this change is not to be documented in the text for this errata please either remove it from the errata, set the requires_doc_text flag to 
minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:

https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes 

Thanks in advance.

Comment 11 Leonid Natapov 2013-12-29 16:02:37 UTC
is29. image can be moved after unsuccessful LSM and merge snapshots.

Comment 13 errata-xmlrpc 2014-01-21 16:26:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0040.html


Note You need to log in before you can comment on or make changes to this bug.