Description of problem:
Sometimes storage migration (I've seen it so far only for cold migration) fails to complete, and the logs show the error "Cannot create Logical Volume".

Version-Release number of selected component (if applicable):
ovirt-engine-4.2.0-0.0.master.20171001151854.git75e5c70.el7.centos.noarch
vdsm-4.20.3-126.git48cdf1b.el7.centos.x86_64

How reproducible:
~50%

Steps to Reproduce:
1. Create a VM with disks
2. Migrate the disks to a different storage domain

Actual results:
In some cases, the migration fails.

Expected results:
All disks should move to the new storage domain.

Additional info:
Relevant disk migration correlation ID: disks_syncAction_f463ead4-87ac-44c8

engine.log:

2017-10-02 23:26:20,640+03 ERROR [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-58) [] BaseAsyncTask::logEndTaskFailure: Task '4978ff6e-cd20-4c88-9659-67d9fa0e0b5e' (Parent Command 'CreateVolumeContainer', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with failure:
-- Result: 'cleanSuccess'
-- Message: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = Cannot create Logical Volume: ('1fcd4803-1fe6-4526-a778-aecec4f3f235', 'd50cc533-3398-4b2e-8ef0-93a55e372fc2'), code = 550',
-- Exception: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = Cannot create Logical Volume: ('1fcd4803-1fe6-4526-a778-aecec4f3f235', 'd50cc533-3398-4b2e-8ef0-93a55e372fc2'), code = 550'

2017-10-02 23:26:31,055+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-87) [] EVENT_ID: USER_MOVED_DISK_FINISHED_FAILURE(2,011), User admin@internal-authz have failed to move disk disk_virtio_scsiraw_0223223354 to domain iscsi_0.

vdsm.log:

2017-10-02 23:26:11,940+0300 ERROR (tasks/4) [storage.Volume] Failed to create volume /rhev/data-center/mnt/blockSD/1fcd4803-1fe6-4526-a778-aecec4f3f235/images/78292e21-b44c-4be2-9879-33636057c9c3/d50cc533-3398-4b2e-8ef0-93a55e372fc2: Cannot create Logical Volume: ('1fcd4803-1fe6-4526-a778-aecec4f3f235', 'd50cc533-3398-4b2e-8ef0-93a55e372fc2') (volume:1159)
2017-10-02 23:26:11,941+0300 ERROR (tasks/4) [storage.Volume] Unexpected error (volume:1196)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 1156, in create
    initialSize=initialSize)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockVolume.py", line 501, in _create
    initialTags=(sc.TAG_VOL_UNINIT,))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 1161, in createLV
    raise se.CannotCreateLogicalVolume(vgName, lvName)
CannotCreateLogicalVolume: Cannot create Logical Volume: ('1fcd4803-1fe6-4526-a778-aecec4f3f235', 'd50cc533-3398-4b2e-8ef0-93a55e372fc2')
Created attachment 1333528 [details]
logs
Unfortunately, the log isn't too useful, as the error in VDSM just indicates that the LV could not be created but doesn't indicate WHY this happened. The attached patch, or some variant thereof, should improve the log to carry this information and help debug this issue.
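For context, the gist of such an improvement is to capture the LVM command's stderr and attach it to the raised exception, which is what the extra err argument in the later tracebacks in this bug reflects. Below is a minimal standalone sketch of the idea, not the actual vdsm patch; the class and function here are simplified stand-ins.

# Minimal standalone sketch of the logging improvement (not the actual
# vdsm patch): keep lvcreate's stderr and attach it to the exception, so
# the log says WHY the LV could not be created, not merely that it failed.
import subprocess

class CannotCreateLogicalVolume(Exception):
    def __init__(self, vgName, lvName, err=None):
        msg = "Cannot create Logical Volume: vgname=%s lvname=%s err=%r" % (
            vgName, lvName, err)
        Exception.__init__(self, msg)

def createLV(vgName, lvName, size_mb):
    cmd = ["lvcreate", "-L", "%dm" % size_mb, "-n", lvName, vgName]
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    if p.returncode != 0:
        # Previously only (vgName, lvName) was raised; passing stderr along
        # surfaces messages like 'Logical Volume "..." already exists'.
        raise CannotCreateLogicalVolume(vgName, lvName, err.splitlines())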
Lilach - now that the attached patch is merged, we should have some better logs if such a failure occurs. Could you please rerun your scenario with a build that includes commit 00c30e7be (If you wait for **today**'s nightly, you could probably use it tomorrow morning) and attach the logs here? Thanks!
Created attachment 1340046 [details]
new-logs

Adding new engine and vdsm logs. The relevant migration correlation ID is disks_syncAction_f30c8e95-2106-458c.
From VDSM log:

2017-10-18 11:32:38,509+0300 DEBUG (tasks/4) [storage.Misc.excCmd] FAILED: <err> = ' Logical Volume "791f95fd-8965-432c-b197-23554c19dbdd" already exists in volume group "73461dc7-d144-4cb2-a016-e8c633b13630"\n'; <rc> = 5 (commands:94)

Benny, does this make any sense to you?
2017-10-18 11:32:39,331+0300 ERROR (tasks/4) [storage.TaskManager.Task] (Task='128df0df-af53-45de-a3c8-35262b83eaf6') Unexpected error (task:872)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 879, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 333, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1936, in createVolume
    initialSize=initialSize)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 770, in createVolume
    initialSize=initialSize)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 1201, in create
    (volUUID, e))
VolumeCreationError: Error creating a new volume: ('Volume creation 791f95fd-8965-432c-b197-23554c19dbdd failed: [Errno 2] No such file or directory',)

Looks like 791f95fd-8965-432c-b197-23554c19dbdd was removed earlier:

2017-10-18 11:32:39,324+0300 WARN (tasks/4) [root] File: /rhev/data-center/mnt/blockSD/73461dc7-d144-4cb2-a016-e8c633b13630/images/490e84a1-3348-4095-b4e3-d2d201f8ed93/791f95fd-8965-432c-b197-23554c19dbdd already removed (fileutils:51)
2017-10-18 11:32:39,119+0300 DEBUG (tasks/5) [storage.StorageDomainManifest] removed image dir: /rhev/data-center/mnt/blockSD/73461dc7-d144-4cb2-a016-e8c633b13630/images/490e84a1-3348-4095-b4e3-d2d201f8ed93 (blockSD:702)

I'll have a deeper look at this.
Update: this looks like a race between volume removal and volume creation when the same disk is moved back to the source relatively quickly. Because the same disk is moved twice, from and back to the same SD, the cold move operation, which consists of a copy and a remove, may not yet have finished the remove part. This can be observed in the log snippets below.

----- volume delete issued ---------
2017-10-18 11:32:29,347+0300 INFO (jsonrpc/0) [vdsm.api] START deleteImage(sdUUID='73461dc7-d144-4cb2-a016-e8c633b13630', spUUID='7241d37d-72b2-4857-b522-0162a0872fd8', imgUUID='490e84a1-3348-4095-b4e3-d2d201f8ed93', postZero='true', force='false', discard=True) from=::ffff:10.35.69.27,58374, flow_id=disks_syncAction_7be3cc5b-2c0b-4fa0, task_id=e0c7c0c8-0579-4330-90b9-4a330c6cf574 (api:46)

----- volume create issued ---------
2017-10-18 11:32:37,431+0300 INFO (jsonrpc/6) [vdsm.api] START createVolume(sdUUID='73461dc7-d144-4cb2-a016-e8c633b13630', spUUID='7241d37d-72b2-4857-b522-0162a0872fd8', imgUUID='490e84a1-3348-4095-b4e3-d2d201f8ed93', size='1073741824', volFormat=5, preallocate=1, diskType=2, volUUID='791f95fd-8965-432c-b197-23554c19dbdd', desc=None, srcImgUUID='00000000-0000-0000-0000-000000000000', srcVolUUID='00000000-0000-0000-0000-000000000000', initialSize=None) from=::ffff:10.35.69.27,58374, flow_id=disks_syncAction_f30c8e95-2106-458c, task_id=128df0df-af53-45de-a3c8-35262b83eaf6 (api:46)

2017-10-18 11:32:39,119+0300 DEBUG (tasks/5) [storage.StorageDomainManifest] removed: ['/rhev/data-center/mnt/blockSD/73461dc7-d144-4cb2-a016-e8c633b13630/images/490e84a1-3348-4095-b4e3-d2d201f8ed93/791f95fd-8965-432c-b197-23554c19dbdd'] (blockSD:691)
2017-10-18 11:32:39,119+0300 DEBUG (tasks/5) [storage.StorageDomainManifest] removed image dir: /rhev/data-center/mnt/blockSD/73461dc7-d144-4cb2-a016-e8c633b13630/images/490e84a1-3348-4095-b4e3-d2d201f8ed93 (blockSD:702)

----- volume delete finished ---------
2017-10-18 11:32:39,148+0300 INFO (tasks/5) [storage.ThreadPool.WorkerThread] FINISH task e0c7c0c8-0579-4330-90b9-4a330c6cf574 (threadPool:210)

This leaves us without the 490e84a1-3348-4095-b4e3-d2d201f8ed93 image directory in which 791f95fd-8965-432c-b197-23554c19dbdd is about to be created.

So the steps to reproduce can be updated to:
1. Move a disk and make sure the removal is delayed
2. Move the same disk again before the removal is complete

A scripted version of these steps is sketched below.
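For automation, the updated steps can be scripted against the engine API. This is a hedged sketch using the Python SDK (ovirtsdk4); the URL, credentials, disk UUID, and storage-domain names are placeholders. The point is that the second move is issued as soon as the disk unlocks, while vdsm may still be running the delayed deleteImage.

# Reproduction sketch using ovirt-engine-sdk-python (ovirtsdk4).
# URL, credentials, disk UUID, and storage-domain names are placeholders.
import time

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url="https://engine.example.com/ovirt-engine/api",
    username="admin@internal",
    password="secret",
    insecure=True,
)
disk_service = connection.system_service().disks_service().disk_service(
    "DISK-UUID-HERE")

def wait_until_unlocked():
    # The engine reports OK once the copy part is done; the *remove*
    # part of the cold move may still be running as a vdsm task.
    while disk_service.get().status != types.DiskStatus.OK:
        time.sleep(1)

# Step 1: move the disk away from the source domain.
disk_service.move(storage_domain=types.StorageDomain(name="iscsi_1"))
wait_until_unlocked()

# Step 2: immediately move it back. If the delayed deleteImage has not
# finished, createVolume on the source domain races with it and fails
# with CannotCreateLogicalVolume ("already exists" / missing image dir).
disk_service.move(storage_domain=types.StorageDomain(name="iscsi_0"))

connection.close()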
*** Bug 1559929 has been marked as a duplicate of this bug. ***
Created attachment 1455748 [details]
new-logs2

Encountered again in our automation; logs attached.

2018-06-29 15:49:19,728+0300 ERROR (tasks/4) [storage.TaskManager.Task] (Task='2195d959-9472-4a32-b4d7-b06a4b9b578e') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 336, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1943, in createVolume
    initialSize=initialSize)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 801, in createVolume
    initialSize=initialSize)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 1203, in create
    initialSize=initialSize)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockVolume.py", line 505, in _create
    initialTags=(sc.TAG_VOL_UNINIT,))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 1153, in createLV
    raise se.CannotCreateLogicalVolume(vgName, lvName, err)
CannotCreateLogicalVolume: Cannot create Logical Volume: 'vgname=f54afc10-74ad-4ae5-99b5-32f1ac874c44 lvname=f28ca5d7-61ed-4337-b265-f03c3751296c err=[\' Logical Volume "f28ca5d7-61ed-4337-b265-f03c3751296c" already exists in volume group "f54afc10-74ad-4ae5-99b5-32f1ac874c44"\']'
Created attachment 1480751 [details]
logs3

Occurs every time for me for a specific disk. Is there any progress here?

2018-09-04 13:49:50,105+0300 INFO (tasks/9) [storage.ThreadPool.WorkerThread] START task de164555-6635-4f37-b431-c452f5c94057 (cmd=<bound method Task.commit of <vdsm.storage.task.Task instance at 0x7f7508782ef0>>, args=None) (threadPool:208)
2018-09-04 13:49:50,561+0300 INFO (tasks/9) [storage.Volume] Creating volume b35ae5df-994f-47a2-8bc0-06a883f273ab (volume:1185)
2018-09-04 13:49:50,632+0300 INFO (tasks/9) [storage.LVM] Creating LV (vg=320dcacd-8372-4a4d-84dd-09b2fea91536, lv=b35ae5df-994f-47a2-8bc0-06a883f273ab, size=6784m, activate=True, contiguous=False, initialTags=('OVIRT_VOL_INITIALIZING',)) (lvm:1139)
2018-09-04 13:49:51,006+0300 ERROR (tasks/9) [storage.Volume] Failed to create volume /rhev/data-center/mnt/blockSD/320dcacd-8372-4a4d-84dd-09b2fea91536/images/4998ee5e-a5db-42e0-bda7-4a5f9dcfbb1c/b35ae5df-994f-47a2-8bc0-06a883f273ab: Cannot create Logical Volume: u'vgname=320dcacd-8372-4a4d-84dd-09b2fea91536 lvname=b35ae5df-994f-47a2-8bc0-06a883f273ab err=[\' Logical Volume "b35ae5df-994f-47a2-8bc0-06a883f273ab" already exists in volume group "320dcacd-8372-4a4d-84dd-09b2fea91536"\']' (volume:1209)
2018-09-04 13:49:51,007+0300 ERROR (tasks/9) [storage.Volume] Unexpected error (volume:1246)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 1206, in create
    initialSize=initialSize)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockVolume.py", line 505, in _create
    initialTags=(sc.TAG_VOL_UNINIT,))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 1153, in createLV
    raise se.CannotCreateLogicalVolume(vgName, lvName, err)
CannotCreateLogicalVolume: Cannot create Logical Volume: u'vgname=320dcacd-8372-4a4d-84dd-09b2fea91536 lvname=b35ae5df-994f-47a2-8bc0-06a883f273ab err=[\' Logical Volume "b35ae5df-994f-47a2-8bc0-06a883f273ab" already exists in volume group "320dcacd-8372-4a4d-84dd-09b2fea91536"\']'
2018-09-04 13:49:51,008+0300 ERROR (tasks/9) [storage.TaskManager.Task] (Task='de164555-6635-4f37-b431-c452f5c94057') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 336, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1943, in createVolume
    initialSize=initialSize)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 799, in createVolume
    initialSize=initialSize)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 1206, in create
    initialSize=initialSize)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockVolume.py", line 505, in _create
    initialTags=(sc.TAG_VOL_UNINIT,))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 1153, in createLV
    raise se.CannotCreateLogicalVolume(vgName, lvName, err)
CannotCreateLogicalVolume: Cannot create Logical Volume: u'vgname=320dcacd-8372-4a4d-84dd-09b2fea91536 lvname=b35ae5df-994f-47a2-8bc0-06a883f273ab err=[\' Logical Volume "b35ae5df-994f-47a2-8bc0-06a883f273ab" already exists in volume group "320dcacd-8372-4a4d-84dd-09b2fea91536"\']'
2018-09-04 13:49:51,115+0300 INFO (tasks/9) [storage.Volume] startCreateVolumeRollback: sdUUID=320dcacd-8372-4a4d-84dd-09b2fea91536 imgUUID=4998ee5e-a5db-42e0-bda7-4a5f9dcfbb1c volUUID=b35ae5df-994f-47a2-8bc0-06a883f273ab (volume:1094)
Yet LV b35ae5df-994f-47a2-8bc0-06a883f273ab does not exist in VG 320dcacd-8372-4a4d-84dd-09b2fea91536:

[root@tigris04 ~]# lvs 320dcacd-8372-4a4d-84dd-09b2fea91536 | grep b35ae5df-994f-47a2-8bc0-06a883f273ab
[root@tigris04 ~]#
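Because this leftover may only be visible for a short window, grepping lvs output for a single LV name can miss it. A diagnostic sketch that lists every LV in the VG together with its tags gives a fuller picture, including volumes whose image-id tag was already rewritten for removal (see the analysis further down). This is illustrative only and assumes nothing beyond standard LVM tooling; the VG name is the one from the log above.

# Diagnostic sketch: dump every LV in the VG with its tags, so volumes
# already marked for removal are still visible alongside live ones.
import subprocess

VG = "320dcacd-8372-4a4d-84dd-09b2fea91536"  # VG from the log above

out = subprocess.check_output(
    ["lvs", "--noheadings", "-o", "lv_name,lv_tags", VG])
for line in out.decode().splitlines():
    fields = line.split()
    name = fields[0]
    tags = fields[1] if len(fields) > 1 else ""
    print("%-40s %s" % (name, tags))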
This bug has not been marked as blocker for oVirt 4.3.0. Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.
The issue is the way deleted volumes are renamed. On block storage, we change the image id to _remove_me_<image-id> or _remove_me_ZERO_<image-id>. This removes the volume from the namespace, but creating the same volume again will fail, since we do not rename the removed volume's LV.

To fix this we must change the management of deleted volumes, so that both the image id (stored in the LVM tag IU_<image-id>) and the volume name are renamed. The challenge is doing both operations in a single LVM command, so that the operation is atomic.

David, do we have a way to perform both:

    lvrename vg/lv vg/lv_new
    lvchange --addtag IU__removing__ --deltag IU_image-id vg/lv_new

in a single atomic operation? This should be possible, since both operations on inactive LVs change the VG metadata, which can be fully written to the metadata area as one transaction.

If not, we need to do 2 operations and handle a new failure mode where the first operation succeeded but the second failed; a sketch of that fallback is below.

However we implement it, we need to change the flow so that the engine sets the new image id and volume id, and keeps tracking deleted volumes. In the current code the engine cannot track the image after calling Volume.delete() or Image.delete(), since vdsm controls the deleted image id.

This sounds like an RFE for 4.4.
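If a single atomic command turns out not to exist, the two-step fallback and its new failure mode could look roughly like this. Illustrative only, not vdsm code: the _remove_me_ LV-name prefix, the helper names, and the recovery policy are assumptions, not the implemented design.

# Illustrative fallback sketch (not vdsm code): rename the LV, then move
# the IU_ tag, and cope with the window where only the first step landed.
import subprocess

def _run(*cmd):
    subprocess.check_call(cmd)

def mark_removed(vg, lv, image_id):
    new_lv = "_remove_me_" + lv  # hypothetical naming scheme
    try:
        _run("lvrename", "%s/%s" % (vg, lv), "%s/%s" % (vg, new_lv))
    except subprocess.CalledProcessError:
        # Nothing changed yet; safe to report failure and retry later.
        raise
    try:
        _run("lvchange",
             "--addtag", "IU__removing__",
             "--deltag", "IU_%s" % image_id,
             "%s/%s" % (vg, new_lv))
    except subprocess.CalledProcessError:
        # New failure mode: LV renamed but tags unchanged. A scanner
        # that recognizes the _remove_me_ name prefix (or the stale
        # IU_ tag) must finish or undo the job on the next run.
        raise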
This bug/RFE is more than 2 years old and has not received enough attention so far; it is now flagged as pending close. Please review whether it is still relevant, and provide additional details/justification/patches if you believe it should get more attention for the next oVirt release.
This bug hasn't received any attention in a long time, and it is not planned for the foreseeable future; the oVirt development team has no plans to work on it. Please feel free to reopen it if you have a plan for how to contribute this feature/bug fix.