Description of problem:

This is taken from Bug 1611929; all logs are there. In that bug you can see Benny's remark that cleanup is attempted a first time and fails (due to the blocked storage).

Expected result: add logic so that, in the case of a leftover LV, VDSM waits for the SD to become available again and then deletes the leftover LV.

Description of problem:

A VM with 4 disks has 4 matching volumes before the snapshot. Create a snapshot and block the connection SPM -> disks' storage domain, thus failing the snapshot creation. The snapshot is indeed deleted, but when I check the number of volumes (LVs) per disk, I see that one disk has 2 LVs instead of one: a leftover from the failed snapshot operation.

VDSM ERRORS appearing in ROLLBACK:

2018-08-02 22:07:55,384+0300 DEBUG (tasks/5) [storage.TaskManager.Task] (Task='f838b186-7ee4-4643-a0b2-3addf43d5482') running recovery Create volume metadata rollback: BlockVolume->createVolumeMetadataRollback(ca5552b7-305a-4667-9688-a42c180806ff,12) (task:812)
2018-08-02 22:07:55,384+0300 INFO (tasks/5) [storage.Volume] Metadata rollback for sdUUID=ca5552b7-305a-4667-9688-a42c180806ff offs=12 (blockVolume:489)
2018-08-02 22:07:55,389+0300 ERROR (tasks/5) [storage.TaskManager.Task] (Task='f838b186-7ee4-4643-a0b2-3addf43d5482') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 405, in run
    return function(self.task, *argslist)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockVolume.py", line 491, in createVolumeMetadataRollback
    {"NONE": "#" * (sc.METADATA_SIZE - 10)})
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 873, in _putMetadata
    cls.manifestClass._putMetadata(metaId, meta)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockVolume.py", line 244, in _putMetadata
    f.write(data)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/directio.py", line 161, in write
    raise OSError(err, msg)
OSError: [Errno 5] Input/output error
2018-08-02 22:07:55,389+0300 DEBUG (tasks/5) [storage.TaskManager.Task] (Task='f838b186-7ee4-4643-a0b2-3addf43d5482') Task._run: f838b186-7ee4-4643-a0b2-3addf43d5482 () {} failed - stopping task (task:894)
2018-08-02 22:07:55,389+0300 DEBUG (tasks/5) [storage.TaskManager.Task] (Task='f838b186-7ee4-4643-a0b2-3addf43d5482') stopping in state raborting (force False) (task:1256)
2018-08-02 22:07:55,389+0300 DEBUG (tasks/5) [storage.TaskManager.Task] (Task='f838b186-7ee4-4643-a0b2-3addf43d5482') ref 1 aborting True (task:1002)
2018-08-02 22:07:55,389+0300 WARN (tasks/5) [storage.TaskManager.Task] (Task='f838b186-7ee4-4643-a0b2-3addf43d5482') task f838b186-7ee4-4643-a0b2-3addf43d5482: recovery failed: Task is aborted: u'[Errno 5] Input/output error' - code 100 (task:818)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 815, in _recover
    self._run(rec.run)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 896, in _run
    raise se.TaskAborted(message, code)
TaskAborted: Task is aborted: u'[Errno 5] Input/output error' - code 100
2018-08-02 22:07:55,390+0300 DEBUG (tasks/5) [storage.TaskManager.Task] (Task='f838b186-7ee4-4643-a0b2-3addf43d5482') moving from state raborting -> state raborting (task:602)
2018-08-02 22:08:03,412+0300 ERROR (monitor/ca5552b) [storage.Monitor] Error checking domain ca5552b7-305a-4667-9688-a42c180806ff (monitor:424)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 405, in _checkDomainStatus
    self.domain.selftest()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 1171, in selftest
    elif lvm.getVG(self.sdUUID).partial != lvm.VG_OK:
  File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 81, in __getattr__
    raise AttributeError("Failed reload: %s" % self.name)
AttributeError: Failed reload: ca5552b7-305a-4667-9688-a42c180806ff

ENGINE log:

2018-08-02 22:08:08,672+03 ERROR [org.ovirt.engine.core.bll.snapshots.CreateSnapshotCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-94) [snapshots_create_cbd1c26b-d708-47f7] Ending command 'org.ovirt.engine.core.bll.snapshots.CreateSnapshotCommand' with failure.
2018-08-02 22:08:08,692+03 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DestroyImageVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-94) [snapshots_create_cbd1c26b-d708-47f7] START, DestroyImageVDSCommand( DestroyImageVDSCommandParameters:{storagePoolId='289db27b-7067-4c88-8311-88e0092e692d', ignoreFailoverLimit='false', storageDomainId='ca5552b7-305a-4667-9688-a42c180806ff', imageGroupId='b4f0b2c4-e99f-47b4-a581-2fd19bd19feb', imageId='00000000-0000-0000-0000-000000000000', imageList='[9dfe49e0-ec55-4e7d-b16c-2c45dce3d6e9]', postZero='false', force='true'}), log id: 32b1981b
2018-08-02 22:08:09,812+03 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DestroyImageVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-94) [snapshots_create_cbd1c26b-d708-47f7] FINISH, DestroyImageVDSCommand, log id: 32b1981b
2018-08-02 22:08:09,858+03 INFO [org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-94) [] Lock freed to object 'EngineLock:{exclusiveLocks='[42cb2951-a16b-4231-be68-7129c39e8dae=VM]', sharedLocks=''}'
2018-08-02 22:08:09,883+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-94) [] EVENT_ID: USER_CREATE_SNAPSHOT_FINISHED_FAILURE(69), Failed to complete snapshot 'snap_TestCase6015_0222065805' creation for VM 'vm_TestCase6015_0222022658'.
2018-08-02 22:08:09,884+03 WARN [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-94) [] Command 'CreateSnapshotForVm' id: '8d8028d7-58c1-4043-878a-f8a476a701ab' end method execution failed, as the command isn't marked for endAction() retries silently ignoring
2018-08-02 22:08:30,893+03 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-52059) [] Domain 'd2fb82b9-7545-49be-88e0-988f9ce9c46e:iscsi_0' recovered from problem. vds: 'host_mixed_1'
2018-08-02 22:08:30,893+03 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-52059) [] Domain 'd2fb82b9-7545-49be-88e0-988f9ce9c46e:iscsi_0' has recovered from problem. No active host in the DC is reporting it as problematic, so clearing the domain recovery timer.
2018-08-02 22:08:30,894+03 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-52059) [] Domain 'ca5552b7-305a-4667-9688-a42c180806ff:iscsi_1' recovered from problem. vds: 'host_mixed_1'
2018-08-02 22:08:30,894+03 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-52059) [] Domain 'ca5552b7-305a-4667-9688-a42c180806ff:iscsi_1' has recovered from problem. No active host in the DC is reporting it as problematic, so clearing the domain recovery timer.
2018-08-02 22:08:30,894+03 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-52059) [] Domain '3367b82a-21bc-47e0-83d4-741e0745b241:iscsi_2' recovered from problem. vds: 'host_mixed_1'
2018-08-02 22:08:30,894+03 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-52059) [] Domain '3367b82a-21bc-47e0-83d4-741e0745b241:iscsi_2' has recovered from problem. No active host in the DC is reporting it as problematic, so clearing the domain recovery timer.
Version-Release number of selected component (if applicable):
4.2.6_SNAPSHOT-84.gad3de30.0.scratch.master.el7ev

How reproducible:
Happened once so far

Steps to Reproduce:
1) VM with 4 disks attached, each disk has 1 volume. Disk IDs:
2035083e-d5af-4145-9b2b-71b9b77a2873
43f313e1-7679-44fd-8903-078f2cf740b5
43f313e1-7679-44fd-8903-078f2cf740b5
43f313e1-7679-44fd-8903-078f2cf740b5
2) Check on the SPM the volumes matching the created disks. As each disk has 1 volume, we get a total of 4 volumes.
3) Fail snapshot creation by blocking the connection from the SPM to the storage domain that the disks reside on (in this case SD 'iscsi_1') -> snapshot creation failed:
Aug 2, 2018, 10:08:09 PM Failed to complete snapshot 'snap_TestCase6015_0222065805' creation for VM 'vm_TestCase6015_0222022658'.
4) Unblock the connection from the SPM to the SD. 2018-08-02 22:08:14
5) Now check on the SPM the matching volumes. 2018-08-02 22:08:37

Actual results:
We expect 4 volumes (LVs) but instead we get 5 volumes! The leftover on 1 disk image is seen by running the following commands from the SPM host:

[root@storage-ge8-vdsm1 ~]# lvs -o lv_name,lv_tags | grep 2035083e-d5af-4145-9b2b-71b9b77a2873 | wc -l
=> GIVES 2 volumes instead of one, MEANING A LEFTOVER IS FOUND

[root@storage-ge8-vdsm1 ~]# lvs -o lv_name,lv_tags | grep 2035083e-d5af-4145-9b2b-71b9b77a2873
1) 43f313e1-7679-44fd-8903-078f2cf740b5 IU_2035083e-d5af-4145-9b2b-71b9b77a2873,MD_9,PU_00000000-0000-0000-0000-000000000000
2) 70a9c30e-1845-4278-8685-159b78f2d051 IU_2035083e-d5af-4145-9b2b-71b9b77a2873,MD_12,PU_43f313e1-7679-44fd-8903-078f2cf740b5
=> THIS IS THE LEFTOVER FROM THE SNAPSHOT! YOU CAN SEE ITS PARENT (PU_ tag) IS THE ORIGINAL DISK VOLUME.
2018-08-03 06:58:00,873 - MainThread - RemoteExecutor - DEBUG - [root.82.79/qum5net] Executing: lvs -o lv_name,lv_tags | grep b4f0b2c4-e99f-47b4-a581-2fd19bd19feb | wc -l
2018-08-03 06:58:01,762 - MainThread - RemoteExecutor - DEBUG - [root.82.79/qum5net] Results of command: lvs -o lv_name,lv_tags | grep b4f0b2c4-e99f-47b4-a581-2fd19bd19feb | wc -l
2018-08-03 06:58:01,763 - MainThread - RemoteExecutor - DEBUG - [root.82.79/qum5net] OUT: 1
--
2018-08-03 06:58:04,484 - MainThread - RemoteExecutor - DEBUG - [root.82.79/qum5net] Executing: lvs -o lv_name,lv_tags | grep 2035083e-d5af-4145-9b2b-71b9b77a2873 | wc -l
2018-08-03 06:58:05,477 - MainThread - RemoteExecutor - DEBUG - [root.82.79/qum5net] Results of command: lvs -o lv_name,lv_tags | grep 2035083e-d5af-4145-9b2b-71b9b77a2873 | wc -l
2018-08-03 06:58:05,478 - MainThread - RemoteExecutor - DEBUG - [root.82.79/qum5net] OUT: 2
=> LEFTOVER FOUND !!!
--
2018-08-03 06:58:08,492 - MainThread - RemoteExecutor - DEBUG - [root.82.79/qum5net] Executing: lvs -o lv_name,lv_tags | grep 7ed04b54-e589-4a1d-bccb-912d53e298e2 | wc -l
2018-08-03 06:58:09,496 - MainThread - RemoteExecutor - DEBUG - [root.82.79/qum5net] Results of command: lvs -o lv_name,lv_tags | grep 7ed04b54-e589-4a1d-bccb-912d53e298e2 | wc -l
2018-08-03 06:58:09,498 - MainThread - RemoteExecutor - DEBUG - [root.82.79/qum5net] OUT: 1
--
2018-08-03 06:58:12,392 - MainThread - RemoteExecutor - DEBUG - [root.82.79/qum5net] Executing: lvs -o lv_name,lv_tags | grep df895e8d-5f44-447c-9331-a18516efc2b9 | wc -l
2018-08-03 06:58:13,201 - MainThread - RemoteExecutor - DEBUG - [root.82.79/qum5net] Results of command: lvs -o lv_name,lv_tags | grep df895e8d-5f44-447c-9331-a18516efc2b9 | wc -l
2018-08-03 06:58:13,202 - MainThread - RemoteExecutor - DEBUG - [root.82.79/qum5net] OUT: 1

Expected results:
We expect to get the same volume count as before snapshot creation.

Additional info:
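Note that the checks above grep the raw `lvs` output for the image UUID anywhere on the line, which can also match PU_ (parent) tags of other volumes. A small Python sketch of the same check that matches the IU_ (image) tag specifically; `count_image_lvs` and `list_lvs` are hypothetical helpers written for illustration, not part of VDSM:

```python
import subprocess


def count_image_lvs(lvs_output, image_uuid):
    """Count LVs whose IU_ (image) tag references image_uuid, given
    the text output of `lvs --noheadings -o lv_name,lv_tags`.

    Using the IU_ prefix avoids accidentally counting a volume whose
    PU_ (parent) tag happens to contain the same UUID.
    """
    needle = "IU_" + image_uuid
    return sum(1 for line in lvs_output.splitlines() if needle in line)


def list_lvs():
    """Fetch LV names and tags via the lvm2 CLI (requires root)."""
    return subprocess.check_output(
        ["lvs", "--noheadings", "-o", "lv_name,lv_tags"],
        universal_newlines=True)
```

With this, a leftover for a disk image shows up as `count_image_lvs(list_lvs(), image_uuid) > 1`, matching the manual `wc -l` check from the reproduction.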
I would expect that we either:
1) Wait until the storage domain is up before trying to delete the leftover volume, or
2) Re-attempt removal of leftover volumes every X seconds, so that once the storage is available again we remove them.
If we keep the current behavior, we can end up with multiple leftover volumes after a failed snapshot creation (think of VMs with multiple disks).
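Proposal 2 could be sketched along these lines. This is a minimal illustration only: `remove_volume` and `domain_is_up` are hypothetical callables standing in for real VDSM operations, and the interval/attempt limits are made-up defaults:

```python
import logging
import time

log = logging.getLogger("storage.cleanup")


def retry_remove_leftover(remove_volume, domain_is_up, vol_id,
                          interval=60, max_attempts=30):
    """Retry removing a leftover volume until the storage domain is
    reachable again, polling every `interval` seconds.

    Returns True once the volume was removed, False after giving up.
    """
    for attempt in range(1, max_attempts + 1):
        if domain_is_up():
            try:
                remove_volume(vol_id)
                log.info("Removed leftover volume %s", vol_id)
                return True
            except OSError as e:
                # Storage may have flapped again; keep retrying.
                log.warning("Attempt %d to remove %s failed: %s",
                            attempt, vol_id, e)
        time.sleep(interval)
    log.error("Giving up on leftover volume %s after %d attempts",
              vol_id, max_attempts)
    return False
```

Run once per leftover LV after a failed rollback, this would have cleaned up the extra volume as soon as the SPM-to-SD connection was unblocked.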
This bug/RFE is more than 2 years old, hasn't received enough attention so far, and is now flagged as pending close. Please review whether it is still relevant and provide additional details/justification/patches if you believe it should get more attention for the next oVirt release.
This bug didn't get any attention in a long time, and it's not planned for the foreseeable future. The oVirt development team has no plans to work on it. Please feel free to reopen if you have a plan for contributing this feature/bug fix.