+++ This bug is a downstream clone. The original bug is: +++
+++ bug 1502488 +++

======================================================================

Description of problem:

As per bug 1420405, we are using "qemu-img check" to find the actual size of the volume. However, if the base image has leaked clusters, "qemu-img check" exits with return value 3.

# qemu-img check /rhev/data-center/00000001-0001-0001-0001-000000000311/8257cf14-d88d-4e4e-998c-9f8976dac2a2/images/7455de38-1df1-4acd-b07c-9dc2138aafb3/be4a4d85-d7e6-4725-b7f5-90c9d935c336 ; echo $?
200 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.
109461/180224 = 60.74% allocated, 15.29% fragmented, 0.00% compressed clusters
Image end offset: 7188578304
3

Any exit code other than 0 fails with the error below on the vdsm side during cold merge.

===
2017-10-16 10:09:32,950+0530 DEBUG (tasks/0) [root] /usr/bin/taskset --cpu-list 0-3 /usr/bin/qemu-img check --output json -f qcow2 /rhev/data-center/mnt/blockSD/8257cf14-d88d-4e4e-998c-9f8976dac2a2/images/7455de38-1df1-4acd-b07c-9dc2138aafb3/be4a4d85-d7e6-4725-b7f5-90c9d935c336 (cwd None) (commands:69)
2017-10-16 10:09:33,576+0530 ERROR (tasks/0) [storage.TaskManager.Task] (Task='59404af6-b400-4e08-9691-9a64cdf00374') Unexpected error (task:872)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 879, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 333, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1892, in finalizeMerge
    merge.finalize(subchainInfo)
  File "/usr/share/vdsm/storage/merge.py", line 271, in finalize
    optimal_size = subchain.base_vol.optimal_size()
  File "/usr/share/vdsm/storage/blockVolume.py", line 440, in optimal_size
    check = qemuimg.check(self.getVolumePath(), qemuimg.FORMAT.QCOW2)
  File "/usr/lib/python2.7/site-packages/vdsm/qemuimg.py", line 156, in check
    out = _run_cmd(cmd)
  File "/usr/lib/python2.7/site-packages/vdsm/qemuimg.py", line 416, in _run_cmd
    raise QImgError(cmd, rc, out, err)
QImgError: cmd=['/usr/bin/qemu-img', 'check', '--output', 'json', '-f', 'qcow2', '/rhev/data-center/mnt/blockSD/8257cf14-d88d-4e4e-998c-9f8976dac2a2/images/7455de38-1df1-4acd-b07c-9dc2138aafb3/be4a4d85-d7e6-4725-b7f5-90c9d935c336'], ecode=3, stdout={
"image-end-offset": 7188578304,
"total-clusters": 180224,
"check-errors": 0,
"leaks": 200,
"leaks-fixed": 0,
"allocated-clusters": 109461,
"filename": "/rhev/data-center/mnt/blockSD/8257cf14-d88d-4e4e-998c-9f8976dac2a2/images/7455de38-1df1-4acd-b07c-9dc2138aafb3/be4a4d85-d7e6-4725-b7f5-90c9d935c336",
"format": "qcow2",
"fragmented-clusters": 16741
}
, stderr=Leaked cluster 109202 refcount=1 reference=0
===

And in the engine log:

===
2017-10-16 00:39:29,600-04 ERROR [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler7) [12cf8f7] BaseAsyncTask::logEndTaskFailure: Task '59404af6-b400-4e08-9691-9a64cdf00374' (Parent Command 'FinalizeMerge', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with failure:
-- Result: 'cleanSuccess'
-- Message: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = cmd=['/usr/bin/qemu-img', 'check', '--output', 'json', '-f', 'qcow2', '/rhev/data-center/mnt/blockSD/8257cf14-d88d-4e4e-998c-9f8976dac2a2/images/7455de38-1df1-4acd-b07c-9dc2138aafb3/be4a4d85-d7e6-4725-b7f5-90c9d935c336'], ecode=3, stdout={
2017-10-16 00:39:32,468-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler2) [8b540950-5f08-432d-bcf9-5b2999284532] EVENT_ID: USER_REMOVE_SNAPSHOT_FINISHED_FAILURE(357), Correlation ID: 8b540950-5f08-432d-bcf9-5b2999284532, Job ID: d31f4487-2545-4ba7-a3e4-2cd03f0ea305, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: Failed to delete snapshot 'test_snap' for VM 'test_vm'.
===

The cold merge marks the image illegal, and the user cannot start the VM because of the illegal legality value in the metadata. Manual changes are needed to recover the VM. IIUC, leaked clusters do not cause any harm to data; they only waste space.

Version-Release number of selected component (if applicable):
vdsm-4.19.31-1.el7ev.x86_64
ovirt-engine-4.1.6.2-0.1.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create a thin provisioned disk in RHV-M and assign it to a VM.
2. Kill the qemu-kvm process while writing data from the VM. While data is being written, qcow2 allocates new clusters, and killing the process at that moment leaves leaked clusters in the image.
3. Create a snapshot for the VM.
4. Do a cold merge.

Actual results:
Cold merge fails if there are leaked clusters in the base image.

Expected results:
Cold merge should succeed even if the base image has leaked clusters.

Additional info:

(Originally by Nijin Ashok)
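The failure comes down to vdsm raising QImgError on any non-zero exit code, while "qemu-img check" deliberately uses exit code 3 to mean "leaked clusters found, no corruption". A minimal sketch of the more tolerant interpretation the fix implies (function and constant names here are hypothetical, not vdsm's actual API):

```python
import json

# qemu-img check exit codes (per the qemu-img man page):
#   0 - no errors found
#   1 - check not completed
#   2 - corruption detected
#   3 - leaked clusters found, but no corruption
CHECK_OK = 0
CHECK_LEAKS = 3


def parse_check_output(rc, stdout):
    """Interpret 'qemu-img check --output json' results, tolerating
    leaked clusters (rc 3): they waste space but do not harm data."""
    if rc not in (CHECK_OK, CHECK_LEAKS):
        raise RuntimeError("qemu-img check failed: rc=%d" % rc)
    info = json.loads(stdout)
    # image-end-offset is what optimal_size() actually needs.
    return {"offset": info["image-end-offset"],
            "leaks": info.get("leaks", 0)}
```

With this interpretation, the check output from the description above (rc 3, 200 leaks) would yield offset 7188578304 instead of aborting the cold merge, while genuine corruption (rc 2) still fails.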
Patches are ready on master and will be backported as soon as they are merged.
--------------------------------------
Tested with the following code:
----------------------------------------
rhevm-4.1.8-0.1.el7.noarch
vdsm-4.19.38-1.el7ev.x86_64

Tested with the following scenario:

Steps to Reproduce:
1. Create a thin provisioned disk in RHV-M and assign it to a VM.
2. Kill the qemu-kvm process while writing data from the VM. While data is being written, qcow2 allocates new clusters, and killing the process at that moment leaves leaked clusters in the image.
3. Create a snapshot for the VM.
4. Do a cold merge.

Actual results:
Cold merge completed successfully even when the image has leaked clusters.

Moving to VERIFIED!
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3428
Hello,

Customer has hit this on vdsm-4.19.43-3 and rhevm-4.1.8.2-0.1, reopening.

RPMs:
rhevm-4.1.8.2-0.1.el7.noarch  Tue Jan  9 07:52:58 2018
vdsm-4.19.43-3.el7ev.x86_64   Wed Jan  3 00:17:54 2018

Processes:
ovirt  1016 15.1 22.3 4930564 1719236 ?  Sl   Jan17 1858:19 ovirt-engine -server -XX:+TieredCompilation.....
vdsm   4652  7.4  0.0 4885248  177840 ?  S<sl Jan10 3663:09 /usr/bin/python2 /usr/share/vdsm/vdsm

Cold snapshot merge of an image with leaked clusters still fails.

audit_log:
2018-01-25 19:19:42.971+01 | 10802 | | rhev-2 | VDSM rhev-2 command HSMGetAllTasksStatusesVDS failed: cmd=['/usr/bin/qemu-img', 'check', '--output', 'json', '-f', 'qcow2', '/rhev/data-center/mnt/blockSD/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/images/yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy/zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz'], ecode=3, stdout={
"image-end-offset": 87419584512,
"total-clusters": 3276800,
"check-errors": 0,
"leaks": 1111,
"leaks-fixed": 0,
"allocated-clusters": 1332537,
"filename": "/rhev/data-center/mnt/blockSD/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/images/yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy/zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz",
"format": "qcow2",
"fragmented-clusters": 233541
}
, stderr=Leaked cluster 475133 refcount=1 reference=0
Leaked cluster 475134 refcount=1 reference=0
Leaked cluster 475135 refcount=1 reference=0
[...each leaked cluster reported, output manually truncated...]
Leaked cluster 1163606 refcount=1 reference=0
, message=None
2018-01-25 19:19:47.67+01 | 357 | a1cde685-520a-449e-bd72-5a01a90a1eeb | | Failed to delete snapshot 'snapshot-name' for VM 'vm-name'.
vdsm log:
2018-01-25 19:19:37,674+0100 ERROR (tasks/5) [storage.TaskManager.Task] (Task='a536102b-a17a-4281-a948-214f246322c2') Unexpected error (task:872)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 879, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 333, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1892, in finalizeMerge
    merge.finalize(subchainInfo)
  File "/usr/share/vdsm/storage/merge.py", line 271, in finalize
    optimal_size = subchain.base_vol.optimal_size()
  File "/usr/share/vdsm/storage/blockVolume.py", line 440, in optimal_size
    check = qemuimg.check(self.getVolumePath(), qemuimg.FORMAT.QCOW2)
  File "/usr/lib/python2.7/site-packages/vdsm/qemuimg.py", line 157, in check
    out = _run_cmd(cmd)
  File "/usr/lib/python2.7/site-packages/vdsm/qemuimg.py", line 426, in _run_cmd
    raise QImgError(cmd, rc, out, err)
QImgError: cmd=['/usr/bin/qemu-img', 'check', '--output', 'json', '-f', 'qcow2', '/rhev/data-center/mnt/blockSD/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/images/yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy/zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz'], ecode=3, stdout={
"image-end-offset": 87419584512,
"total-clusters": 3276800,
"check-errors": 0,
"leaks": 1111,
"leaks-fixed": 0,
"allocated-clusters": 1332537,
"filename": "/rhev/data-center/mnt/blockSD/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/images/yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy/zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz",
"format": "qcow2",
"fragmented-clusters": 233541
}
, stderr=Leaked cluster 475133 refcount=1 reference=0
Leaked cluster 475134 refcount=1 reference=0
Leaked cluster 475135 refcount=1 reference=0
[...]
Leaked cluster 1163605 refcount=1 reference=0
Leaked cluster 1163606 refcount=1 reference=0
, message=None
2018-01-25 19:19:37,739+0100 INFO (tasks/5) [storage.ThreadPool.WorkerThread] FINISH task a536102b-a17a-4281-a948-214f246322c2 (threadPool:210)

engine.log:
2018-01-25 19:19:45,058+01 ERROR [org.ovirt.engine.core.bll.snapshots.ColdMergeSnapshotSingleDiskCommand] (DefaultQuartzScheduler5) [a1cde685-520a-449e-bd72-5a01a90a1eeb] Command 'ColdMergeSnapshotSingleDisk' id '342ebd0a-2bb7-4d36-81cd-57e580c2b2c1' failed executing step 'FINALIZE_MERGE'
2018-01-25 19:19:45,058+01 INFO [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler5) [a1cde685-520a-449e-bd72-5a01a90a1eeb] Command 'ColdMergeSnapshotSingleDisk' id: '342ebd0a-2bb7-4d36-81cd-57e580c2b2c1' child commands '[99ccc8e0-8890-4971-936a-440cf3355de7, 5ec4be26-7bff-4cab-8da6-254eebb0dcc8, 02d0e273-8ea6-424c-aa31-bf04f0fb009c]' executions were completed, status 'FAILED'
2018-01-25 19:19:46,137+01 ERROR [org.ovirt.engine.core.bll.snapshots.ColdMergeSnapshotSingleDiskCommand] (DefaultQuartzScheduler3) [a1cde685-520a-449e-bd72-5a01a90a1eeb] Ending command 'org.ovirt.engine.core.bll.snapshots.ColdMergeSnapshotSingleDiskCommand' with failure.
2018-01-25 19:19:46,279+01 INFO [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler3) [a1cde685-520a-449e-bd72-5a01a90a1eeb] Command 'RemoveSnapshot' id: 'e81b54ec-9211-4740-b350-913bbb7af5c7' child commands '[10ba0892-6a91-4ad5-a2f4-840dbd88166d, 342ebd0a-2bb7-4d36-81cd-57e580c2b2c1, 8abe62ee-9cce-4fca-9205-cfacef4f481e, 57ece9d8-2598-4a57-973d-e4fafde95a74, dd1ab67d-98d6-4c87-8979-9ce8846e2ba9]' executions were completed, status 'FAILED'
2018-01-25 19:19:47,576+01 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (DefaultQuartzScheduler2) [a1cde685-520a-449e-bd72-5a01a90a1eeb] Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand' with failure.
2018-01-25 19:19:47,693+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler2) [a1cde685-520a-449e-bd72-5a01a90a1eeb] EVENT_ID: USER_REMOVE_SNAPSHOT_FINISHED_FAILURE(357), Correlation ID: a1cde685-520a-449e-bd72-5a01a90a1eeb, Job ID: d571ed23-e270-4d1d-9ad0-54e29d65aa60, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: Failed to delete snapshot 'snapshot-name' for VM 'vm-name'.

Looks like a regression, adding the relevant tag.

--
Kind Regards,
Igor Netkachev
Technical Support Engineer, RHCE, RHCVA
Red Hat Global Support Services