+++ This bug is a downstream clone. The original bug is: +++
+++ bug 1545708 +++
======================================================================

Description of problem:

As per bug 1420405, we use qemu-img check to find the actual size of the volume. However, if the base image has leaked clusters, qemu-img check returns exit code 3.

# qemu-img check /rhev/data-center/00000001-0001-0001-0001-000000000311/8257cf14-d88d-4e4e-998c-9f8976dac2a2/images/7455de38-1df1-4acd-b07c-9dc2138aafb3/be4a4d85-d7e6-4725-b7f5-90c9d935c336 ; echo $?
200 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.
109461/180224 = 60.74% allocated, 15.29% fragmented, 0.00% compressed clusters
Image end offset: 7188578304
3

Any exit code other than 0 makes vdsm fail with the error below during cold merge. The failed cold merge marks the image illegal, and the user cannot start the VM because of the illegal legality value in the metadata. Manual changes are then needed to recover the VM.

Version-Release number of selected component (if applicable):

How reproducible:
Somewhat reproducible

Steps to Reproduce:
1. Create a thin provisioned disk in RHV-M and assign it to a VM.
2. Kill the qemu-kvm process while data is being written from the VM. While the VM writes data, qcow2 allocates new clusters, and killing the process at this point leaves leaked clusters in the image.
3. Create a snapshot for the VM.
4. Do a cold merge.

Actual results:
Cold merge fails if there are leaked clusters in the base image.

Expected results:
Cold merge should work even if the base image has leaked clusters.

Additional info:

audit_log:

2018-01-25 19:19:42.971+01 | 10802 | | rhev-2 | VDSM rhev-2 command HSMGetAllTasksStatusesVDS failed: cmd=['/usr/bin/qemu-img', 'check', '--output', 'json', '-f', 'qcow2', '/rhev/data-center/mnt/blockSD/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/images/yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy/zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz'], ecode=3, stdout={ +
 | | | | "image-end-offset": 87419584512, +
 | | | | "total-clusters": 3276800, +
 | | | | "check-errors": 0, +
 | | | | "leaks": 1111, +
 | | | | "leaks-fixed": 0, +
 | | | | "allocated-clusters": 1332537, +
 | | | | "filename": "/rhev/data-center/mnt/blockSD/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/images/yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy/zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz", +
 | | | | "format": "qcow2", +
 | | | | "fragmented-clusters": 233541 +
 | | | | } +
 | | | | , stderr=Leaked cluster 475133 refcount=1 reference=0 +
 | | | | Leaked cluster 475134 refcount=1 reference=0 +
 | | | | Leaked cluster 475135 refcount=1 reference=0 +
 [...each leaked cluster reported, output manually truncated...]
 | | | | Leaked cluster 1163606 refcount=1 reference=0 +
 | | | | , message=None
2018-01-25 19:19:47.67+01 | 357 | a1cde685-520a-449e-bd72-5a01a90a1eeb | | Failed to delete snapshot 'snapshot-name' for VM 'vm-name'.
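The failure comes from vdsm treating every non-zero qemu-img exit code as fatal, even though exit code 3 only means "leaked clusters found, image not corrupted". A minimal sketch of a check wrapper that tolerates exit code 3 could look like the following (an illustration only, not the actual vdsm patch; the function and exception names are made up):

import json
import subprocess

QEMU_IMG = "/usr/bin/qemu-img"

class QemuImgCheckError(Exception):
    pass

def check_image(path, fmt="qcow2"):
    """Run 'qemu-img check' and return its parsed JSON output.

    Exit code 0 means a clean image; exit code 3 means leaked clusters only,
    which is harmless for this use case because only "image-end-offset" is
    needed. Anything else is treated as a real failure.
    """
    cmd = [QEMU_IMG, "check", "--output", "json", "-f", fmt, path]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, err = proc.communicate()
    if proc.returncode not in (0, 3):
        raise QemuImgCheckError(
            "cmd=%s rc=%s err=%s" % (cmd, proc.returncode, err))
    return json.loads(out.decode("utf-8"))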
vdsm log:

2018-01-25 19:19:37,674+0100 ERROR (tasks/5) [storage.TaskManager.Task] (Task='a536102b-a17a-4281-a948-214f246322c2') Unexpected error (task:872)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 879, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 333, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1892, in finalizeMerge
    merge.finalize(subchainInfo)
  File "/usr/share/vdsm/storage/merge.py", line 271, in finalize
    optimal_size = subchain.base_vol.optimal_size()
  File "/usr/share/vdsm/storage/blockVolume.py", line 440, in optimal_size
    check = qemuimg.check(self.getVolumePath(), qemuimg.FORMAT.QCOW2)
  File "/usr/lib/python2.7/site-packages/vdsm/qemuimg.py", line 157, in check
    out = _run_cmd(cmd)
  File "/usr/lib/python2.7/site-packages/vdsm/qemuimg.py", line 426, in _run_cmd
    raise QImgError(cmd, rc, out, err)
QImgError: cmd=['/usr/bin/qemu-img', 'check', '--output', 'json', '-f', 'qcow2', '/rhev/data-center/mnt/blockSD/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/images/yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy/zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz'], ecode=3, stdout={
    "image-end-offset": 87419584512,
    "total-clusters": 3276800,
    "check-errors": 0,
    "leaks": 1111,
    "leaks-fixed": 0,
    "allocated-clusters": 1332537,
    "filename": "/rhev/data-center/mnt/blockSD/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/images/yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy/zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz",
    "format": "qcow2",
    "fragmented-clusters": 233541
}
, stderr=Leaked cluster 475133 refcount=1 reference=0
Leaked cluster 475134 refcount=1 reference=0
Leaked cluster 475135 refcount=1 reference=0
[...]
Leaked cluster 1163605 refcount=1 reference=0
Leaked cluster 1163606 refcount=1 reference=0
, message=None

2018-01-25 19:19:37,739+0100 INFO (tasks/5) [storage.ThreadPool.WorkerThread] FINISH task a536102b-a17a-4281-a948-214f246322c2 (threadPool:210)

engine.log:

2018-01-25 19:19:45,058+01 ERROR [org.ovirt.engine.core.bll.snapshots.ColdMergeSnapshotSingleDiskCommand] (DefaultQuartzScheduler5) [a1cde685-520a-449e-bd72-5a01a90a1eeb] Command 'ColdMergeSnapshotSingleDisk' id '342ebd0a-2bb7-4d36-81cd-57e580c2b2c1' failed executing step 'FINALIZE_MERGE'
2018-01-25 19:19:45,058+01 INFO [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler5) [a1cde685-520a-449e-bd72-5a01a90a1eeb] Command 'ColdMergeSnapshotSingleDisk' id: '342ebd0a-2bb7-4d36-81cd-57e580c2b2c1' child commands '[99ccc8e0-8890-4971-936a-440cf3355de7, 5ec4be26-7bff-4cab-8da6-254eebb0dcc8, 02d0e273-8ea6-424c-aa31-bf04f0fb009c]' executions were completed, status 'FAILED'
2018-01-25 19:19:46,137+01 ERROR [org.ovirt.engine.core.bll.snapshots.ColdMergeSnapshotSingleDiskCommand] (DefaultQuartzScheduler3) [a1cde685-520a-449e-bd72-5a01a90a1eeb] Ending command 'org.ovirt.engine.core.bll.snapshots.ColdMergeSnapshotSingleDiskCommand' with failure.
2018-01-25 19:19:46,279+01 INFO [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler3) [a1cde685-520a-449e-bd72-5a01a90a1eeb] Command 'RemoveSnapshot' id: 'e81b54ec-9211-4740-b350-913bbb7af5c7' child commands '[10ba0892-6a91-4ad5-a2f4-840dbd88166d, 342ebd0a-2bb7-4d36-81cd-57e580c2b2c1, 8abe62ee-9cce-4fca-9205-cfacef4f481e, 57ece9d8-2598-4a57-973d-e4fafde95a74, dd1ab67d-98d6-4c87-8979-9ce8846e2ba9]' executions were completed, status 'FAILED'
2018-01-25 19:19:47,576+01 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (DefaultQuartzScheduler2) [a1cde685-520a-449e-bd72-5a01a90a1eeb] Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand' with failure.
2018-01-25 19:19:47,693+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler2) [a1cde685-520a-449e-bd72-5a01a90a1eeb] EVENT_ID: USER_REMOVE_SNAPSHOT_FINISHED_FAILURE(357), Correlation ID: a1cde685-520a-449e-bd72-5a01a90a1eeb, Job ID: d571ed23-e270-4d1d-9ad0-54e29d65aa60, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: Failed to delete snapshot 'snapshot-name' for VM 'vm-name'.

This looks like a regression, so I am adding the relevant tag.

--
Kind Regards,
Igor Netkachev
Technical Support Engineer, RHCE, RHCVA
Red Hat Global Support Services

(Originally by Igor Netkachev)
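The traceback above shows that merge finalization only needs the qemu-img check result to compute the base volume's optimal size from "image-end-offset". A rough, self-contained illustration of that calculation follows; the 128 MiB extent alignment is an assumption about block storage domains, not something stated in this bug:

def optimal_size_from_check(check_info, extent_size=128 * 1024 * 1024):
    """Round the qcow2 image-end-offset up to whole extents so the logical
    volume is large enough to hold the merged data."""
    end_offset = check_info["image-end-offset"]
    extents = (end_offset + extent_size - 1) // extent_size
    return extents * extent_size

# With the value from the log above:
# optimal_size_from_check({"image-end-offset": 87419584512})
# -> 652 extents of 128 MiB = 87509958656 bytes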
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:
[Found non-acked flags: '{'rhevm-4.1.z': '?'}', ]
For more info please contact: rhv-devops

(Originally by rhv-bugzilla-bot)
Verified with the following code:
--------------------------------------
ovirt-engine-4.1.10.1-0.1.el7.noarch
vdsm-4.19.48-1.el7ev.x86_64

Verified with the following scenario:
-------------------------------------
Steps to Reproduce:
1. Create a VM with a thin provisioned disk.
2. Write data to the disk (any data).
3. While writing, kill the qemu-kvm process on the host - the VM goes down.
4. Create a snapshot.
5. Delete/merge the parent of the created snapshot.

Cold merge succeeded without errors.

Moving to VERIFIED!
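For anyone re-running the verification scenario, a small helper like the one below can confirm that step 3 really left leaked clusters in the base image before the merge is attempted. This is only a sketch; the volume path in the comment is a placeholder and qemu-img must be available in PATH.

import json
import subprocess

def count_leaks(image_path):
    """Return the number of leaked clusters reported by 'qemu-img check'."""
    proc = subprocess.Popen(
        ["qemu-img", "check", "--output", "json", "-f", "qcow2", image_path],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, _ = proc.communicate()
    return json.loads(out.decode("utf-8")).get("leaks", 0)

# count_leaks("/rhev/data-center/.../<base-volume>") > 0 means the
# reproduction worked and the base image has leaked clusters.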
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:0563