Description of problem:

As per bug 1420405, we use qemu-img check to find the actual size of the volume. However, if the base image has leaked clusters, qemu-img check returns exit code 3:

# qemu-img check /rhev/data-center/00000001-0001-0001-0001-000000000311/8257cf14-d88d-4e4e-998c-9f8976dac2a2/images/7455de38-1df1-4acd-b07c-9dc2138aafb3/be4a4d85-d7e6-4725-b7f5-90c9d935c336 ; echo $?
200 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.
109461/180224 = 60.74% allocated, 15.29% fragmented, 0.00% compressed clusters
Image end offset: 7188578304
3

Any non-zero exit code makes the cold merge fail on the vdsm side with the error below. The failed cold merge leaves the image legality set to ILLEGAL in the volume metadata, so the user cannot start the VM; manual changes are needed to recover it.

Version-Release number of selected component (if applicable):

How reproducible:
Somewhat reproducible

Steps to Reproduce:
1. Create a thin provisioned disk in RHV-M and assign it to a VM.
2. Kill the qemu-kvm process while data is being written from the VM. While data is written, qcow2 allocates new clusters, and killing the process at this point leaves leaked clusters in the image.
3. Create a snapshot of the VM.
4. Perform a cold merge.

Actual results:
Cold merge fails if there are leaked clusters in the base image.

Expected results:
Cold merge should work even if the base image has leaked clusters.

Additional info:

audit_log:
2018-01-25 19:19:42.971+01 | 10802 | | rhev-2 | VDSM rhev-2 command HSMGetAllTasksStatusesVDS failed: cmd=['/usr/bin/qemu-img', 'check', '--output', 'json', '-f', 'qcow2', '/rhev/data-center/mnt/blockSD/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/images/yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy/zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz'], ecode=3, stdout={ +
 | | | | "image-end-offset": 87419584512, +
 | | | | "total-clusters": 3276800, +
 | | | | "check-errors": 0, +
 | | | | "leaks": 1111, +
 | | | | "leaks-fixed": 0, +
 | | | | "allocated-clusters": 1332537, +
 | | | | "filename": "/rhev/data-center/mnt/blockSD/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/images/yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy/zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz", +
 | | | | "format": "qcow2", +
 | | | | "fragmented-clusters": 233541 +
 | | | | } +
 | | | | , stderr=Leaked cluster 475133 refcount=1 reference=0 +
 | | | | Leaked cluster 475134 refcount=1 reference=0 +
 | | | | Leaked cluster 475135 refcount=1 reference=0 +
 [...each leaked cluster reported, output manually truncated...]
 | | | | Leaked cluster 1163606 refcount=1 reference=0 +
 | | | | , message=None
2018-01-25 19:19:47.67+01 | 357 | a1cde685-520a-449e-bd72-5a01a90a1eeb | | Failed to delete snapshot 'snapshot-name' for VM 'vm-name'.
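For illustration only, here is a minimal Python sketch (not the vdsm code; the check_qcow2 helper name is made up for this example) of how the qemu-img check exit code could be interpreted so that 3, meaning "leaked clusters found but image not corrupted", is not treated as a hard failure:

import json
import subprocess

def check_qcow2(path):
    # Hypothetical sketch, not vdsm's qemuimg.check(): run "qemu-img check"
    # and accept exit code 3 (leaked clusters only) in addition to 0.
    cmd = ["/usr/bin/qemu-img", "check", "--output", "json", "-f", "qcow2", path]
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    # qemu-img check exit codes: 0 = no errors, 2 = image is corrupted,
    # 3 = leaked clusters were found but the image is not corrupted.
    if p.returncode not in (0, 3):
        raise RuntimeError("qemu-img check failed (rc=%s): %s" % (p.returncode, err))
    return json.loads(out)

With a policy like this, the leaked-cluster case above would return the parsed JSON (including "image-end-offset" and "leaks") instead of failing the task.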
vdsm log:

2018-01-25 19:19:37,674+0100 ERROR (tasks/5) [storage.TaskManager.Task] (Task='a536102b-a17a-4281-a948-214f246322c2') Unexpected error (task:872)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 879, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 333, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1892, in finalizeMerge
    merge.finalize(subchainInfo)
  File "/usr/share/vdsm/storage/merge.py", line 271, in finalize
    optimal_size = subchain.base_vol.optimal_size()
  File "/usr/share/vdsm/storage/blockVolume.py", line 440, in optimal_size
    check = qemuimg.check(self.getVolumePath(), qemuimg.FORMAT.QCOW2)
  File "/usr/lib/python2.7/site-packages/vdsm/qemuimg.py", line 157, in check
    out = _run_cmd(cmd)
  File "/usr/lib/python2.7/site-packages/vdsm/qemuimg.py", line 426, in _run_cmd
    raise QImgError(cmd, rc, out, err)
QImgError: cmd=['/usr/bin/qemu-img', 'check', '--output', 'json', '-f', 'qcow2', '/rhev/data-center/mnt/blockSD/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/images/yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy/zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz'], ecode=3, stdout={
    "image-end-offset": 87419584512,
    "total-clusters": 3276800,
    "check-errors": 0,
    "leaks": 1111,
    "leaks-fixed": 0,
    "allocated-clusters": 1332537,
    "filename": "/rhev/data-center/mnt/blockSD/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/images/yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy/zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz",
    "format": "qcow2",
    "fragmented-clusters": 233541
}
, stderr=Leaked cluster 475133 refcount=1 reference=0
Leaked cluster 475134 refcount=1 reference=0
Leaked cluster 475135 refcount=1 reference=0
[...]
Leaked cluster 1163605 refcount=1 reference=0
Leaked cluster 1163606 refcount=1 reference=0
, message=None
2018-01-25 19:19:37,739+0100 INFO (tasks/5) [storage.ThreadPool.WorkerThread] FINISH task a536102b-a17a-4281-a948-214f246322c2 (threadPool:210)

engine.log:

2018-01-25 19:19:45,058+01 ERROR [org.ovirt.engine.core.bll.snapshots.ColdMergeSnapshotSingleDiskCommand] (DefaultQuartzScheduler5) [a1cde685-520a-449e-bd72-5a01a90a1eeb] Command 'ColdMergeSnapshotSingleDisk' id '342ebd0a-2bb7-4d36-81cd-57e580c2b2c1' failed executing step 'FINALIZE_MERGE'
2018-01-25 19:19:45,058+01 INFO [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler5) [a1cde685-520a-449e-bd72-5a01a90a1eeb] Command 'ColdMergeSnapshotSingleDisk' id: '342ebd0a-2bb7-4d36-81cd-57e580c2b2c1' child commands '[99ccc8e0-8890-4971-936a-440cf3355de7, 5ec4be26-7bff-4cab-8da6-254eebb0dcc8, 02d0e273-8ea6-424c-aa31-bf04f0fb009c]' executions were completed, status 'FAILED'
2018-01-25 19:19:46,137+01 ERROR [org.ovirt.engine.core.bll.snapshots.ColdMergeSnapshotSingleDiskCommand] (DefaultQuartzScheduler3) [a1cde685-520a-449e-bd72-5a01a90a1eeb] Ending command 'org.ovirt.engine.core.bll.snapshots.ColdMergeSnapshotSingleDiskCommand' with failure.
2018-01-25 19:19:46,279+01 INFO [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler3) [a1cde685-520a-449e-bd72-5a01a90a1eeb] Command 'RemoveSnapshot' id: 'e81b54ec-9211-4740-b350-913bbb7af5c7' child commands '[10ba0892-6a91-4ad5-a2f4-840dbd88166d, 342ebd0a-2bb7-4d36-81cd-57e580c2b2c1, 8abe62ee-9cce-4fca-9205-cfacef4f481e, 57ece9d8-2598-4a57-973d-e4fafde95a74, dd1ab67d-98d6-4c87-8979-9ce8846e2ba9]' executions were completed, status 'FAILED'
2018-01-25 19:19:47,576+01 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (DefaultQuartzScheduler2) [a1cde685-520a-449e-bd72-5a01a90a1eeb] Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand' with failure.
2018-01-25 19:19:47,693+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler2) [a1cde685-520a-449e-bd72-5a01a90a1eeb] EVENT_ID: USER_REMOVE_SNAPSHOT_FINISHED_FAILURE(357), Correlation ID: a1cde685-520a-449e-bd72-5a01a90a1eeb, Job ID: d571ed23-e270-4d1d-9ad0-54e29d65aa60, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: Failed to delete snapshot 'snapshot-name' for VM 'vm-name'.

This looks like a regression, adding the relevant tag.

--
Kind Regards,
Igor Netkachev
Technical Support Engineer, RHCE, RHCVA
Red Hat Global Support Services
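The traceback above shows where the failure happens: finalizing the merge calls blockVolume.optimal_size(), which calls qemuimg.check(), and the helper raises QImgError on any non-zero exit code. As a hedged illustration of the expected behaviour (using the hypothetical check_qcow2 helper sketched earlier, not vdsm's actual API), a leak-tolerant caller could still obtain the value it presumably needs, the reported image end offset, as long as no real corruption is found:

def image_end_offset(path):
    # Hypothetical sketch: tolerate leaked clusters, fail only on corruption.
    data = check_qcow2(path)
    if data.get("check-errors", 0):
        raise RuntimeError("%s is corrupted, refusing to continue" % path)
    # Leaked clusters ("leaks" > 0) waste space but do not harm data, so they
    # should not abort the cold merge; they can be reclaimed separately with
    # "qemu-img check -r leaks" while the image is not in use.
    return data["image-end-offset"]

This mirrors the expected result above: the cold merge should proceed when the base image only has leaked clusters and no check errors.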
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason: [Found non-acked flags: '{'rhevm-4.1.z': '?'}', ] For more info please contact: rhv-devops
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason: [No relevant external trackers attached] For more info please contact: rhv-devops
So, what's the issue here? :)
(In reply to Robert Scheck from comment #8)
> So, what's the issue here? :)

Please specify who the needinfo is to.
I'm sorry, I wasn't sure who is responsible for what the bot claimed…
This bug was created in order to clone it to 4.1.z, as mandated by the process. The "real" bug is BZ 1548436, which is already verified. The root cause of this bug was a bad backport to the 4.1 branch.

All the bugs in this area are now verified:
BZ 1502488 - 4.2 bug
BZ 1548436 - 4.1 bug