Bug 1506503
Summary: | [downstream clone - 4.1.8] Cold merge will fail if the base qcow2 image reports leaked cluster | ||
---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | rhev-integ |
Component: | vdsm | Assignee: | Ala Hino <ahino> |
Status: | CLOSED ERRATA | QA Contact: | Lilach Zitnitski <lzitnits> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 4.1.6 | CC: | amureini, bazulay, bgraveno, gveitmic, inetkach, lsurette, lveyde, rabraham, srevivo, tnisan, ycui, ykaul, ylavi |
Target Milestone: | ovirt-4.1.8 | Keywords: | Regression, Reopened, ZStream |
Target Release: | --- | Flags: | lsvaty: testing_plan_complete- |
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | vdsm v4.19.38 | Doc Type: | Bug Fix |
Doc Text: |
This update ensures that, during a cold merge, leaked clusters reported by the QCOW2 image are identified and handled so that the cold merge succeeds.
|
Story Points: | --- |
Clone Of: | 1502488 | Environment: | |
Last Closed: | 2018-02-15 13:46:41 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1502488 | ||
Bug Blocks: |
Description  rhev-integ  2017-10-26 08:57:23 UTC
Patches on master are already ready; they will be backported as soon as they are merged.

--------------------------------------

Tested with the following code:
----------------------------------------
rhevm-4.1.8-0.1.el7.noarch
vdsm-4.19.38-1.el7ev.x86_64

Tested with the following scenario:

Steps to Reproduce:
1. Create a thin provisioned disk in RHV-M and assign it to a VM.
2. Kill the qemu-kvm process while data is being written from the VM. While data is being written, the qcow2 image allocates new clusters, and killing the process at this point leaves leaked clusters in the image.
3. Create a snapshot for the VM.
4. Do a cold merge.

Actual results:
Cold merge is completed successfully even when the image has leaked clusters.

Expected results:

Moving to VERIFIED!

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3428

Hello,

Customer has hit this on vdsm-4.19.43-3 and rhevm-4.1.8.2-0.1, reopening.

RPMs:
rhevm-4.1.8.2-0.1.el7.noarch  Tue Jan 9 07:52:58 2018
vdsm-4.19.43-3.el7ev.x86_64   Wed Jan 3 00:17:54 2018

Processes:
ovirt 1016 15.1 22.3 4930564 1719236 ? Sl Jan17 1858:19 ovirt-engine -server -XX:+TieredCompilation.....
vdsm  4652  7.4  0.0 4885248  177840 ? S<sl Jan10 3663:09 /usr/bin/python2 /usr/share/vdsm/vdsm

Cold snapshot merge of an image with leaked clusters still fails.

audit_log:

2018-01-25 19:19:42.971+01 | 10802 | | rhev-2 | VDSM rhev-2 command HSMGetAllTasksStatusesVDS failed: cmd=['/usr/bin/qemu-img', 'check', '--output', 'json', '-f', 'qcow2', '/rhev/data-center/mnt/blockSD/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/images/yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy/zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz'], ecode=3, stdout={
    "image-end-offset": 87419584512,
    "total-clusters": 3276800,
    "check-errors": 0,
    "leaks": 1111,
    "leaks-fixed": 0,
    "allocated-clusters": 1332537,
    "filename": "/rhev/data-center/mnt/blockSD/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/images/yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy/zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz",
    "format": "qcow2",
    "fragmented-clusters": 233541
}
, stderr=Leaked cluster 475133 refcount=1 reference=0
Leaked cluster 475134 refcount=1 reference=0
Leaked cluster 475135 refcount=1 reference=0
[...each leaked cluster reported, output manually truncated...]
Leaked cluster 1163606 refcount=1 reference=0
, message=None

2018-01-25 19:19:47.67+01 | 357 | a1cde685-520a-449e-bd72-5a01a90a1eeb | | Failed to delete snapshot 'snapshot-name' for VM 'vm-name'.
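For reference: in the report above, check-errors is 0 and only leaks is non-zero, i.e. the image is internally consistent and the leaked clusters merely waste space. Below is a minimal illustrative Python snippet (a hypothetical helper, not part of VDSM or the engine) that classifies such a qemu-img check JSON report; it assumes the default 64 KiB qcow2 cluster size when estimating the wasted space.

# Hypothetical helper, not VDSM code: classify a "qemu-img check --output json"
# report such as the one pasted in the audit_log above.
import json

QCOW2_DEFAULT_CLUSTER_SIZE = 64 * 1024  # assumption: image uses the default cluster size

def classify(report_json):
    r = json.loads(report_json)
    if r.get('check-errors', 0):
        # Real corruption; leaked-cluster handling does not apply.
        return 'corrupted: %d check errors' % r['check-errors']
    leaks = r.get('leaks', 0)
    if leaks:
        wasted = leaks * QCOW2_DEFAULT_CLUSTER_SIZE
        return 'consistent, %d leaked clusters (~%.1f MiB wasted)' % (
            leaks, wasted / 1048576.0)
    return 'clean'

# With the numbers from the audit_log entry above:
print(classify('{"check-errors": 0, "leaks": 1111}'))   # ~69.4 MiB wasted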
vdsm log:

2018-01-25 19:19:37,674+0100 ERROR (tasks/5) [storage.TaskManager.Task] (Task='a536102b-a17a-4281-a948-214f246322c2') Unexpected error (task:872)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 879, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 333, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1892, in finalizeMerge
    merge.finalize(subchainInfo)
  File "/usr/share/vdsm/storage/merge.py", line 271, in finalize
    optimal_size = subchain.base_vol.optimal_size()
  File "/usr/share/vdsm/storage/blockVolume.py", line 440, in optimal_size
    check = qemuimg.check(self.getVolumePath(), qemuimg.FORMAT.QCOW2)
  File "/usr/lib/python2.7/site-packages/vdsm/qemuimg.py", line 157, in check
    out = _run_cmd(cmd)
  File "/usr/lib/python2.7/site-packages/vdsm/qemuimg.py", line 426, in _run_cmd
    raise QImgError(cmd, rc, out, err)
QImgError: cmd=['/usr/bin/qemu-img', 'check', '--output', 'json', '-f', 'qcow2', '/rhev/data-center/mnt/blockSD/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/images/yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy/zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz'], ecode=3, stdout={
    "image-end-offset": 87419584512,
    "total-clusters": 3276800,
    "check-errors": 0,
    "leaks": 1111,
    "leaks-fixed": 0,
    "allocated-clusters": 1332537,
    "filename": "/rhev/data-center/mnt/blockSD/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/images/yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy/zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz",
    "format": "qcow2",
    "fragmented-clusters": 233541
}
, stderr=Leaked cluster 475133 refcount=1 reference=0
Leaked cluster 475134 refcount=1 reference=0
Leaked cluster 475135 refcount=1 reference=0
[...]
Leaked cluster 1163605 refcount=1 reference=0
Leaked cluster 1163606 refcount=1 reference=0
, message=None
2018-01-25 19:19:37,739+0100 INFO (tasks/5) [storage.ThreadPool.WorkerThread] FINISH task a536102b-a17a-4281-a948-214f246322c2 (threadPool:210)
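The traceback shows where the merge breaks: blockVolume.optimal_size() calls qemuimg.check(), and _run_cmd() raises QImgError because qemu-img exits with code 3 for an image that only has leaked clusters (check-errors is 0, so the image is not corrupted). Whatever the actual patch looks like, the check path has to tolerate that exit code; the rough, self-contained sketch below only illustrates the idea. The QImgError, _run_cmd and check definitions here are simplified stand-ins mirroring the names in the traceback, not the real VDSM code.

# Sketch only, not the actual VDSM patch. The names mirror the traceback above
# (QImgError, _run_cmd, check) but are re-implemented here for illustration.
import json
import subprocess

class QImgError(Exception):
    def __init__(self, cmd, ecode, stdout, stderr):
        super(QImgError, self).__init__(cmd, ecode, stdout, stderr)
        self.cmd, self.ecode, self.stdout, self.stderr = cmd, ecode, stdout, stderr

def _run_cmd(cmd):
    # Strict runner: any non-zero exit code raises, which is exactly what the
    # traceback shows happening today.
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    if p.returncode != 0:
        raise QImgError(cmd, p.returncode, out, err)
    return out

def check(path):
    cmd = ['/usr/bin/qemu-img', 'check', '--output', 'json', '-f', 'qcow2', path]
    try:
        out = _run_cmd(cmd)
    except QImgError as e:
        # Exit code 3 means "leaked clusters found, image not corrupted";
        # the JSON report on stdout is still complete and usable.
        if e.ecode != 3:
            raise
        out = e.stdout
    info = json.loads(out)
    # Callers such as optimal_size() mainly need the end offset of the data.
    return {'offset': info['image-end-offset'], 'leaks': info.get('leaks', 0)}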
engine.log:

2018-01-25 19:19:45,058+01 ERROR [org.ovirt.engine.core.bll.snapshots.ColdMergeSnapshotSingleDiskCommand] (DefaultQuartzScheduler5) [a1cde685-520a-449e-bd72-5a01a90a1eeb] Command 'ColdMergeSnapshotSingleDisk' id '342ebd0a-2bb7-4d36-81cd-57e580c2b2c1' failed executing step 'FINALIZE_MERGE'
2018-01-25 19:19:45,058+01 INFO [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler5) [a1cde685-520a-449e-bd72-5a01a90a1eeb] Command 'ColdMergeSnapshotSingleDisk' id: '342ebd0a-2bb7-4d36-81cd-57e580c2b2c1' child commands '[99ccc8e0-8890-4971-936a-440cf3355de7, 5ec4be26-7bff-4cab-8da6-254eebb0dcc8, 02d0e273-8ea6-424c-aa31-bf04f0fb009c]' executions were completed, status 'FAILED'
2018-01-25 19:19:46,137+01 ERROR [org.ovirt.engine.core.bll.snapshots.ColdMergeSnapshotSingleDiskCommand] (DefaultQuartzScheduler3) [a1cde685-520a-449e-bd72-5a01a90a1eeb] Ending command 'org.ovirt.engine.core.bll.snapshots.ColdMergeSnapshotSingleDiskCommand' with failure.
2018-01-25 19:19:46,279+01 INFO [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler3) [a1cde685-520a-449e-bd72-5a01a90a1eeb] Command 'RemoveSnapshot' id: 'e81b54ec-9211-4740-b350-913bbb7af5c7' child commands '[10ba0892-6a91-4ad5-a2f4-840dbd88166d, 342ebd0a-2bb7-4d36-81cd-57e580c2b2c1, 8abe62ee-9cce-4fca-9205-cfacef4f481e, 57ece9d8-2598-4a57-973d-e4fafde95a74, dd1ab67d-98d6-4c87-8979-9ce8846e2ba9]' executions were completed, status 'FAILED'
2018-01-25 19:19:47,576+01 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (DefaultQuartzScheduler2) [a1cde685-520a-449e-bd72-5a01a90a1eeb] Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand' with failure.
2018-01-25 19:19:47,693+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler2) [a1cde685-520a-449e-bd72-5a01a90a1eeb] EVENT_ID: USER_REMOVE_SNAPSHOT_FINISHED_FAILURE(357), Correlation ID: a1cde685-520a-449e-bd72-5a01a90a1eeb, Job ID: d571ed23-e270-4d1d-9ad0-54e29d65aa60, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: Failed to delete snapshot 'snapshot-name' for VM 'vm-name'.

Looks like a regression, adding the relevant tag.

--
Kind Regards,

Igor Netkachev
Technical Support Engineer, RHCE, RHCVA
Red Hat Global Support Services