Bug 1502488 - Cold merge will fail if the base qcow2 image reports leaked cluster
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.1.6
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.2.0
Target Release: ---
Assigned To: Ala Hino
QA Contact: Raz Tamir
Keywords: ZStream
Depends On:
Blocks: 1506503
Reported: 2017-10-16 02:24 EDT by nijin ashok
Modified: 2018-05-15 13:53 EDT
CC List: 12 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Leaked clusters on an image are now correctly identified and handled, allowing cold merges to succeed when they are present.
Story Points: ---
Clone Of:
Clones: 1506503
Environment:
Last Closed: 2018-05-15 13:52:46 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 3218271 None None None 2017-10-18 00:25 EDT
oVirt gerrit 83170 master MERGED qemuimg: Handle leaked clusters when running qemuimg check 2017-10-30 17:34 EDT
Red Hat Product Errata RHEA-2018:1489 None None None 2018-05-15 13:53 EDT

Description nijin ashok 2017-10-16 02:24:05 EDT
Description of problem:

As per bug 1420405, we use qemu-img check to find the actual size of the volume. However, if the base image has leaked clusters, qemu-img check exits with a return value of 3.

# qemu-img check /rhev/data-center/00000001-0001-0001-0001-000000000311/8257cf14-d88d-4e4e-998c-9f8976dac2a2/images/7455de38-1df1-4acd-b07c-9dc2138aafb3/be4a4d85-d7e6-4725-b7f5-90c9d935c336  ; echo $?
200 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.
109461/180224 = 60.74% allocated, 15.29% fragmented, 0.00% compressed clusters
Image end offset: 7188578304
3
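
The percentages in this output can be reproduced from the raw cluster counts. The sketch below is only an illustration: it uses the allocated and total counts printed above and the "fragmented-clusters" value (16741) from the JSON report quoted in the vdsm log further down. The helper name pct is made up for this example.

```python
def pct(part, whole):
    """Percentage of part in whole, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


allocated = 109461   # allocated clusters
total = 180224       # total clusters
fragmented = 16741   # "fragmented-clusters" from the JSON report

# Allocation is relative to all clusters, fragmentation is
# relative to the allocated clusters only.
print("allocated:  %.2f%%" % pct(allocated, total))
print("fragmented: %.2f%%" % pct(fragmented, allocated))
```

Both values match the 60.74% and 15.29% that qemu-img printed.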

Any exit code other than 0 makes vdsm fail with the error below during cold merge.

===
2017-10-16 10:09:32,950+0530 DEBUG (tasks/0) [root] /usr/bin/taskset --cpu-list 0-3 /usr/bin/qemu-img check --output json -f qcow2 /rhev/data-center/mnt/blockSD/8257cf14-d88d-4e4e-998c-9f8976dac2a2/images/7455de38-1df1-4acd-b07c-9dc2138aafb3/be4a4d85-d7e6-4725-b7f5-90c9d935c336 (cwd None) (commands:69)
2017-10-16 10:09:33,576+0530 ERROR (tasks/0) [storage.TaskManager.Task] (Task='59404af6-b400-4e08-9691-9a64cdf00374') Unexpected error (task:872)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 879, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 333, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1892, in finalizeMerge
    merge.finalize(subchainInfo)
  File "/usr/share/vdsm/storage/merge.py", line 271, in finalize
    optimal_size = subchain.base_vol.optimal_size()
  File "/usr/share/vdsm/storage/blockVolume.py", line 440, in optimal_size
    check = qemuimg.check(self.getVolumePath(), qemuimg.FORMAT.QCOW2)
  File "/usr/lib/python2.7/site-packages/vdsm/qemuimg.py", line 156, in check
    out = _run_cmd(cmd)
  File "/usr/lib/python2.7/site-packages/vdsm/qemuimg.py", line 416, in _run_cmd
    raise QImgError(cmd, rc, out, err)
QImgError: cmd=['/usr/bin/qemu-img', 'check', '--output', 'json', '-f', 'qcow2', '/rhev/data-center/mnt/blockSD/8257cf14-d88d-4e4e-998c-9f8976dac2a2/images/7455de38-1df1-4acd-b07c-9dc2138aafb3/be4a4d85-d7e6-4725-b7f5-90c9d935c336'], ecode=3, stdout={
    "image-end-offset": 7188578304,
    "total-clusters": 180224,
    "check-errors": 0,
    "leaks": 200,
    "leaks-fixed": 0,
    "allocated-clusters": 109461,
    "filename": "/rhev/data-center/mnt/blockSD/8257cf14-d88d-4e4e-998c-9f8976dac2a2/images/7455de38-1df1-4acd-b07c-9dc2138aafb3/be4a4d85-d7e6-4725-b7f5-90c9d935c336",
    "format": "qcow2",
    "fragmented-clusters": 16741
}
, stderr=Leaked cluster 109202 refcount=1 reference=0
===

And in the engine log:

===
2017-10-16 00:39:29,600-04 ERROR [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler7) [12cf8f7] BaseAsyncTask::logEndTaskFailure: Task '59404af6-b400-4e08-9691-9a64cdf00374' (Parent Command 'FinalizeMerge', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with failure:
-- Result: 'cleanSuccess'
-- Message: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = cmd=['/usr/bin/qemu-img', 'check', '--output', 'json', '-f', 'qcow2', '/rhev/data-center/mnt/blockSD/8257cf14-d88d-4e4e-998c-9f8976dac2a2/images/7455de38-1df1-4acd-b07c-9dc2138aafb3/be4a4d85-d7e6-4725-b7f5-90c9d935c336'], ecode=3, stdout={
2017-10-16 00:39:32,468-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler2) [8b540950-5f08-432d-bcf9-5b2999284532] EVENT_ID: USER_REMOVE_SNAPSHOT_FINISHED_FAILURE(357), Correlation ID: 8b540950-5f08-432d-bcf9-5b2999284532, Job ID: d31f4487-2545-4ba7-a3e4-2cd03f0ea305, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: Failed to delete snapshot 'test_snap' for VM 'test_vm'.
===


The failed cold merge leaves the image marked as illegal, and the user cannot start the VM because the legality recorded in the volume metadata is illegal. Manual changes are needed to recover the VM.

IIUC, leaked clusters do not harm the data; they only waste space.
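
To illustrate the behaviour being asked for, here is a minimal sketch of a caller that accepts exit code 3 together with exit code 0. It is not the vdsm code and not the merged patch; parse_check_result and CheckError are hypothetical names. The exit code meanings are taken from the qemu-img man page (0: no errors, 3: leaked clusters found but the image is not corrupted).

```python
import json

# Exit codes documented for "qemu-img check".
CHECK_OK = 0
CHECK_LEAKS = 3  # leaked clusters found, image is not corrupted


class CheckError(Exception):
    """Raised when qemu-img check reports anything worse than leaks."""


def parse_check_result(rc, stdout):
    """Return the parsed JSON report if rc means the image is usable.

    Hypothetical helper: rc 0 and rc 3 are both accepted, every
    other exit code raises CheckError.
    """
    if rc not in (CHECK_OK, CHECK_LEAKS):
        raise CheckError("qemu-img check failed with exit code %d" % rc)
    return json.loads(stdout)


# Abbreviated version of the report from the vdsm log (exit code 3).
sample = '{"image-end-offset": 7188578304, "leaks": 200, "check-errors": 0}'
report = parse_check_result(3, sample)
print(report["image-end-offset"], report["leaks"])
```

With this kind of handling the "image-end-offset" value is still available to the caller when leaks are present, while corruption (exit code 2) or an internal error (exit code 1) still fails.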


Version-Release number of selected component (if applicable):

vdsm-4.19.31-1.el7ev.x86_64
ovirt-engine-4.1.6.2-0.1.el7.noarch


How reproducible:

100%

Steps to Reproduce:

1. Create a thin provisioned disk in RHV-M and assign to a VM.

2. Kill the qemu-kvm process while the VM is writing data. During the write, qcow2 allocates new clusters, and killing the process at that moment leaves leaked clusters in the image.

3. Create a snapshot for the VM.

4. Do a cold merge.


Actual results:

Cold merge fails if there are leaked clusters in the base image.

Expected results:

Cold merge should succeed even if the base image has leaked clusters.

Additional info:
Comment 3 Allon Mureinik 2017-10-31 07:26:13 EDT
Ala, can we add a sentence or two about leaked clusters (what they are, how they happen, etc)?
Comment 4 Ala Hino 2017-10-31 07:48:00 EDT
(In reply to Allon Mureinik from comment #3)
> Ala, can we add a sentence or two about leaked clusters (what they are, how
> they happen, etc)?

I will add it here rather than in the doctext:

Leaked clusters can occur when the qemu-kvm process is killed while the VM is writing data.
Leaked clusters waste disk space but do not harm data.
Leaks can be fixed by running qemu-img check -r leaks.
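
As a rough sketch of the repair step mentioned above, the snippet below builds the qemu-img check -r leaks command line and estimates how much space the leaks waste. The helper names and the path are placeholders, and the 64 KiB cluster size is an assumption (it is the qcow2 default, not a value read from the image). The repair should only be run while no VM is using the image.

```python
import subprocess

# qcow2 default cluster size; assumed here, not read from the image.
DEFAULT_CLUSTER_SIZE = 64 * 1024


def build_repair_cmd(path, fmt="qcow2"):
    """Build the argv that repairs leaked clusters only."""
    return ["qemu-img", "check", "-r", "leaks", "-f", fmt, path]


def repair_leaks(path):
    """Run the repair and return the qemu-img exit code."""
    return subprocess.call(build_repair_cmd(path))


def wasted_bytes(leaks, cluster_size=DEFAULT_CLUSTER_SIZE):
    """Space held by leaked clusters, in bytes."""
    return leaks * cluster_size


# "/path/to/volume" is a placeholder, not a real volume path.
print(" ".join(build_repair_cmd("/path/to/volume")))
print(wasted_bytes(200))
```

Under the 64 KiB assumption, the 200 leaked clusters from the description amount to about 12.5 MiB of wasted space.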
Comment 6 Lilach Zitnitski 2017-11-15 04:09:53 EST
--------------------------------------
Tested with the following code:
----------------------------------------
ovirt-engine-4.2.0-0.0.master.20171112130303.git8bc889c.el7.centos.noarch
vdsm-4.20.6-62.gitd3023e4.el7.centos.x86_64

Tested with the following scenario:

Steps to Reproduce:
1. Create a thin provisioned disk in RHV-M and assign to a VM.
2. Kill the qemu-kvm process while the VM is writing data. During the write, qcow2 allocates new clusters, and killing the process at that moment leaves leaked clusters in the image.
3. Create a snapshot for the VM.
4. Do a cold merge.

Actual results:
Cold merge is completed successfully even when the image has leaked clusters

Expected results:

Moving to VERIFIED!
Comment 12 errata-xmlrpc 2018-05-15 13:52:46 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1489
