1593111 – qemu-img is very slow when changing the backing file from qcow2 image to luks image

Keywords:
Status:	CLOSED CANTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	qemu-kvm-rhev
Sub Component:
Version:	7.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	low
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Eric Blake
QA Contact:	Tingting Mao
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-06-20 05:33 UTC by Tingting Mao
Modified:	2018-06-25 03:34 UTC
CC List:	8 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-06-25 03:34:41 UTC
Target Upstream Version:
Embargoed:

Description Tingting Mao 2018-06-20 05:33:26 UTC

Description of problem:
Rebasing a qcow2 snapshot onto a LUKS backing file is very slow.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.12.0-3.el7
kernel-3.10.0-906.el7

How reproducible:
100%

Steps to Reproduce:
1. For luks base image
1.1 Create base.luks-->sn1.qcow2-->sn2.qcow2.
# qemu-img create -f luks --object secret,id=sec0,data=base -o key-secret=sec0 base.luks 10G
# qemu-img create -f qcow2 -F luks --object secret,id=sec0,data=base -b 'json:{"driver": "luks", "file": {"driver": "file", "filename": "base.luks"}, "key-secret": "sec0"}' sn1.qcow2
# qemu-img create -f qcow2 -F qcow2 --object secret,id=sec0,data=base -b sn1.qcow2 sn2.qcow2
1.2 Rebase sn2.qcow2 to base.luks
# time qemu-img rebase -f qcow2 -F luks --object secret,id=sec0,data=base -b 'json:{"driver": "luks", "file": {"driver": "file", "filename": "base.luks"}, "key-secret": "sec0"}' sn2.qcow2 -p
    (100.00/100%)

real	5m45.608s ---------------> too long time
user	5m29.808s
sys	0m15.147s
1.3 Check info of sn2.qcow2
# qemu-img info sn2.qcow2 
image: sn2.qcow2
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 196K
cluster_size: 65536
backing file: json:{"driver": "luks", "file": {"driver": "file", "filename": "base.luks"}, "key-secret": "sec0"}
backing file format: luks
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

2. For qcow2 base image
2.1 Create base.qcow2-->sn1.qcow2-->sn2.qcow2.
# qemu-img create -f qcow2 base.qcow2 10G
# qemu-img create -f qcow2 -b base.qcow2 sn1.qcow2
# qemu-img create -f qcow2 -b sn1.qcow2 sn2.qcow2
2.2 Rebase sn2.qcow2 to base.qcow2.
# time qemu-img rebase -f qcow2 -F qcow2 -b base.qcow2 sn2.qcow2 

real	0m2.155s
user	0m2.118s
sys	0m0.024s
2.3 Check the info of sn2.qcow2
# qemu-img info sn2.qcow2 
image: sn2.qcow2
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 196K
cluster_size: 65536
backing file: base.qcow2
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

Actual results:


Expected results:


Additional info:

Comment 2 Tingting Mao 2018-06-20 06:22:14 UTC

The same issue also occurs on RHEL 7.5, so it is not a regression.

Tested packages:
qemu-kvm-rhev-2.10.0-21.el7_5.1
kernel-3.10.0-862.el7

Comment 4 Daniel Berrangé 2018-06-22 08:35:51 UTC

(In reply to timao from comment #0)

> 1. For luks base image
> 1.1 Create base.luks-->sn1.qcow2-->sn2.qcow2.
> # qemu-img create -f luks --object secret,id=sec0,data=base -o
> key-secret=sec0 base.luks 10G
> # qemu-img create -f qcow2 -F luks --object secret,id=sec0,data=base -b
> 'json:{"driver": "luks", "file": {"driver": "file", "filename":
> "base.luks"}, "key-secret": "sec0"}' sn1.qcow2
> # qemu-img create -f qcow2 -F qcow2 --object secret,id=sec0,data=base -b
> sn1.qcow2 sn2.qcow2

Here you have created a 10G backing file which is completely empty; however, the LUKS format does not care about this.

As far as LUKS is concerned you have 10G of data present here. It just happens that it will all be garbage because you've
not written anything to it.

The two qcow2 overlays are both empty, so all I/O requests will be satisfied from the LUKS backing file.

> 1.2 Rebase sn2.qcow2 to base.luks
> # time qemu-img rebase -f qcow2 -F luks --object secret,id=sec0,data=base -b
> 'json:{"driver": "luks", "file": {"driver": "file", "filename":
> "base.luks"}, "key-secret": "sec0"}' sn2.qcow2 -p
>     (100.00/100%)

Now you are rebasing the qcow2 overlay onto the luks backing file, to eliminate the middle layer.

Since you have 10 GB of data in the LUKS backing file, qemu-img is going to do 10 GB worth of I/O.

In fact it will do 20 GB of I/O, because rebase has to read both the original backing chain and the new backing chain. There is no optimization in the rebase procedure that notices both the original & new chains ultimately point to the same luks file.
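As a hedged aside on the point above: qemu-img rebase has an unsafe mode (-u) that only rewrites the backing file name and format in the overlay header and does not read or compare either chain. The sketch below wraps that call for the image layout from comment 0. The function name rebase_unsafe_luks and the QEMU_IMG variable are hypothetical helpers introduced here, and using -u is only appropriate under the assumption that the dropped layer (sn1.qcow2) contains no data, as is the case in this reproducer.

```shell
# Path to qemu-img; overridable so the wrapper can be dry-run (assumption).
QEMU_IMG=${QEMU_IMG:-qemu-img}

# Unsafe rebase: change only the backing file reference of the overlay.
# $1 = qcow2 overlay (e.g. sn2.qcow2), $2 = LUKS base file (e.g. base.luks)
# The secret id "sec0" and its value "base" are taken from comment 0.
rebase_unsafe_luks() {
    "$QEMU_IMG" rebase -u -f qcow2 -F luks \
        --object secret,id=sec0,data=base \
        -b "json:{\"driver\": \"luks\", \"file\": {\"driver\": \"file\", \"filename\": \"$2\"}, \"key-secret\": \"sec0\"}" \
        "$1"
}

# Example (not executed here):
#   time rebase_unsafe_luks sn2.qcow2 base.luks
```

If the intermediate overlay did contain data, unsafe mode would silently change what the guest sees, so the default (safe) mode remains the correct choice in the general case.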

> real	5m45.608s ---------------> too long time
> user	5m29.808s
> sys	0m15.147s

So taking 6 minutes for 10 GB of I/O is completely normal.
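A rough back-of-the-envelope check of the figures above, as a minimal sketch: it converts the reported wall-clock time (5m45.608s) to seconds and estimates throughput, assuming the 20 GB of total I/O mentioned above and treating it as 20 GiB. The helper names to_seconds and throughput_mib_s are hypothetical.

```shell
# Convert "minutes seconds" from `time` output into seconds.
to_seconds() {
    awk -v m="$1" -v s="$2" 'BEGIN { printf "%.3f\n", m * 60 + s }'
}

# Estimate throughput in MiB/s from GiB transferred and elapsed seconds.
throughput_mib_s() {
    awk -v g="$1" -v t="$2" 'BEGIN { printf "%.1f\n", g * 1024 / t }'
}

elapsed=$(to_seconds 5 45.608)
echo "elapsed: ${elapsed}s"
echo "throughput: $(throughput_mib_s 20 "$elapsed") MiB/s"
```

Under these assumptions the rebase works out to roughly 59 MiB/s. Note that the reported user time (5m29s) is close to the wall-clock time, which suggests the run was CPU-bound rather than disk-bound.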


> 2. For qcow2 base image
> 2.1 Create base.qcow2-->sn1.qcow2-->sn2.qcow2.
> # qemu-img create -f qcow2 base.qcow2 10G
> # qemu-img create -f qcow2 -b base.qcow2 sn1.qcow2
> # qemu-img create -f qcow2 -b sn1.qcow2 sn2.qcow2

Ok, so with this series of steps you have a 10G backing file, which is completely empty.

You then created overlays which are also completely empty

> 2.2 Rebase sn2.qcow2 to base.qcow2.
> # time qemu-img rebase -f qcow2 -F qcow2 -b base.qcow2 sn2.qcow2 

When doing the rebase, because the backing files are both empty, qemu-img doesn't
have any data that needs comparing, so there is almost no I/O performed.

> real	0m2.155s
> user	0m2.118s
> sys	0m0.024s

So completing in 2 seconds is reasonably normal.

Comment 5 Tingting Mao 2018-06-22 09:21:47 UTC

(In reply to Daniel Berrange from comment #4)

> Here you have created a 10G backing file which is completely empty, however,
> the LUKS format does not care about this. 
> 
> As far as LUKS is concerned you have 10G of data present here. It just
> happens it will all be garbage because you've
> not written anything to it.

You mentioned that "as far as LUKS is concerned you have 10G of data present here". Is this because of the bug below?

https://bugzilla.redhat.com/show_bug.cgi?id=1535894

Thanks for your information.

Comment 6 Daniel Berrangé 2018-06-22 09:23:59 UTC

It is not a bug, it is normal behaviour of the LUKS format. It does not track which disk sectors have been written to, so there is no notion of a sparse LUKS image.  The driver beneath the LUKS driver may happen to be sparse, but LUKS does not expose that.
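One way to observe this, sketched under assumptions: compare what the host filesystem reports for the underlying file with what QEMU reports at the LUKS layer. The helper names host_usage and luks_map and the QEMU_IMG variable are hypothetical; the secret id and value are the ones from comment 0, and du --apparent-size assumes GNU coreutils.

```shell
# Path to qemu-img; overridable so the wrapper can be dry-run (assumption).
QEMU_IMG=${QEMU_IMG:-qemu-img}

# Host view: apparent size versus blocks actually allocated on disk.
# A freshly created image is typically sparse at this level.
host_usage() {
    du -h --apparent-size "$1"
    du -h "$1"
}

# QEMU view at the LUKS layer: print the extent map as JSON.
# $1 = LUKS image file (e.g. base.luks)
luks_map() {
    "$QEMU_IMG" map --output=json \
        --object secret,id=sec0,data=base \
        --image-opts "driver=luks,file.filename=$1,key-secret=sec0"
}

# Example (not executed here):
#   host_usage base.luks
#   luks_map base.luks
```

If the explanation above holds, the map for the LUKS image should report the whole payload as data even when the file underneath is mostly holes, whereas an empty qcow2 image reports unallocated ranges.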

Comment 7 Tingting Mao 2018-06-22 10:59:27 UTC

Hi Daniel

I re-tested this issue with a qcow2 base file to which 10G of data had been written, using the steps below. However, the time is much shorter than the one in comment #0. I am still a little confused, so could you please tell me what the difference is?

Many thanks.



Steps:
1. Create a qcow2 base image and write 10G of data to it. The info is shown below.
# qemu-img info base.qcow2 
image: base.qcow2
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 10G 
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
2. Create sn1 and sn2
# qemu-img create -f qcow2 -b base.qcow2 sn1
# qemu-img create -f qcow2 -b sn1 sn2
3. Rebase sn2 to base.qcow2
# time qemu-img rebase -f qcow2 -F qcow2 -b base.qcow2 sn2

real	1m43.612s 
user	0m2.480s
sys	0m14.722s

Comment 8 Daniel Berrangé 2018-06-22 11:30:40 UTC

(In reply to timao from comment #7)
> Hi Daniel
> 
> I Re-tested this issue with qcow2 base file, which was written 10G data and
> the steps are like below. However, the time is much shorter than the one in
> #comment 0. I am still a little confused, so could you please tell me what
> the difference?
> 
> Many thanks.
> 
> 
> 
> Steps:
> 1.Create qcow2 base image, and write 10G data to it.The info is like below
> # qemu-img info base.qcow2 
> image: base.qcow2
> file format: qcow2
> virtual size: 10G (10737418240 bytes)
> disk size: 10G 
> cluster_size: 65536
> Format specific information:
>     compat: 1.1
>     lazy refcounts: false
>     refcount bits: 16
>     corrupt: false
> 2.Create sn1 and sn2
> # qemu-img create -f qcow2 -b base.qcow2 sn1
> # qemu-img create -f qcow2 -b sn1 sn2
> 3. Rebase sn2 to base.qcow2
> # time qemu-img rebase -f qcow2 -F qcow2 -b base.qcow2 sn2
> 
> real	1m43.612s 
> user	0m2.480s
> sys	0m14.722s

Ok, so this is roughly 3x faster than with LUKS (1m43s versus 5m45s).

It is possible that this is just illustrating the overhead of the encryption.

You could create a 10 GB qcow2 file and 10 GB luks file, and then use 'qemu-io writev' to compare time to write + read 10 GB of data. This might show similar time difference to the rebase operation
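A minimal sketch of such a comparison follows. It uses the plain qemu-io write and read commands rather than writev, purely to keep the example short, and issues the requests in 256 MiB chunks inside a single qemu-io process so that LUKS key setup is paid only once per pass. The helper name io_pass, the QEMU_IO variable, the chunk size, and the test image names are all assumptions, not part of the original suggestion.

```shell
# Path to qemu-io; overridable so the wrapper can be dry-run (assumption).
QEMU_IO=${QEMU_IO:-qemu-io}

# Run one I/O pass over the start of an image.
# $1 = qemu-io command prefix, e.g. "read" or "write -P 0xab"
# $2 = total size in MiB, $3 = chunk size in MiB
# remaining arguments = qemu-io options that select the image
io_pass() {
    op=$1 total_mib=$2 chunk_mib=$3
    shift 3
    i=$(( total_mib / chunk_mib ))
    # Prepend one "-c <command>" pair per chunk, last chunk first,
    # so the commands end up in ascending offset order.
    while [ "$i" -gt 0 ]; do
        i=$(( i - 1 ))
        set -- -c "$op $(( i * chunk_mib ))M ${chunk_mib}M" "$@"
    done
    "$QEMU_IO" "$@"
}

# Example (not executed here), after creating test.qcow2 and test.luks at 10G:
#   time io_pass "write -P 0xab" 10240 256 -f qcow2 test.qcow2
#   time io_pass "read" 10240 256 -f qcow2 test.qcow2
#   time io_pass "write -P 0xab" 10240 256 --object secret,id=sec0,data=base \
#       --image-opts driver=luks,file.filename=test.luks,key-secret=sec0
#   time io_pass "read" 10240 256 --object secret,id=sec0,data=base \
#       --image-opts driver=luks,file.filename=test.luks,key-secret=sec0
```

Comparing the write plus read times of the two formats against the rebase timings would indicate whether encryption overhead alone accounts for the gap.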

Comment 9 Tingting Mao 2018-06-25 03:34:41 UTC

Based on comment 8, closing this bug.
