Bug 1744525

Summary: Writing data to the qcow2 image over RBD is too slow
Product: Red Hat Enterprise Linux Advanced Virtualization
Reporter: Tingting Mao <timao>
Component: qemu-kvm
Assignee: Stefano Garzarella <sgarzare>
qemu-kvm sub component: Ceph
QA Contact: zixchen
Status: CLOSED WONTFIX
Docs Contact:
Severity: medium
Priority: medium
CC: areis, berrange, coli, jherrman, jinzhao, juzhang, kwolf, ngu, qzhang, sgarzare, virt-maint, zhenyzha, zixchen
Version: 8.1
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-03-04 09:20:49 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Tingting Mao 2019-08-22 10:42:16 UTC
Description of problem:
While trying to verify BZ 1171007, I noticed that write performance for qcow2 images over RBD is very low. Writing to qcow2+rbd is almost 15 times slower than writing to raw+rbd (tested with 'qemu-img convert', see the steps below), while read speed is nearly the same.


Version-Release number of selected component (if applicable):
qemu-kvm-4.1.0-4.module+el8.1.0+4020+16089f93
kernel-4.18.0-134.el8


How reproducible:
100%


Steps to Reproduce:
Scenario 1 (test write performance with 'qemu-img convert')

1. Create a 512 MiB file locally.
# dd if=/dev/urandom of=test.img bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 3.59897 s, 149 MB/s

2. Convert it to a raw image over RBD
# time qemu-img convert -t none -T none -f raw -O raw test.img rbd:kvmtest-pool/tgt.img -p
    (100.00/100%)

real	0m41.138s
user	0m1.464s
sys	0m0.376s

3. Convert it to a qcow2 image over RBD
# time qemu-img convert -t none -T none -f raw -O qcow2 test.img rbd:kvmtest-pool/tgt.qcow2 -p
    (100.00/100%)

real	10m15.055s ------------------------------------- Too slow!!
user	0m6.364s
sys	0m1.456s



Scenario 2 (test read performance with 'qemu-img convert')
1. Read from the raw image over RBD
# qemu-img info rbd:kvmtest-pool/tgt.img
image: json:{"driver": "raw", "file": {"pool": "kvmtest-pool", "image": "tgt.img", "driver": "rbd"}}
file format: raw
virtual size: 512 MiB (536870912 bytes)
disk size: unavailable
cluster_size: 4194304

# time qemu-img convert -t none -T none -f raw -O qcow2 rbd:kvmtest-pool/tgt.img tgt.qcow2 -p
    (100.00/100%)

real	0m5.037s
user	0m1.098s
sys	0m1.163s

2. Read from the qcow2 image over RBD
# qemu-img info rbd:kvmtest-pool/tgt.qcow2
image: json:{"driver": "qcow2", "file": {"pool": "kvmtest-pool", "image": "tgt.qcow2", "driver": "rbd"}}
file format: qcow2
virtual size: 512 MiB (536870912 bytes)
disk size: unavailable
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

# time qemu-img convert -t none -T none -f qcow2 -O qcow2 rbd:kvmtest-pool/tgt.qcow2 tgt.qcow2 -p
    (100.00/100%)

real	0m5.311s
user	0m1.631s
sys	0m0.951s


Actual results:
As above.


Expected results:
The write performance of qcow2 over RBD should be comparable to that of raw (equal or higher).


Additional info:
In addition, installing a guest on a qcow2 image over RBD takes almost 1 hour!
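For reference, the guest in that test used a qcow2 disk accessed through the in-QEMU RBD client, roughly along these lines (disk size, guest settings and ISO path here are illustrative, not the exact commands used):

# qemu-img create -f qcow2 rbd:kvmtest-pool/guest.qcow2 20G
# /usr/libexec/qemu-kvm -name rbd-guest -m 4096 -smp 2 \
    -drive file=rbd:kvmtest-pool/guest.qcow2,format=qcow2,cache=none,if=virtio \
    -cdrom /home/rhel8.iso -boot d -vnc :1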

Comment 2 Ademar Reis 2019-08-22 18:34:05 UTC
We need to investigate this to understand the numbers and reasons for the difference, but not a high priority given Layered Products typically do not use rbd directly in QEMU.

Comment 4 Ademar Reis 2020-02-05 23:03:38 UTC
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks

Comment 6 zixchen 2020-09-14 11:45:48 UTC
While running QEMU 5.1 regression tests with qemu-kvm-5.1.0-3.module+el8.3.0+7708+740a1315.x86_64, I hit this issue.

Version:
kernel-4.18.0-234.el8.x86_64
qemu-kvm-5.1.0-3.module+el8.3.0+7708+740a1315.x86_64


Test Steps:
Scenario 1:
Install a guest with a qcow2 image over the Ceph server. The installation succeeds, but it takes around 2.5 hours.

Scenario 2:
1. # dd if=/dev/urandom of=test.img bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 4.02937 s, 133 MB/s
2. # time qemu-img convert -t none -T none -f raw -O raw test.img rbd:rbd/tgt.img -p
    (100.00/100%)

real	0m38.653s
user	0m0.834s
sys	0m0.466s
3. # time qemu-img convert -t none -T none -f raw -O qcow2 test.img rbd:rbd/tgt.qcow2 -p
    (100.00/100%)

real	13m6.642s
user	0m4.601s
sys	0m2.826s


Actual result:
Compared to the previous version, writing data into the qcow2 image takes even longer, and taking around 2.5 hours to install a guest is too long.

Needinfo:
Since write performance has gotten worse than before, please consider raising the priority of this issue, thanks.

Comment 8 Daniel Berrangé 2020-09-16 07:20:14 UTC
(In reply to Ademar Reis from comment #2)
> We need to investigate this to understand the numbers and reasons for the
> difference, but not a high priority given Layered Products typically do not
> use rbd directly in QEMU.

This is not correct: OpenStack uses the in-QEMU RBD client because it has better manageability than the in-kernel client.

Comment 13 Stefano Garzarella 2021-02-17 16:08:19 UTC
Recently an upstream series was posted to refactor the rbd driver: 
https://lore.kernel.org/qemu-devel/20210126112540.11880-5-pl@kamp.de/T/

I'll check whether it improves this case; if not, I'll try to understand what is slowing down the write path with qcow2.

Comment 14 Stefano Garzarella 2021-02-23 18:06:56 UTC
The issue seems related to the object size.

For the raw image I see '4 MiB objects'; for qcow2 I see '64 KiB objects':

rbd info tgt.img  
rbd image 'tgt.img':
	size 512 MiB in 128 objects
	order 22 (4 MiB objects)

rbd info tgt.qcow2             
rbd image 'tgt.qcow2':
	size 24 MiB in 384 objects
	order 16 (64 KiB objects)

Using '-o cluster_size=2097152' with qemu-img seems to have no effect; I'm trying to figure out why.

If I force order=0 (the default object size, same as raw) in the QEMU RBD driver, the speed increases a lot.
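For reference, this is the kind of invocation I was testing (pool/image names as in the reproducer; 2097152 = 2 MiB, the largest qcow2 cluster size):

# time qemu-img convert -t none -T none -f raw -O qcow2 -o cluster_size=2097152 test.img rbd:kvmtest-pool/tgt.qcow2 -p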

Comment 15 Stefano Garzarella 2021-03-04 09:20:49 UTC
As discussed upstream [1], QCOW2 on RBD is not really well supported and it is expected to be removed.
This is also because there doesn't seem to be much advantage to using QCOW2 on RBD.

If there is a specific use case where it is useful to use QCOW2 on RBD, please open a new BZ where the requirements are explained.

AFAIK layered products, such as OpenStack, do not use QCOW2 with in-QEMU RBD.

[1] https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg01045.html

Comment 16 zixchen 2021-03-04 10:05:11 UTC
As QCOW2 on RBD is expected to be removed, QE agrees to close this bug.

Comment 18 zixchen 2021-03-04 10:41:46 UTC
(In reply to Stefano Garzarella from comment #15)
> As discussed upstream [1], QCOW2 on RBD is not really well supported and it
> is expected to be removed.
> This is also because there doesn't seem to be much advantage to using QCOW2
> on RBD.
> 
> If there is a specific use case where it is useful to use QCOW2 on RBD,
> please open a new BZ where the requirements are explained.
> 
> AFAIK layered products, such as OpenStack, do not use QCOW2 with in-QEMU RBD.
> 
> [1] https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg01045.html

Hi Stefano,

As QCOW2 on RBD is expected to be removed, could we add this to the official documentation?
FYI, I asked Jiri for suggestions; he raised some points, such as whether this is the case in baseline RHEL+KVM. AFAIK we don't have any docs that refer to Rados Block Devices, at least at the RHEL level. Is this perhaps something that relates more to RHV or OpenStack, in terms of the affected users?

Comment 19 Stefano Garzarella 2021-03-04 11:24:35 UTC
(In reply to zixchen from comment #18)
> As QCOW2 on RBD is expected to be removed, could we add this to the official
> documentation?

Continuing the discussion upstream, I'm not sure it will actually be removed, but it has never been well supported, so I agree we should either document this or at least not say it is supported.

IIUC the only reasonable format to use on RBD is raw.

> FYI, I asked Jiri for suggestions; he raised some points, such as whether
> this is the case in baseline RHEL+KVM. AFAIK we don't have any docs that
> refer to Rados Block Devices, at least at the RHEL level. Is this perhaps
> something that relates more to RHV or OpenStack, in terms of the affected
> users?

AFAIK OpenStack doesn't allow this configuration.
I don't know about RHV, but for both cases we should advise that the only reasonable format to use on RBD is raw.

This is also written in the Ceph docs: https://docs.ceph.com/en/latest/rbd/qemu-rbd/#creating-images-with-qemu
"Important
The raw data format is really the only sensible format option to use with RBD. Technically, you could use other QEMU-supported formats (such as qcow2 or vmdk), but doing so would add additional overhead, and would also render the volume unsafe for virtual machine live migration when caching (see below) is enabled."
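
In practice that means keeping the RBD side raw, e.g. converting a local qcow2 into a raw RBD image (the local file and destination image names here are just examples):

# qemu-img convert -t none -T none -f qcow2 -O raw local.qcow2 rbd:kvmtest-pool/disk0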

Comment 20 Daniel Berrangé 2021-03-04 11:49:54 UTC
(In reply to Stefano Garzarella from comment #19)
> This is also written in the Ceph docs:
> https://docs.ceph.com/en/latest/rbd/qemu-rbd/#creating-images-with-qemu
> "Important
> The raw data format is really the only sensible format option to use with
> RBD. Technically, you could use other QEMU-supported formats (such as qcow2
> or vmdk), but doing so would add additional overhead, and would also render
> the volume unsafe for virtual machine live migration when caching (see
> below) is enabled."

NB I don't believe this statement about live migration is correct. If RBD is safe for live migration at the protocol level, then any image format on top is capable of being safe, provided the right cache modes are configured. From the RBD point of view, the format above it is opaque; it is just reading/writing the bytes requested.

More generally, this statement is just saying that using non-raw formats on top of a block device doesn't make sense. This is broadly true, but nonetheless applications have used formats on top of block storage before; most notably, RHEV uses qcow2 on block devices. This can actually be useful, because adding the qcow2 format lets you give the image a backing file, which means you have a non-RBD copy-on-write layer below the RBD volume.
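
As a sketch of what that looks like (the paths and image names here are illustrative, not taken from any product configuration):

# qemu-img create -f qcow2 \
    -o backing_file=/var/lib/libvirt/images/base.qcow2,backing_fmt=qcow2 \
    rbd:kvmtest-pool/overlay.qcow2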