Description of problem:
While trying to verify BZ 1171007, I noticed that the write performance for qcow2 images is very low. The write speed of qcow2+rbd is almost 15 times slower than that of raw+rbd (tested with 'qemu-img convert', see below), while the read speed is nearly the same.

Version-Release number of selected component (if applicable):
qemu-kvm-4.1.0-4.module+el8.1.0+4020+16089f93
kernel-4.18.0-134.el8

How reproducible:
100%

Steps to Reproduce:

Scenario 1 (test write performance with 'qemu-img convert'):

1. Create a 512M file locally.
# dd if=/dev/urandom of=test.img bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 3.59897 s, 149 MB/s

2. Convert it to a raw image over rbd.
# time qemu-img convert -t none -T none -f raw -O raw test.img rbd:kvmtest-pool/tgt.img -p
    (100.00/100%)

real    0m41.138s
user    0m1.464s
sys     0m0.376s

3. Convert it to a qcow2 image over rbd.
# time qemu-img convert -t none -T none -f raw -O qcow2 test.img rbd:kvmtest-pool/tgt.qcow2 -p
    (100.00/100%)

real    10m15.055s    <------------------------------------- Too slow!
user    0m6.364s
sys     0m1.456s

Scenario 2 (test read performance with 'qemu-img convert'):

1. Read from raw over rbd.
# qemu-img info rbd:kvmtest-pool/tgt.img
image: json:{"driver": "raw", "file": {"pool": "kvmtest-pool", "image": "tgt.img", "driver": "rbd"}}
file format: raw
virtual size: 512 MiB (536870912 bytes)
disk size: unavailable
cluster_size: 4194304

# time qemu-img convert -t none -T none -f raw -O qcow2 rbd:kvmtest-pool/tgt.img tgt.qcow2 -p
    (100.00/100%)

real    0m5.037s
user    0m1.098s
sys     0m1.163s

2. Read from qcow2 over rbd.
# qemu-img info rbd:kvmtest-pool/tgt.qcow2
image: json:{"driver": "qcow2", "file": {"pool": "kvmtest-pool", "image": "tgt.qcow2", "driver": "rbd"}}
file format: qcow2
virtual size: 512 MiB (536870912 bytes)
disk size: unavailable
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

# time qemu-img convert -t none -T none -f qcow2 -O qcow2 rbd:kvmtest-pool/tgt.qcow2 tgt.qcow2 -p
    (100.00/100%)

real    0m5.311s
user    0m1.631s
sys     0m0.951s

Actual results:
As above.

Expected results:
The write performance should be much higher: equal to or higher than that of raw.

Additional info:
In addition, it takes almost 1 hour to install a guest on a qcow2 image over rbd.
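For completeness, an alternative way to exercise the write path more directly would be 'qemu-img bench' against the same two RBD images (a sketch only, not run here; the flags assume a reasonably recent qemu-img):

Write test against the raw image on RBD (64 KiB requests, cache disabled):
# qemu-img bench -w -t none -c 16384 -s 65536 -f raw rbd:kvmtest-pool/tgt.img

Same workload against the qcow2 image on RBD, for comparison:
# qemu-img bench -w -t none -c 16384 -s 65536 -f qcow2 rbd:kvmtest-pool/tgt.qcow2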
We need to investigate this to understand the numbers and the reasons for the difference, but it is not a high priority, given that Layered Products typically do not use rbd directly in QEMU.
I am doing qemu 5.1 regression testing with qemu-kvm-5.1.0-3.module+el8.3.0+7708+740a1315.x86_64 and hit this issue.

Version:
kernel-4.18.0-234.el8.x86_64
qemu-kvm-5.1.0-3.module+el8.3.0+7708+740a1315.x86_64

Test Steps:

Scenario 1:
Install a guest with a qcow2 image over a ceph server. The installation succeeds, but it takes around 2.5 hours.

Scenario 2:
1. # dd if=/dev/urandom of=test.img bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 4.02937 s, 133 MB/s

2. # time qemu-img convert -t none -T none -f raw -O raw test.img rbd:rbd/tgt.img -p
    (100.00/100%)

real    0m38.653s
user    0m0.834s
sys     0m0.466s

3. # time qemu-img convert -t none -T none -f raw -O qcow2 test.img rbd:rbd/tgt.qcow2 -p
    (100.00/100%)

real    13m6.642s
user    0m4.601s
sys     0m2.826s

Actual result:
Compared to the previous version, writing data into the qcow2 image takes even longer, and installing a guest in around 2.5 hours is too long.

Needinfo:
Since the write performance is worse than before, please consider raising the priority of this issue, thanks.
(In reply to Ademar Reis from comment #2)
> We need to investigate this to understand the numbers and the reasons for
> the difference, but it is not a high priority, given that Layered Products
> typically do not use rbd directly in QEMU.

This is not correct: OpenStack uses the in-QEMU RBD client because it has better manageability than the in-kernel client.
Recently an upstream series was posted to refactor the rbd driver:
https://lore.kernel.org/qemu-devel/20210126112540.11880-5-pl@kamp.de/T/

I'll check whether it improves this case; if not, I'll try to understand what is slowing down the write path with qcow2.
The issue seems related to the object size. For the raw image I see '4 MiB objects', for qcow2 I see '64 KiB objects':

# rbd info tgt.img
rbd image 'tgt.img':
        size 512 MiB in 128 objects
        order 22 (4 MiB objects)

# rbd info tgt.qcow2
rbd image 'tgt.qcow2':
        size 24 MiB in 384 objects
        order 16 (64 KiB objects)

Using '-o cluster_size=2097152' with qemu-img seems to have no effect; I'm trying to figure out why.

If I force order=0 (the default object size, same as raw) in the QEMU RBD driver, the speed increases a lot.
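For reference, one experiment would be to pre-create the qcow2 image with an explicit cluster size, convert into it with -n so qemu-img does not re-create it, and then check the object size on the Ceph side (an untested sketch, reusing the pool and image names from the report):

Pre-create the qcow2 image on RBD with a 2 MiB cluster size:
# qemu-img create -f qcow2 -o cluster_size=2097152 rbd:kvmtest-pool/tgt.qcow2 512M

Convert into the existing image without re-creating it:
# time qemu-img convert -n -t none -T none -f raw -O qcow2 test.img rbd:kvmtest-pool/tgt.qcow2 -p

Check the resulting object size on the Ceph side:
# rbd info kvmtest-pool/tgt.qcow2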
As discussed upstream [1], QCOW2 on RBD is not really well supported and it is expected to be removed. This is also because there doesn't seem to be much advantage to using QCOW2 on RBD.

If there is a specific use case where it is useful to use QCOW2 on RBD, please open a new BZ where the requirements are explained.

AFAIK layered products, such as OpenStack, do not use QCOW2 with in-QEMU RBD.

[1] https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg01045.html
As QCOW2 on RBD is expected to be removed, QE agrees to close this bug.
(In reply to Stefano Garzarella from comment #15)
> As discussed upstream [1], QCOW2 on RBD is not really well supported and it
> is expected to be removed. This is also because there doesn't seem to be
> much advantage to using QCOW2 on RBD.
>
> If there is a specific use case where it is useful to use QCOW2 on RBD,
> please open a new BZ where the requirements are explained.
>
> AFAIK layered products, such as OpenStack, do not use QCOW2 with in-QEMU RBD.
>
> [1] https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg01045.html

Hi Stefano,

As QCOW2 on RBD is expected to be removed, could we add this to the official documentation?

FYI, I asked Jiri for suggestions; he raised some questions, such as whether this is also the case for baseline RHEL+KVM. AFAIK we don't have any docs that refer to Rados Block Devices, at least at the RHEL level. Is this perhaps something that relates more to RHV or OpenStack, in terms of the affected users?
(In reply to zixchen from comment #18)
> As QCOW2 on RBD is expected to be removed, could we add this to the official
> documentation?

Continuing the discussion upstream, I'm not sure it will be removed, but it has never been well supported, so I agree we should either document it or not say it's supported. IIUC the only reasonable format to use on RBD is raw.

> FYI, I asked Jiri for suggestions; he raised some questions, such as whether
> this is also the case for baseline RHEL+KVM. AFAIK we don't have any docs
> that refer to Rados Block Devices, at least at the RHEL level. Is this
> perhaps something that relates more to RHV or OpenStack, in terms of the
> affected users?

AFAIK OpenStack doesn't allow this configuration. I don't know about RHV, but for both cases we should advise that the only reasonable format to use on RBD is raw.

This is also written in the Ceph docs:
https://docs.ceph.com/en/latest/rbd/qemu-rbd/#creating-images-with-qemu

"Important
The raw data format is really the only sensible format option to use with RBD. Technically, you could use other QEMU-supported formats (such as qcow2 or vmdk), but doing so would add additional overhead, and would also render the volume unsafe for virtual machine live migration when caching (see below) is enabled."
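For reference, a minimal sketch of the raw-on-RBD workflow that the Ceph docs recommend (the pool and image names below are just placeholders):

Create a raw image directly in the RBD pool:
# qemu-img create -f raw rbd:kvmtest-pool/vm-disk.img 20G

Or import an existing local qcow2 image into RBD as raw:
# qemu-img convert -t none -T none -f qcow2 -O raw local-image.qcow2 rbd:kvmtest-pool/vm-disk.img -p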
(In reply to Stefano Garzarella from comment #19)
> This is also written in the Ceph docs:
> https://docs.ceph.com/en/latest/rbd/qemu-rbd/#creating-images-with-qemu
>
> "Important
> The raw data format is really the only sensible format option to use with
> RBD. Technically, you could use other QEMU-supported formats (such as qcow2
> or vmdk), but doing so would add additional overhead, and would also render
> the volume unsafe for virtual machine live migration when caching (see
> below) is enabled."

NB, I don't believe this statement about live migration is correct. If RBD is safe for live migration at the protocol level, then any image format on top is capable of being safe, provided the right cache modes are configured. From the RBD point of view the format above it is opaque; it is just reading/writing the bytes requested.

More generally, this statement is just saying that using non-raw formats on top of a block device doesn't make sense. This is broadly true, but nonetheless applications have used formats on top of block storage before; most notably, RHEV uses qcow2 on block devices. This can actually be useful because adding the qcow2 format lets you add a backing file to the qcow2, which means you can have a non-RBD backing layer below the RBD copy-on-write volume.
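For illustration, a hypothetical sketch of the setup described above (the paths, pool, and image names are made up):

Create a read-only base image on a local or shared filesystem (non-RBD):
# qemu-img create -f qcow2 /var/lib/images/base.qcow2 20G

Create a qcow2 overlay on RBD that uses the non-RBD image as its backing file; writes go to the RBD volume, while unmodified data is read from the backing file:
# qemu-img create -f qcow2 -b /var/lib/images/base.qcow2 -F qcow2 rbd:kvmtest-pool/overlay.qcow2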