+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1511891 +++

======================================================================

Description of problem:
Currently qemu-img has long run-times when importing disks from an export domain. During the import we were unable to identify the bottleneck causing such a long run-time.

Version-Release number of selected component (if applicable):
vdsm-4.19.31-1.el7ev.x86_64
qemu-img-rhev-2.9.0-16.el7_4.8.x86_64

How reproducible:
Any time during import of a larger disk from an NFS-based export domain.

Steps to Reproduce:
1. Export a sufficiently large (100G) virtual machine to an NFS export domain
2. Import that machine to an FC-based storage domain

Actual results:
The import runs and takes a serious amount of time, although:
- Network is *not* saturated
- FC device is *not* saturated
- CPU is *not* running at 100% load
- There is lots of free memory

Expected results:
- Either the network, the FC device, or the CPU reaches a limit

Additional info:

(Originally by Andreas Bleischwitz)
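For anyone reproducing this, the saturation claims above can be quantified while the import runs. A minimal sketch using standard Linux tools (the tool choice and device selection are assumptions, not from the original report):

    # Per-device I/O utilization (FC/multipath devices appear as sd*/dm-*)
    iostat -xm 2

    # Per-NIC throughput, to check whether the NFS link is saturated
    sar -n DEV 2

    # CPU usage of the qemu-img process doing the copy
    pidstat -u -C qemu-img 2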
returning needinfo to signify we're still waiting for the logs (Originally by amureini)
How slow is the import? Can you provide numbers for this process? (Originally by ylavi)
I also suggest you tell the customer to use a data domain with its import ability. (Originally by ylavi)
Nir, can you please have a look? We might need some tweaking of qemu-img there (Originally by Tal Nisan)
Andreas, what is the original disk format? I have seen very slow qemu-img copies on a fast server and storage (XtremIO) when copying a raw preallocated volume to a raw preallocated volume. (Originally by Nir Soffer)
Hi Nir, we initially took the route of exporting the VMs from SAN to an NFS-based export domain - I assume QCOW2 is used for that. While the export was not remarkably slow, the import lasted much longer. As we were then told to use an additional storage domain for that migration, we used a second SAN-based storage domain and copied the disks from the old to the new storage domain. This turned out to be even slower than the import from the NFS export domain. I can no longer provide any numbers, and the migration is now close to finished, so we do not even have the ability to re-run a decent export/import process. The effect should be visible regardless of the environment. All they had was a VM with close to 2TB of disk. (Originally by Andreas Bleischwitz)
Mordechay, you did not mention how you copied the image - did you use qemu-img manually or move the disk via engine? Also, the content of the image matters. Can you attach to this bug the output of:

    qemu-img info /path/to/image
    qemu-img map --output json /path/to/image

We need to run this on the source image *before* the copy.

Finally, you did not mention which NFS version was used. NFS 4.2 supports sparseness, so qemu-img can copy sparse parts much, much faster (using fallocate() instead of copying zeros).

It will also be interesting to compare the same copy using the new ovirt-imageio cio code. You can test using this patch: https://gerrit.ovirt.org/#/c/85640/

To install it, download the patch from gerrit:

    git fetch git://gerrit.ovirt.org/ovirt-imageio refs/changes/40/85640/26 && \
    git checkout FETCH_HEAD

Then run this from the common directory:

    export PYTHONPATH=.
    time python test/tdd.py /path/to/source /path/to/destination

(Originally by Nir Soffer)
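For reference, the negotiated NFS version and the amount of allocated data can be gathered roughly like this (a sketch; /path/to/image is a placeholder and the jq summarization is an assumption, not part of the original request):

    # Negotiated NFS version for the mount - look for vers=4.2
    grep ' nfs' /proc/mounts

    # Total bytes of allocated (non-hole) data, summed from the map (needs jq)
    qemu-img map --output json /path/to/image \
        | jq '[.[] | select(.data) | .length] | add'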
Raz, we need to reproduce this on real hardware and storage. Mordechay did some tests (see comment 23) but we don't have enough info about the tests. For testing I'll need a decent host (leopard04/03 would be best, but buri/ucs should also be good), and iSCSI/FC/NFS storage (XtremIO would be best). (Originally by Nir Soffer)
Andreas, can you give details about the destination storage server? In comment 32 we learned that the destination storage server is a VM. Is this the same setup that you reported, or a different setup? If the issue is running an NFS server on a VM, this bug should move to qemu; it is not related to qemu-img. (Originally by Nir Soffer)
Adding back the needinfo for Raz, removed by mistake by a comment. We are blocked waiting for a fast server and storage to reproduce this issue. (Originally by Nir Soffer)
Daniel, as this is a performance-related issue, please provide the required HW for testing. (Originally by Raz Tamir)
Setting the needinfo again (Originally by Raz Tamir)
I tested copy image performance with raw format, using the new -W option in qemu-img convert.

I did not test copying qcow2 to raw/qcow2 files, for two reasons: qemu is the only tool that can read the qcow2 format, and the new -W option causes fragmentation of the qcow2 file (I'm not sure how this affects performance of the guest).

## Tested images

I tested copying 3 versions of a sparse image:

    size   format   data   #holes
    ------------------------------
    100G   raw      19%    6352
    100G   raw      52%    15561
    100G   raw      86%    24779

For reference, here is a Fedora 27 image created by virt-builder:

    6G     raw      19%    73

The images are fairly fragmented - this makes it harder for qemu-img to get good performance, since qemu-img has to deal with lots of small chunks of data.

The 19G image was created like this:
- Install Fedora 28 server on a 100G FC disk
- yum-builddep kernel
- get the current kernel tree
- configure using "make olddefconfig"
- make

The 52G image was created from the 19G image by duplicating the linux build tree twice. The 86G image was created from the 52G image by adding 2 more duplicates of the linux tree.

## Tested hardware

Tested on an Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz server with 40 cores, connected to XtremIO storage via 4G FC HBAs, with 4 paths to storage. The NFS server is another server with the same spec, exporting a LUN from XtremIO formatted with xfs, over a single 10G nic. The NFS server is mounted using NFS 4.2.

## Tested commands

I compared these commands:

1. qemu-img

    qemu-img convert -p -f raw -O raw -t none -T none src-img dst-img

This is how RHV copies images since 3.6.

2. qemu-img/-W

    qemu-img convert -p -f raw -O raw -t none -T none -W src-img dst-img

3. dd

For block:

    blkdiscard -z -p 32m dst-img
    dd if=src-img of=dst-img bs=8M iflag=direct oflag=direct conv=sparse,fsync

For file:

    truncate -s 0 dst-img
    truncate -s 100g dst-img
    dd if=src-img of=dst-img bs=8M iflag=direct oflag=direct conv=sparse,fsync

This command is not the same as qemu-img - it treats holes smaller than the block size (8M) as data. But I think this is good enough.

4. parallel dd

For block:

    blkdiscard -z -p 32m dst-img
    dd if=src-img of=dst-img bs=8M count=6400 iflag=direct oflag=direct \
        conv=sparse,fsync &
    dd if=src-img of=dst-img bs=8M count=6400 seek=6400 skip=6400 iflag=direct \
        oflag=direct conv=sparse,fsync &

For file:

    truncate -s 0 dst-img
    truncate -s 100g dst-img
    dd if=src-img of=dst-img bs=8M count=6400 iflag=direct oflag=direct \
        conv=notrunc,sparse,fsync &
    dd if=src-img of=dst-img bs=8M count=6400 seek=6400 skip=6400 \
        iflag=direct oflag=direct conv=notrunc,sparse,fsync &

The parallel dd commands are not very efficient with very sparse images, since one process finishes before the other, but they are a good way to show the possible improvement (a sketch generalizing this to N workers appears after this comment).
## Versions

    # rpm -q qemu-img-rhev coreutils
    qemu-img-rhev-2.10.0-21.el7_5.4.x86_64
    coreutils-8.22-21.el7.x86_64

    # uname -r
    3.10.0-862.6.3.el7.x86_64

## Setup

Before testing copy to an FC volume, I discarded the volume:

    blkdiscard -p 32m dst-img

When copying to NFS, I truncated the volume:

    truncate -s0 dst-img

## Basic read/write throughput

For reference, here is the rate we can read or write on this setup:

    # dd if=/nfs/100-86g.img of=/dev/null bs=8M count=12800 iflag=direct conv=sparse
    107374182400 bytes (107 GB) copied, 116.292 s, 923 MB/s

    # dd if=/dev/zero of=dst-fc1 bs=8M count=12800 oflag=direct conv=fsync
    107374182400 bytes (107 GB) copied, 151.491 s, 709 MB/s

    # dd if=/dev/zero of=/nfs/upload.img bs=8M count=12800 oflag=direct conv=fsync
    107374182400 bytes (107 GB) copied, 296.105 s, 363 MB/s

## Copying from NFS 4.2 to FC storage domain

This is how raw templates are copied from an export domain or from an NFS data domain to an FC domain, as mentioned in comment 0, or how disks are copied when moving disks between storage domains. Time in seconds.

    image     qemu-img   qemu-img/-W   dd    parallel-dd
    ----------------------------------------------------------
    100/19G   242        41            165   128
    100/52G   658        119           197   144
    100/86G   1230       189           238   132

We can see that qemu-img gives poor results, and it is worse for less sparse images. This reproduces the issue mentioned in comment 0. 1230 seconds for 100G is 83 MiB/s.

With the new -W option, qemu-img is the fastest with a very sparse image, since it does not need to read the holes, using SEEK_DATA/SEEK_HOLE. I did not test NFS < 4.2, where qemu has to read all the data and detect zeros manually like dd. But we can see that simple parallel dd can be faster for fully allocated images, when qemu-img has to read most of the image. This shows there is room for optimization in qemu-img, even with -W.

## Copying from FC storage domain to FC storage domain

This is how disks are copied between storage domains. Time in seconds.

    image     qemu-img   qemu-img/-W   dd    parallel-dd
    ----------------------------------------------------------
    100/19G   383        194           178   141
    100/52G   802        282           230   167
    100/86G   1229       371           287   154

In this case qemu-img and dd do not have any info on the sparseness of the source image and must detect zeros manually. qemu-img with the -W option is again significantly faster, but even simple dd is faster. The difference is bigger as the image contains more data.

## Copying from FC storage domain to NFS 4.2 storage domain

This is how disks are copied between storage domains, or how disks are copied to an export domain, as mentioned in comment 0. Time in seconds.

    image     qemu-img   qemu-img/-W   dd    parallel-dd
    ----------------------------------------------------------
    100/19G   215        194           200   n/a
    100/52G   347        292           301   n/a
    100/86G   493        379           398   340

qemu-img with the new -W option is about as fast as simple dd, but parallel dd is faster. However, using -W will cause fragmentation in the destination file system, so I don't think we should use this option. Maybe we need to test how VM performance is affected by disks copied using -W to NFS storage.

## Summary

qemu-img without the -W option is very slow now. When we moved to using qemu-img in 3.6 it was faster than dd. Maybe we did not test it properly (we used a 1M buffer size in dd), or maybe there was a performance regression in qemu-img since RHEL 7.2. This is the patch moving to use only qemu-img for copying images:
https://github.com/oVirt/vdsm/commit/0b61c4851a528fd6354d9ab77a68085c41f35dc9

We should use -W for copying to raw volumes on block storage.
Using dd for block-to-block and block-to-NFS copies is faster, but we want to use a single tool for copying images. We will try to improve qemu-img performance for this use case. qemu-img 3.0 supports copy offloading; we need to test if it gives better performance for block-to-block copy (see the sketch after this comment). I'll open a qemu-img bug to track the performance issues. (Originally by Nir Soffer)
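Two follow-up sketches based on the commands above. First, the two-way parallel dd generalizes to N workers along these lines (a sketch for file storage, assuming a 100G image and the 8M block size used above; this is not one of the attached test scripts):

    #!/bin/bash
    # Split a 100G file-to-file copy into N equal ranges, copied by
    # parallel dd jobs. SRC/DST are placeholders.
    SRC=src-img
    DST=dst-img
    N=4
    BLOCKS=12800                 # 100G in 8M blocks
    CHUNK=$((BLOCKS / N))

    truncate -s 0 "$DST"
    truncate -s 100g "$DST"
    for ((i = 0; i < N; i++)); do
        dd if="$SRC" of="$DST" bs=8M count="$CHUNK" seek=$((i * CHUNK)) \
            skip=$((i * CHUNK)) iflag=direct oflag=direct \
            conv=notrunc,sparse,fsync &
    done
    wait

Second, copy offloading in qemu-img 3.0 is requested with the -C flag, so the test would look something like this (a sketch; whether the copy is actually offloaded depends on the kernel and storage):

    # -C asks qemu-img to use copy offloading (e.g. copy_file_range)
    qemu-img convert -p -f raw -O raw -t none -T none -C src-img dst-img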
Created attachment 1476302 [details] Detailed test results 100/19g sparse image (Originally by Nir Soffer)
Created attachment 1476303 [details] Detailed test results 100/52g sparse image (Originally by Nir Soffer)
Created attachment 1476304 [details] Detailed test results 100/86g sparse image (Originally by Nir Soffer)
Created attachment 1476305 [details] Parallel dd test script for file storage (Originally by Nir Soffer)
Created attachment 1476306 [details] Parallel dd test script for block storage (Originally by Nir Soffer)
We are in the blocker-only stage of 4.2.6. This change requires full regression testing, as this is a key flow. Therefore I think we should wait for 4.2.7 to merge this. (Originally by ylavi)
Removing qa_ack+ as this won't be part of 4.2.6 (Originally by Elad Ben Aharon)
Nir, are you merging it in the coming 4.2.7 build, this sprint?
Guy, Nir tested it on our Leopards with the NFS storage you gave him:

## Tested hardware

Tested on an Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz server with 40 cores, connected to XtremIO storage via 4G FC HBAs, with 4 paths to storage. The NFS server is another server with the same spec, exporting a LUN from XtremIO formatted with xfs, over a single 10G nic. The NFS server is mounted using NFS 4.2.
(In reply to Daniel Gur from comment #66)
> Nir, are you merging it in the coming 4.2.7 build, this sprint?

This was already merged; it should be available in the first 4.2.7 build.
(In reply to Nir Soffer from comment #68)
> (In reply to Daniel Gur from comment #66)
> > Nir, are you merging it in the coming 4.2.7 build, this sprint?
>
> This was already merged; it should be available in the first 4.2.7 build.

What will be required regarding RHEL hosts? Will only the command used for storage migration change, or are there any package dependencies on the hypervisor?
(In reply to Steffen Froemer from comment #69)
> What will be required regarding RHEL hosts?

There are no new requirements; we use qemu-img options introduced in the latest version, which is already required by vdsm.
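A quick way to confirm an existing host already meets this (a sketch; the version numbers come from the test results above, so treat them as versions observed to work rather than formal requirements):

    # -W was exercised above with qemu-img-rhev 2.10; check what is installed
    rpm -q qemu-img-rhev vdsm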
If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text. If this bug does not require doc text, please set the 'requires_doc_text' flag to -.
(In reply to Steve Goodman from comment #72) Doc text updated.
I ran the following setup:

VM with 2 disks:
- disk 1: preallocated, size 100 GB
- disk 2: thin provisioned, virtual size 10 GB, actual size 3 GB

I have a system with 1 fibre channel SD and an NFS export domain, version 4.2. Tested import from NFS, once with 4.2.7 (vdsm-4.20.43-1) and a second time with 4.2.6 (vdsm-4.20.39.1-1).

On 4.2.6 the import took 7 minutes and 41 seconds.
On 4.2.7 the import took 5 minutes and 48 seconds.

Thus on 4.2.7 the import time improved significantly.
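For anyone re-verifying, a single disk copy can also be timed outside the engine flow using the same command the fix enables (a sketch; the image paths are placeholders):

    time qemu-img convert -p -f raw -O raw -t none -T none -W \
        /path/to/src-img /path/to/dst-img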
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:3478