Created attachment 1821291 [details]
console output ironic events

Description of problem:

OCP deployments take up to 6 hours to complete bootstrap on servers with 4TB disks, because qemu-img convert takes a long time to write the image to the drive.

Version-Release number of selected component (if applicable):

OCP 4.7 and 4.8 (probably 4.6 too, but I haven't tested)

How reproducible:

Steps to Reproduce:
1. Deploy OCP 4.7 or above on baremetal nodes with large drives (4TB+).
2. Monitor the progress; in .openshift-install.log you'll see the following step run for hours:
   module.masters.ironic_deployment.openshift-master-deployment
3. When the deployment completes, the summary shows the time to prepare the infrastructure running for 5+ hours:

level=debug msg="Time elapsed per stage:"
level=debug msg="    Infrastructure: 5h56m40s"
level=debug msg="Bootstrap Complete: 22m53s"
level=debug msg=" Bootstrap Destroy: 12s"
level=debug msg=" Cluster Operators: 19m53s"
level=info msg="Time elapsed: 6h39m41s"

4. On the console of one of the servers, we can see the image being written to the disk, after which only constant ironic heartbeats are displayed. [see image attached: master-node-writing-disk-ironic.png]
5. Finally, when logging in to one of the servers during bootstrap, we can see the running process is qemu-img convert. [image attached: master-node-writing-disk-ironic.png]

Upstream code:
https://github.com/openstack/ironic-python-agent/blob/7.0.1/ironic_python_agent/shell/write_image.sh#L52

Actual results:
Successful, but very long deployments (6+ hours)

Expected results:
Faster deployments (around 1 hour)

Additional info:
These servers are not using a RAID controller; the disks are directly connected to the motherboard. Servers behind a RAID card in RAID-1 seem to get better deployment times.
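For reference, the write step linked above boils down to a single qemu-img invocation. The sketch below reconstructs it with placeholder paths (the image and device names are hypothetical); the command is only printed here rather than run against a real disk:

```shell
# Sketch of the image-write step from write_image.sh (hypothetical paths).
# -t directsync bypasses the page cache, -O host_device writes raw output
# straight to the block device, and -W allows out-of-order writes.
IMAGE=/tmp/cached-rhcos.qcow2   # cached deploy image (placeholder)
DEVICE=/dev/sda                 # target disk (placeholder)
CMD="qemu-img convert -t directsync -O host_device -W $IMAGE $DEVICE"
echo "$CMD"                     # print instead of executing
```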
I've hit something similar on a Dell when provisioning the RHCOS image to a 1.1T HDD. The provisioning is delayed when using qemu-img, specifically this command:

qemu-img convert -O host_device -t directsync -W /tmp/cached-rhcos-49.84.202107010027-0-openstack.x86_64.qcow2 /dev/sda

Using a smaller test image and a smaller 8G block device (/dev/sdb3), I can see that qemu-img is spending its time zeroing out the entire disk in 2G chunks, each chunk taking about 10 seconds (see the calls to fallocate):

[root@r640-u08 ~]# rpm -qf `which qemu-img`
qemu-img-4.2.0-34.module+el8.3.0+7976+077be4ec.x86_64
[root@r640-u08 ~]# time strace -T -f -e fallocate,pwrite64 /bin/qemu-img convert -O host_device -t directsync -W afile.qcow2 /dev/sdb3 |& grep "("
[pid 13019] fallocate(10, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 0, 512) = 0 <0.007062>
[pid 13019] fallocate(10, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 0, 2147483136) = 0 <9.989639>
[pid 13019] fallocate(10, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2147483136, 2147483136) = 0 <9.894393>
[pid 13019] fallocate(10, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 4294966272, 2147483136) = 0 <9.966464>
[pid 13019] fallocate(10, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 6442449408, 2147483136) = 0 <9.899406>
[pid 13019] fallocate(10, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 8589932544, 2048) = 0 <0.016816>
[pid 13019] pwrite64(10, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 41943040 <unfinished ...>
[pid 13022] pwrite64(10, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 25165824strace: Process 13023 attached
[pid 13023] pwrite64(10, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 16777216strace: Process 13024 attached
[pid 13024] pwrite64(10, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 8650752 <unfinished ...>
[pid 13022] pwrite64(10, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 8388608 <unfinished ...>
[pid 13019] pwrite64(10, "\2\0\0\0\f\0\1\2.\0\0\0\2\0\0\0\f\0\2\2..\0\0\v\0\0\0\24\0\n\2"..., 16384, 1921024 <unfinished ...>
[pid 13025] pwrite64(10, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 274432, 0strace: Process 13027 attached
[pid 13027] pwrite64(10, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 278528) = 4096 <0.000364>

real 0m39.845s
user 0m0.014s
sys 0m0.020s

When I change to an older version of qemu-img (from RHEL 8.2), the fallocate system calls aren't present and the image write goes from 40 seconds to about half a second:

[root@r640-u08 ~]# rpm -qf `which qemu-img`
qemu-img-2.12.0-99.module+el8.2.0+5827+8c39933c.x86_64
[root@r640-u08 ~]# time strace -T -f -e fallocate,pwrite64 /bin/qemu-img convert -O host_device -t directsync -W afile.qcow2 /dev/sdb3 |& grep "("
[pid 12979] pwrite64(10, "\2\0\0\0\f\0\1\2.\0\0\0\2\0\0\0\f\0\2\2..\0\0\v\0\0\0\24\0\n\2"..., 15360, 1921024strace: Process 12980 attached
[pid 12975] pwrite64(10, "\0302\0\0\0\310\0\0\0\n\0\0\226\255\0\0\f2\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 272384, 1024 <unfinished ...>
[pid 12977] pwrite64(10, "\0\0\0\0\0\0\0\0\33w7a\33w7a\33w7a\0\0\0\0\0\0\0\0\0\0\0\0"..., 1536, 279552 <unfinished ...>
[pid 12979] pwrite64(10, "\0302\0\0\0\310\0\0\0\n\0\0\230\255\0\0\r2\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 1536, 8389632 <unfinished ...>
[pid 12973] pwrite64(10, "DEREKH DEADBEEF\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512, 8653824 <unfinished ...>
[pid 12975] pwrite64(10, "\300;9\230\0\0\0\4\0\0\0\0\0\0\4\0\0\0\20\0\0\0\0\1\0\0\0\5\0\0\0\0"..., 14848, 16778240 <unfinished ...>
[pid 12976] pwrite64(10, "\0302\0\0\0\310\0\0\0\n\0\0\230\255\0\0\r2\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 1536, 25166848 <unfinished ...>
[pid 12980] pwrite64(10, "\0302\0\0\0\310\0\0\0\n\0\0\230\255\0\0\r2\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 1536, 41944064 <unfinished ...>

real 0m0.530s
user 0m0.017s
sys 0m0.032s
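The FALLOC_FL_PUNCH_HOLE behaviour above can be reproduced without qemu-img at all; a minimal sketch using fallocate(1) from util-linux on a scratch file shows the same call deallocating blocks. (On a regular file this is nearly free; on a raw block device, which has no holes to punch, the kernel falls back to writing zeroes, which is presumably where the ~10 seconds per 2G chunk goes.)

```shell
# Punch a hole in a regular file and watch the allocated block count drop.
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1M count=8 status=none   # 8 MiB of real data
before=$(stat -c %b "$f")                           # allocated 512B blocks
fallocate --punch-hole --keep-size --offset 0 --length $((4*1024*1024)) "$f"
after=$(stat -c %b "$f")                            # first 4 MiB deallocated
echo "512B blocks before=$before after=$after"
rm -f "$f"
```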
*** Bug 2005938 has been marked as a duplicate of this bug. ***
The fix for this has merged upstream in ironic and is part of the 4.10 and 4.11 releases.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069