Bug 2002009
| Summary: | IPI Baremetal, qemu-convert takes to long to save image into drive on slow/large disks | ||||||
|---|---|---|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Manuel Rodriguez <manrodri> | ||||
| Component: | Bare Metal Hardware Provisioning | Assignee: | Derek Higgins <derekh> | ||||
| Bare Metal Hardware Provisioning sub component: | ironic | QA Contact: | Lubov <lshilin> | ||||
| Status: | CLOSED ERRATA | Docs Contact: | |||||
| Severity: | medium | ||||||
| Priority: | medium | CC: | derekh, dollierp, fpercoco, lshilin | ||||
| Version: | 4.8 | Keywords: | OtherQA, Triaged | ||||
| Target Milestone: | --- | ||||||
| Target Release: | 4.11.0 | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: |
Cause:
When writing the RHCOS image to some disks qemu-img was allocating space on the entire disk (including sparse areas)
Consequence:
On some hardware this could take hours
Fix:
Disable qemu-img sparse image creation
Result:
image writing no longer take hours on effected hardware
|
Story Points: | --- | ||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2022-08-10 10:37:25 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
|
||||||
|
Description
Manuel Rodriguez
2021-09-07 17:12:10 UTC
I've hit something similar on a Dell when provisioning the RHCOS image to a 1.1T HDD,
the provisioning is being delayed when using qemu-img, specifically this command
qemu-img convert -O host_device -t directsync -W /tmp/cached-rhcos-49.84.202107010027-0-openstack.x86_64.qcow2 /dev/sda
Using a smaller test image and a smaller 8G block device (/dev/sdb3), I can see that qemu-img is spending its time
zero'ing out data on the entire disk in 2G chunks, each chunk is taking about 10 seconds (see the calls to fallocate)
[root@r640-u08 ~]# rpm -qf `which qemu-img `
qemu-img-4.2.0-34.module+el8.3.0+7976+077be4ec.x86_64
[root@r640-u08 ~]# time strace -T -f -e fallocate,pwrite64 /bin/qemu-img convert -O host_device -t directsync -W afile.qcow2 /dev/sdb3 |& grep "("
[pid 13019] fallocate(10, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 0, 512) = 0 <0.007062>
[pid 13019] fallocate(10, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 0, 2147483136) = 0 <9.989639>
[pid 13019] fallocate(10, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2147483136, 2147483136) = 0 <9.894393>
[pid 13019] fallocate(10, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 4294966272, 2147483136) = 0 <9.966464>
[pid 13019] fallocate(10, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 6442449408, 2147483136) = 0 <9.899406>
[pid 13019] fallocate(10, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 8589932544, 2048) = 0 <0.016816>
[pid 13019] pwrite64(10, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 41943040 <unfinished ...>
[pid 13022] pwrite64(10, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 25165824strace: Process 13023 attached
[pid 13023] pwrite64(10, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 16777216strace: Process 13024 attached
[pid 13024] pwrite64(10, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 8650752 <unfinished ...>
[pid 13022] pwrite64(10, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 8388608 <unfinished ...>
[pid 13019] pwrite64(10, "\2\0\0\0\f\0\1\2.\0\0\0\2\0\0\0\f\0\2\2..\0\0\v\0\0\0\24\0\n\2"..., 16384, 1921024 <unfinished ...>
[pid 13025] pwrite64(10, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 274432, 0strace: Process 13027 attached
[pid 13027] pwrite64(10, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 278528) = 4096 <0.000364>
real 0m39.845s
user 0m0.014s
sys 0m0.020s
When I change to an older version of qemu-img(from RHEL 8.2) the fallocate system calls aren't present and the image write goes from
40 seconds to about .5 of a second.
[root@r640-u08 ~]# rpm -qf `which qemu-img `
qemu-img-2.12.0-99.module+el8.2.0+5827+8c39933c.x86_64
[root@r640-u08 ~]# time strace -T -f -e fallocate,pwrite64 /bin/qemu-img convert -O host_device -t directsync -W afile.qcow2 /dev/sdb3 |& grep "("
[pid 12979] pwrite64(10, "\2\0\0\0\f\0\1\2.\0\0\0\2\0\0\0\f\0\2\2..\0\0\v\0\0\0\24\0\n\2"..., 15360, 1921024strace: Process 12980 attached
[pid 12975] pwrite64(10, "\0302\0\0\0\310\0\0\0\n\0\0\226\255\0\0\f2\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 272384, 1024 <unfinished ...>
[pid 12977] pwrite64(10, "\0\0\0\0\0\0\0\0\33w7a\33w7a\33w7a\0\0\0\0\0\0\0\0\0\0\0\0"..., 1536, 279552 <unfinished ...>
[pid 12979] pwrite64(10, "\0302\0\0\0\310\0\0\0\n\0\0\230\255\0\0\r2\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 1536, 8389632 <unfinished ...>
[pid 12973] pwrite64(10, "DEREKH DEADBEEF\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512, 8653824 <unfinished ...>
[pid 12975] pwrite64(10, "\300;9\230\0\0\0\4\0\0\0\0\0\0\4\0\0\0\20\0\0\0\0\1\0\0\0\5\0\0\0\0"..., 14848, 16778240 <unfinished ...>
[pid 12976] pwrite64(10, "\0302\0\0\0\310\0\0\0\n\0\0\230\255\0\0\r2\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 1536, 25166848 <unfinished ...>
[pid 12980] pwrite64(10, "\0302\0\0\0\310\0\0\0\n\0\0\230\255\0\0\r2\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 1536, 41944064 <unfinished ...>
real 0m0.530s
user 0m0.017s
sys 0m0.032s
*** Bug 2005938 has been marked as a duplicate of this bug. *** This fix for has merged upstream in ironic and is part of the 4.11 and 4.10 releases. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069 |