RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1744602 - qemu-img gets stuck when stream-converting from http
Summary: qemu-img gets stuck when stream-converting from http
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: qemu-kvm
Version: ---
Hardware: Unspecified
OS: Unspecified
high
medium
Target Milestone: rc
: 8.0
Assignee: Hanna Czenczek
QA Contact: CongLi
URL:
Whiteboard:
Depends On: 1745209
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-08-22 14:05 UTC by Ademar Reis
Modified: 2021-12-07 23:15 UTC (History)
13 users (show)

Fixed In Version: qemu-kvm-2.12.0-91.module+el8.2.0+4916+02633b66
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1740193
: 1745209 (view as bug list)
Environment:
Last Closed: 2020-04-28 15:32:15 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-26583 0 None None None 2021-12-07 23:15:39 UTC
Red Hat Product Errata RHEA-2020:1587 0 None None None 2020-04-28 15:33:09 UTC

Description Ademar Reis 2019-08-22 14:05:47 UTC
This seems to be fixed in upstream and in RHEL8-AV, but according to the reports, it's a problem affecting some versions of QEMU (3.x?)

QE: Can we please test and try to reproduce this scenario in RHEL8 (qemu-2.12) and in RHEL-8-AV-8.0.1 (qemu-3.x?)

Max: do you know which patches fixed this? Do you think it's a regression in qemu-3.x that was eventually fixed?

+++ This bug was initially created as a clone of Bug #1740193 +++

...

--- Additional comment from Ademar Reis on 2019-08-22 08:57:51 EDT ---

(In reply to Alexander Wels from comment #11)
> Sure, so the combination of things that need to happen is pretty specific.
> You have to do a streaming conversion from an http source. If you download
> the file to a local location and then convert it works.
> 
> qemu-img convert -p -O raw
> https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img
> 
> That particular one will start converting and usually hang around 4.7%
> progress. I have heard other people mention any qcow2 from an http source
> will exhibit the same behaviour. So this appears to happen with the 3.1.z
> series of qemu. We had to downgrade the version we were using to 3.0.z
> series in our upstream releases. We have also verified that the 4.0.0
> version works as well. But we had no viable source to get it from easily
> (maybe Fedora 31?)
> 

We have a "virt-preview" repository for Fedora which gets frequent upstream rebases. It's less stable, but might be useful: https://fedoraproject.org/wiki/Virtualization_Preview_Repository

> In short for 3.1.z version qemu any qcow2 from http source will display the
> problem, but the cirros image in small and easily usable in tests.

This is useful info, thank you. I'll clone this BZ and ask our QE to try to reproduce it and work on a test to avoid regressions.

Comment 1 Hanna Czenczek 2019-08-22 17:50:05 UTC
I don’t see how 4.x was fixed, because it is just as stuck for me.

Max

Comment 2 Alexander Wels 2019-08-22 17:56:04 UTC
Interesting, one of my team members did a local compile of the 4.0.0 and he said it worked for him. But if it doesn't then it needs to be addressed.

Comment 3 Hanna Czenczek 2019-08-22 19:05:19 UTC
It’s just that it hangs later (somewhere between 40 and 80 %).  By bisecting, I found that this series is responsible: https://lists.nongnu.org/archive/html/qemu-block/2018-11/msg00011.html

That series has refactored the qcow2 decompression code to use threading, so it’s clear that the problem in the curl code has not been fixed.

(I haven’t found the cause of the hang yet.)

Max

Comment 4 CongLi 2019-08-23 01:20:35 UTC
(In reply to Ademar Reis from comment #0)
> This seems to be fixed in upstream and in RHEL8-AV, but according to the
> reports, it's a problem affecting some versions of QEMU (3.x?)
> 
> QE: Can we please test and try to reproduce this scenario in RHEL8
> (qemu-2.12) and in RHEL-8-AV-8.0.1 (qemu-3.x?)

Hi Tingting,

Could you please have a try ?

Thanks.

Comment 5 Tingting Mao 2019-08-23 08:18:48 UTC
Hi,


In 'qemu-kvm-4.1.0-4.module+el8.1.0+4020+16089f93', convert will hit I/O error.

# qemu-img convert -p -O raw https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img tgt.img
qemu-img: curl: Operation timed out after 5000 milliseconds with 152000 out of 268278 bytes received
qemu-img: curl: Operation timed out after 5000 milliseconds with 152000 out of 327732 bytes received
qemu-img: curl: Operation timed out after 4913 milliseconds with 312000 out of 327575 bytes received
qemu-img: curl: Operation timed out after 5001 milliseconds with 393216 out of 655360 bytes received
qemu-img: curl: Operation timed out after 5000 milliseconds with 147456 out of 393216 bytes received
qemu-img: curl: Operation timed out after 5000 milliseconds with 163840 out of 327680 bytes received
qemu-img: curl: Operation timed out after 5000 milliseconds with 112000 out of 325812 bytes received
qemu-img: curl: Operation timed out after 5001 milliseconds with 56000 out of 327543 bytes received
qemu-img: error while reading sector 26624: Input/output error
qemu-img: curl: Operation timed out after 5001 milliseconds with 229376 out of 327680 bytes received
qemu-img: curl: Operation timed out after 5000 milliseconds with 393216 out of 524288 bytes received
qemu-img: error while reading sector 27776: Input/output error
qemu-img: error while reading sector 27904: Input/output error


In "qemu-kvm-2.12.0-85.module+el8.1.0+4010+d6842f29", convert works normally.

# qemu-img convert -p -O raw https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img tgt.img 
    (100.00/100%)


In 'qemu-kvm-3.1.0-29.module+el8.0.1+3728+47706820', convert will hit core dumped.

# qemu-img convert -p -O raw https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img tgt.img 
Segmentation fault (core dumped)

(gdb) bt
#0  0x00007f1d1a184858 in curl_multi_do_locked (s=<optimized out>, s=<optimized out>) at block/curl.c:414
#1  0x00007f1d1a1848e6 in curl_multi_do (arg=0x563d12409620) at block/curl.c:426
#2  0x0000563d12123862 in aio_dispatch_handlers (ctx=ctx@entry=0x563d123edf40) at util/aio-posix.c:421
#3  0x0000563d121244cf in aio_poll (ctx=ctx@entry=0x563d123edf40, blocking=blocking@entry=true)
    at util/aio-posix.c:707
#4  0x0000563d120b0c45 in blk_prw (blk=blk@entry=0x563d124bcc20, offset=offset@entry=0, 
    buf=buf@entry=0x7ffd65cf1190 " nC\022=V", bytes=bytes@entry=512, 
    co_entry=co_entry@entry=0x563d120b2720 <blk_read_entry>, flags=flags@entry=0)
    at block/block-backend.c:1262
#5  0x0000563d120b1eca in blk_pread (blk=blk@entry=0x563d124bcc20, offset=offset@entry=0, 
    buf=buf@entry=0x7ffd65cf1190, count=count@entry=512) at block/block-backend.c:1424
#6  0x0000563d1207b7a9 in find_image_format (file=file@entry=0x563d124bcc20, 
    filename=filename@entry=0x7ffd65cf34d1 "https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img", pdrv=pdrv@entry=0x7ffd65cf1410, errp=errp@entry=0x7ffd65cf1418) at block.c:725
#7  0x0000563d12082f9d in bdrv_open_inherit (
    filename=filename@entry=0x7ffd65cf34d1 "https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img", reference=reference@entry=0x0, options=0x563d123f6e50, options@entry=0x563d123f1800, 
    flags=<optimized out>, flags@entry=0, parent=parent@entry=0x0, child_role=child_role@entry=0x0, 
    errp=0x7ffd65cf1500) at block.c:2787
#8  0x0000563d12083d55 in bdrv_open (
    filename=filename@entry=0x7ffd65cf34d1 "https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img", reference=reference@entry=0x0, options=options@entry=0x563d123f1800, flags=flags@entry=0, 
    errp=errp@entry=0x7ffd65cf1500) at block.c:2911
#9  0x0000563d120b3935 in blk_new_open (
    filename=0x7ffd65cf34d1 "https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img", 
    reference=0x0, options=0x563d123f1800, flags=0, errp=0x7ffd65cf1500) at block/block-backend.c:379
#10 0x0000563d1207465b in img_open_file (
    filename=0x7ffd65cf34d1 "https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img", 
    options=0x563d123f1800, fmt=<optimized out>, flags=0, writethrough=<optimized out>, 
    force_share=<optimized out>, quiet=<optimized out>) at qemu-img.c:325
#11 0x0000563d12074832 in img_open (image_opts=<optimized out>, 
--Type <RET> for more, q to quit, c to continue without paging--
    filename=0x7ffd65cf34d1 "https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img", 
    fmt=<optimized out>, flags=0, writethrough=<optimized out>, quiet=<optimized out>, force_share=false)
    at qemu-img.c:370
#12 0x0000563d12079020 in img_convert (argc=<optimized out>, argv=0x7ffd65cf1990) at qemu-img.c:2218
#13 0x0000563d12071ebd in main (argc=6, argv=<optimized out>) at qemu-img.c:4976

Comment 6 Hanna Czenczek 2019-08-26 18:33:26 UTC
I think I’ve found the cause of the hang: CURL has more or less recently introduced some protection against calling certain functions while a callback from CURL is ongoing.  Rather often, we settle requests in such a callback, and then want to start the next request immediately.  And that won’t work now (we also ignore the error that CURL throws, so debugging was a pain).

As for the segfault, I’ve never seen it myself.  But while debugging this issue, I saw a very suspicious FOREACH_SAFE loop in curl_multi_do_locked(); and now I see that the backtrace points to exactly that function.  So I suppose that is the likely culprit (FOREACH_SAFE is not safe for arbitrary concurrent modifications of the list in question).

I’ll send patches upstream and then we’ll see about downstream.

Max

Comment 7 Hanna Czenczek 2019-08-27 16:17:28 UTC
It looks as if the hang occurs because cURL added some protective feature against misuse in 7.59.0.  So if the test system’s cURL version is older, it will not exhibit this problem.

I don’t know where the I/O errors reported in comment 5 for 4.1.0 come from, I’ve never seen such errors myself.  I suppose the network connection was just slow, which caused the timeouts.

Max

Comment 8 Tingting Mao 2019-08-28 07:39:58 UTC
A weird thing here is that I can not reproduce this bug when I download the image to my own https server. I mean that this issue seems only occur with 'https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img' url. 


===================================================================================================================================================
In 'qemu-kvm-4.1.0-4.module+el8.1.0+4020+16089f93'

Scenario 1(From my own https server)
1. Download the source image to my own https server
# wget https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img
# ll /var/www/html/tmp/cirros-0.4.0-x86_64-disk.img 
-rw-r--r-- 1 root root 12716032 Nov 20  2017 /var/www/html/tmp/cirros-0.4.0-x86_64-disk.img

2. Convert in client
# qemu-img convert -O raw 'json:{"file.driver":"https", "file.url":"https://10.66.10.26/tmp/cirros-0.4.0-x86_64-disk.img", "file.sslverify":"off", "file.readahead":"64k"}' tgt.img -p
    (100.00/100%)
# echo $?
0


Scenario 2(From the original server)

# qemu-img convert -O raw 'json:{"file.driver":"https", "file.url":"https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img", "file.sslverify":"off", "file.readahead":"64k"}' tgt.img -p
qemu-img: curl: Operation timed out after 5000 milliseconds with 32768 out of 65544 bytes received
qemu-img: curl: Operation timed out after 5001 milliseconds with 40000 out of 89966 bytes received
qemu-img: curl: Operation timed out after 5000 milliseconds with 64000 out of 89562 bytes received
qemu-img: curl: Operation timed out after 5000 milliseconds with 112000 out of 131090 bytes received
qemu-img: curl: Operation timed out after 5000 milliseconds with 120000 out of 128785 bytes received
qemu-img: curl: Operation timed out after 5000 milliseconds with 112000 out of 130967 bytes received
qemu-img: curl: Operation timed out after 5000 milliseconds with 147456 out of 458752 bytes received
qemu-img: error while reading sector 23680: Input/output error
qemu-img: curl: Operation timed out after 5000 milliseconds with 65536 out of 131072 bytes received
qemu-img: curl: Operation timed out after 5001 milliseconds with 147456 out of 196608 bytes received
qemu-img: curl: Operation timed out after 5000 milliseconds with 65536 out of 131072 bytes received
qemu-img: curl: Operation timed out after 5000 milliseconds with 8000 out of 130107 bytes received
qemu-img: error while reading sector 24960: Input/output error
qemu-img: curl: Operation timed out after 5001 milliseconds with 49152 out of 130560 bytes received
qemu-img: error while reading sector 25344: Input/output error

# echo $?
1
=====================================================================================================================================================


In 'qemu-kvm-3.1.0-29.module+el8.0.1+3728+47706820'

Scenario 1(Convert from my own https server)
# qemu-img convert -O raw 'json:{"file.driver":"https", "file.url":"https://10.66.10.26/tmp/cirros-0.4.0-x86_64-disk.img", "file.sslverify":"off", "file.readahead":"64k"}' tgt.img -p    (100.00/100%)

# echo $?
0


Scenario 2(Convert from the original server)
# qemu-img convert -O raw 'json:{"file.driver":"https", "file.url":"https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img", "file.sslverify":"off", "file.readahead":"64k"}' tgt.img -p
Segmentation fault (core dumped)

Comment 9 Hanna Czenczek 2019-08-28 12:31:59 UTC
It doesn’t seem that weird to me.  It just looks like your connection to the cirros server is too slow.  You could try a higher timeout ("file.timeout": 30 or somthing like that in the json:{} options) to see whether it changes anything.

Max

Comment 10 Tingting Mao 2019-08-29 02:43:35 UTC
(In reply to Max Reitz from comment #9)
> It doesn’t seem that weird to me.  It just looks like your connection to the
> cirros server is too slow.  You could try a higher timeout ("file.timeout":
> 30 or somthing like that in the json:{} options) to see whether it changes
> anything.

Yeah, you are right. When I set the timeout is 80, convert from the original source image works. Thanks for your info.

# qemu-img convert -O raw 'json:{"file.driver":"https", "file.url":"https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img", "file.sslverify":"off", "file.readahead":"64k", "file.timeout": 30}' tgt.img -p
qemu-img: curl: Operation timed out after 30001 milliseconds with 589824 out of 1048576 bytes received
qemu-img: error while reading sector 61056: Input/output error
qemu-img: curl: Operation timed out after 30000 milliseconds with 557056 out of 2097152 bytes received
qemu-img: error while reading sector 63616: Input/output error

# qemu-img convert -O raw 'json:{"file.driver":"https", "file.url":"https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img", "file.sslverify":"off", "file.readahead":"64k", "file.timeout": 50}' tgt.img -p
qemu-img: curl: Operation timed out after 50000 milliseconds with 1114112 out of 2097152 bytes received
qemu-img: error while reading sector 63616: Input/output error
qemu-img: error while reading sector 68608: Input/output error

# qemu-img convert -O raw 'json:{"file.driver":"https", "file.url":"https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img", "file.sslverify":"off", "file.readahead":"64k", "file.timeout": 80}' tgt.img -p
    (100.00/100%)

Comment 17 Hanna Czenczek 2019-11-28 08:08:13 UTC
Hi,

That’s OK.  I think that’s to be expected as long as you don’t connect to a local server where the latency is guaranteed to be low.

Max

Comment 19 Ademar Reis 2020-02-05 23:03:45 UTC
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks

Comment 22 errata-xmlrpc 2020-04-28 15:32:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:1587


Note You need to log in before you can comment on or make changes to this bug.