Bug 1744602
Summary: | qemu-img gets stuck when stream-converting from http | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Ademar Reis <areis> | |
Component: | qemu-kvm | Assignee: | Hanna Czenczek <hreitz> | |
qemu-kvm sub component: | Storage | QA Contact: | CongLi <coli> | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | medium | |||
Priority: | high | CC: | awels, cnv-qe-bugs, coli, grajaiya, hreitz, jinzhao, juzhang, ncredi, ngavrilo, rbalakri, sgordon, virt-maint, ycui | |
Version: | --- | Keywords: | Regression | |
Target Milestone: | rc | |||
Target Release: | 8.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | qemu-kvm-2.12.0-91.module+el8.2.0+4916+02633b66 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | 1740193 | |||
: | 1745209 (view as bug list) | Environment: | ||
Last Closed: | 2020-04-28 15:32:15 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1745209 | |||
Bug Blocks: |
Description
Ademar Reis
2019-08-22 14:05:47 UTC
I don’t see how 4.x was fixed, because it is just as stuck for me. Max Interesting, one of my team members did a local compile of the 4.0.0 and he said it worked for him. But if it doesn't then it needs to be addressed. It’s just that it hangs later (somewhere between 40 and 80 %). By bisecting, I found that this series is responsible: https://lists.nongnu.org/archive/html/qemu-block/2018-11/msg00011.html That series has refactored the qcow2 decompression code to use threading, so it’s clear that the problem in the curl code has not been fixed. (I haven’t found the cause of the hang yet.) Max (In reply to Ademar Reis from comment #0) > This seems to be fixed in upstream and in RHEL8-AV, but according to the > reports, it's a problem affecting some versions of QEMU (3.x?) > > QE: Can we please test and try to reproduce this scenario in RHEL8 > (qemu-2.12) and in RHEL-8-AV-8.0.1 (qemu-3.x?) Hi Tingting, Could you please have a try ? Thanks. Hi, In 'qemu-kvm-4.1.0-4.module+el8.1.0+4020+16089f93', convert will hit I/O error. # qemu-img convert -p -O raw https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img tgt.img qemu-img: curl: Operation timed out after 5000 milliseconds with 152000 out of 268278 bytes received qemu-img: curl: Operation timed out after 5000 milliseconds with 152000 out of 327732 bytes received qemu-img: curl: Operation timed out after 4913 milliseconds with 312000 out of 327575 bytes received qemu-img: curl: Operation timed out after 5001 milliseconds with 393216 out of 655360 bytes received qemu-img: curl: Operation timed out after 5000 milliseconds with 147456 out of 393216 bytes received qemu-img: curl: Operation timed out after 5000 milliseconds with 163840 out of 327680 bytes received qemu-img: curl: Operation timed out after 5000 milliseconds with 112000 out of 325812 bytes received qemu-img: curl: Operation timed out after 5001 milliseconds with 56000 out of 327543 bytes received qemu-img: error while reading sector 26624: Input/output error qemu-img: curl: Operation timed out after 5001 milliseconds with 229376 out of 327680 bytes received qemu-img: curl: Operation timed out after 5000 milliseconds with 393216 out of 524288 bytes received qemu-img: error while reading sector 27776: Input/output error qemu-img: error while reading sector 27904: Input/output error In "qemu-kvm-2.12.0-85.module+el8.1.0+4010+d6842f29", convert works normally. # qemu-img convert -p -O raw https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img tgt.img (100.00/100%) In 'qemu-kvm-3.1.0-29.module+el8.0.1+3728+47706820', convert will hit core dumped. # qemu-img convert -p -O raw https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img tgt.img Segmentation fault (core dumped) (gdb) bt #0 0x00007f1d1a184858 in curl_multi_do_locked (s=<optimized out>, s=<optimized out>) at block/curl.c:414 #1 0x00007f1d1a1848e6 in curl_multi_do (arg=0x563d12409620) at block/curl.c:426 #2 0x0000563d12123862 in aio_dispatch_handlers (ctx=ctx@entry=0x563d123edf40) at util/aio-posix.c:421 #3 0x0000563d121244cf in aio_poll (ctx=ctx@entry=0x563d123edf40, blocking=blocking@entry=true) at util/aio-posix.c:707 #4 0x0000563d120b0c45 in blk_prw (blk=blk@entry=0x563d124bcc20, offset=offset@entry=0, buf=buf@entry=0x7ffd65cf1190 " nC\022=V", bytes=bytes@entry=512, co_entry=co_entry@entry=0x563d120b2720 <blk_read_entry>, flags=flags@entry=0) at block/block-backend.c:1262 #5 0x0000563d120b1eca in blk_pread (blk=blk@entry=0x563d124bcc20, offset=offset@entry=0, buf=buf@entry=0x7ffd65cf1190, count=count@entry=512) at block/block-backend.c:1424 #6 0x0000563d1207b7a9 in find_image_format (file=file@entry=0x563d124bcc20, filename=filename@entry=0x7ffd65cf34d1 "https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img", pdrv=pdrv@entry=0x7ffd65cf1410, errp=errp@entry=0x7ffd65cf1418) at block.c:725 #7 0x0000563d12082f9d in bdrv_open_inherit ( filename=filename@entry=0x7ffd65cf34d1 "https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img", reference=reference@entry=0x0, options=0x563d123f6e50, options@entry=0x563d123f1800, flags=<optimized out>, flags@entry=0, parent=parent@entry=0x0, child_role=child_role@entry=0x0, errp=0x7ffd65cf1500) at block.c:2787 #8 0x0000563d12083d55 in bdrv_open ( filename=filename@entry=0x7ffd65cf34d1 "https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img", reference=reference@entry=0x0, options=options@entry=0x563d123f1800, flags=flags@entry=0, errp=errp@entry=0x7ffd65cf1500) at block.c:2911 #9 0x0000563d120b3935 in blk_new_open ( filename=0x7ffd65cf34d1 "https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img", reference=0x0, options=0x563d123f1800, flags=0, errp=0x7ffd65cf1500) at block/block-backend.c:379 #10 0x0000563d1207465b in img_open_file ( filename=0x7ffd65cf34d1 "https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img", options=0x563d123f1800, fmt=<optimized out>, flags=0, writethrough=<optimized out>, force_share=<optimized out>, quiet=<optimized out>) at qemu-img.c:325 #11 0x0000563d12074832 in img_open (image_opts=<optimized out>, --Type <RET> for more, q to quit, c to continue without paging-- filename=0x7ffd65cf34d1 "https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img", fmt=<optimized out>, flags=0, writethrough=<optimized out>, quiet=<optimized out>, force_share=false) at qemu-img.c:370 #12 0x0000563d12079020 in img_convert (argc=<optimized out>, argv=0x7ffd65cf1990) at qemu-img.c:2218 #13 0x0000563d12071ebd in main (argc=6, argv=<optimized out>) at qemu-img.c:4976 I think I’ve found the cause of the hang: CURL has more or less recently introduced some protection against calling certain functions while a callback from CURL is ongoing. Rather often, we settle requests in such a callback, and then want to start the next request immediately. And that won’t work now (we also ignore the error that CURL throws, so debugging was a pain). As for the segfault, I’ve never seen it myself. But while debugging this issue, I saw a very suspicious FOREACH_SAFE loop in curl_multi_do_locked(); and now I see that the backtrace points to exactly that function. So I suppose that is the likely culprit (FOREACH_SAFE is not safe for arbitrary concurrent modifications of the list in question). I’ll send patches upstream and then we’ll see about downstream. Max It looks as if the hang occurs because cURL added some protective feature against misuse in 7.59.0. So if the test system’s cURL version is older, it will not exhibit this problem. I don’t know where the I/O errors reported in comment 5 for 4.1.0 come from, I’ve never seen such errors myself. I suppose the network connection was just slow, which caused the timeouts. Max A weird thing here is that I can not reproduce this bug when I download the image to my own https server. I mean that this issue seems only occur with 'https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img' url. =================================================================================================================================================== In 'qemu-kvm-4.1.0-4.module+el8.1.0+4020+16089f93' Scenario 1(From my own https server) 1. Download the source image to my own https server # wget https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img # ll /var/www/html/tmp/cirros-0.4.0-x86_64-disk.img -rw-r--r-- 1 root root 12716032 Nov 20 2017 /var/www/html/tmp/cirros-0.4.0-x86_64-disk.img 2. Convert in client # qemu-img convert -O raw 'json:{"file.driver":"https", "file.url":"https://10.66.10.26/tmp/cirros-0.4.0-x86_64-disk.img", "file.sslverify":"off", "file.readahead":"64k"}' tgt.img -p (100.00/100%) # echo $? 0 Scenario 2(From the original server) # qemu-img convert -O raw 'json:{"file.driver":"https", "file.url":"https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img", "file.sslverify":"off", "file.readahead":"64k"}' tgt.img -p qemu-img: curl: Operation timed out after 5000 milliseconds with 32768 out of 65544 bytes received qemu-img: curl: Operation timed out after 5001 milliseconds with 40000 out of 89966 bytes received qemu-img: curl: Operation timed out after 5000 milliseconds with 64000 out of 89562 bytes received qemu-img: curl: Operation timed out after 5000 milliseconds with 112000 out of 131090 bytes received qemu-img: curl: Operation timed out after 5000 milliseconds with 120000 out of 128785 bytes received qemu-img: curl: Operation timed out after 5000 milliseconds with 112000 out of 130967 bytes received qemu-img: curl: Operation timed out after 5000 milliseconds with 147456 out of 458752 bytes received qemu-img: error while reading sector 23680: Input/output error qemu-img: curl: Operation timed out after 5000 milliseconds with 65536 out of 131072 bytes received qemu-img: curl: Operation timed out after 5001 milliseconds with 147456 out of 196608 bytes received qemu-img: curl: Operation timed out after 5000 milliseconds with 65536 out of 131072 bytes received qemu-img: curl: Operation timed out after 5000 milliseconds with 8000 out of 130107 bytes received qemu-img: error while reading sector 24960: Input/output error qemu-img: curl: Operation timed out after 5001 milliseconds with 49152 out of 130560 bytes received qemu-img: error while reading sector 25344: Input/output error # echo $? 1 ===================================================================================================================================================== In 'qemu-kvm-3.1.0-29.module+el8.0.1+3728+47706820' Scenario 1(Convert from my own https server) # qemu-img convert -O raw 'json:{"file.driver":"https", "file.url":"https://10.66.10.26/tmp/cirros-0.4.0-x86_64-disk.img", "file.sslverify":"off", "file.readahead":"64k"}' tgt.img -p (100.00/100%) # echo $? 0 Scenario 2(Convert from the original server) # qemu-img convert -O raw 'json:{"file.driver":"https", "file.url":"https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img", "file.sslverify":"off", "file.readahead":"64k"}' tgt.img -p Segmentation fault (core dumped) It doesn’t seem that weird to me. It just looks like your connection to the cirros server is too slow. You could try a higher timeout ("file.timeout": 30 or somthing like that in the json:{} options) to see whether it changes anything. Max (In reply to Max Reitz from comment #9) > It doesn’t seem that weird to me. It just looks like your connection to the > cirros server is too slow. You could try a higher timeout ("file.timeout": > 30 or somthing like that in the json:{} options) to see whether it changes > anything. Yeah, you are right. When I set the timeout is 80, convert from the original source image works. Thanks for your info. # qemu-img convert -O raw 'json:{"file.driver":"https", "file.url":"https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img", "file.sslverify":"off", "file.readahead":"64k", "file.timeout": 30}' tgt.img -p qemu-img: curl: Operation timed out after 30001 milliseconds with 589824 out of 1048576 bytes received qemu-img: error while reading sector 61056: Input/output error qemu-img: curl: Operation timed out after 30000 milliseconds with 557056 out of 2097152 bytes received qemu-img: error while reading sector 63616: Input/output error # qemu-img convert -O raw 'json:{"file.driver":"https", "file.url":"https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img", "file.sslverify":"off", "file.readahead":"64k", "file.timeout": 50}' tgt.img -p qemu-img: curl: Operation timed out after 50000 milliseconds with 1114112 out of 2097152 bytes received qemu-img: error while reading sector 63616: Input/output error qemu-img: error while reading sector 68608: Input/output error # qemu-img convert -O raw 'json:{"file.driver":"https", "file.url":"https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img", "file.sslverify":"off", "file.readahead":"64k", "file.timeout": 80}' tgt.img -p (100.00/100%) Hi, That’s OK. I think that’s to be expected as long as you don’t connect to a local server where the latency is guaranteed to be low. Max QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:1587 |