Bug 1524770
Summary: | qemu-img convert hangs on converting qcow2 to raw | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | KOSAL RAJ I <kiyyappa> |
Component: | qemu-kvm-rhev | Assignee: | Kevin Wolf <kwolf> |
Status: | CLOSED DUPLICATE | QA Contact: | Ping Li <pingl> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 7.4 | CC: | berrange, brian.fife, coli, dasmith, dhill, eglynn, kchamart, kiyyappa, knoel, mbooth, michen, ngu, pingl, rbryant, sbauza, sferdjao, sgordon, shivapriya.o.hiremath, srevivo, virt-maint, vromanso |
Target Milestone: | pre-dev-freeze | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2017-12-21 18:28:07 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
KOSAL RAJ I
2017-12-12 02:00:26 UTC
Ping, Could you help to have a try with the bug on latest rhel7.4z versions?

    curl -O http://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img
    i=1; while true; do echo $i; qemu-img convert -O raw cirros-0.4.0-x86_64-disk.img cirros-0.4.0-x86_64-disk.raw -f qcow2; i=$[$i+1]; done

It occurred on iteration 2604

    ps -ef | grep convert
    root 24322 8602 0 21:13 pts/10 00:00:00 qemu-img convert -O raw cirros-0.4.0-x86_64-disk.img cirros-0.4.0-x86_64-disk.raw -f qcow2

    [root@nspcloud-compute-43 ~]# pstack 24322
    #0 0x00007ff16a943aff in ppoll () from /lib64/libc.so.6
    #1 0x000056211520458b in qemu_poll_ns ()
    #2 0x0000562115205378 in main_loop_wait ()
    #3 0x000056211514efa3 in img_convert ()
    #4 0x00005621151483a9 in main ()

Hi guys,

We're hitting this exact same problem with:

    qemu-img convert -f qcow2 -O qcow2 /var/lib/nova/instances/99bea639-a7b4-43b9-a83a-37fdf5388eda/disk /var/lib/nova/instances/46ff755f1780407bbce1939b6971730c.test

If we attach gdb to the process we see the following:

    (gdb) bt
    #0 0x00007fe62138aaff in ppoll () from /lib64/libc.so.6
    #1 0x0000558b9315b58b in qemu_poll_ns ()
    #2 0x0000558b9315c378 in main_loop_wait ()
    #3 0x0000558b930a5fa3 in img_convert ()
    #4 0x0000558b9309f3a9 in main ()
    (gdb)

If we run it with

    strace -fffff qemu-img convert -f qcow2 -O qcow2 /var/lib/nova/instances/99bea639-a7b4-43b9-a83a-37fdf5388eda/disk /var/lib/nova/instances/46ff755f1780407bbce1939b6971730c.test > /root/strace.out 2>&1 &

it completes successfully.

Dave

After I had a chance to look at a core dump from David's customer, this seems to be a problem that we already have a fix for in qemu-kvm-rhev-2.9.0-16.el7_4.12.

What led me to this conclusion is that we have a single active coroutine in convert_do_copy(), and the only request in it is stuck while we have a ThreadPoolElement that already is in the THREAD_DONE state, but still in the list of thread pool requests. This means that the worker function has completed, but the callback never arrived.
This is the same pattern as seen in bug 1513362.

    (gdb) p *s
    $1 = {src = 0x561db9196050, src_sectors = 0x561db9196060, src_num = 1,
      total_sectors = 429916160, allocated_sectors = 47303384,
      allocated_done = 30076272, sector_num = 153985792, wr_offs = 153984768,
      status = BLK_DATA, sector_next_status = 153985792,
      target = 0x561db91ea3c0, has_zero_init = true, compressed = false,
      target_has_backing = false, wr_in_order = true, min_sparse = 8,
      cluster_sectors = 128, buf_sectors = 4096, num_coroutines = 8,
      running_coroutines = 8,
      co = {0x561db996ab40, 0x561db996ac80, 0x561db996adc0, 0x561db996af00,
        0x561db996b040, 0x561db996b180, 0x561db996b2c0, 0x561db996b400,
        0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
      wait_sector_num = {153985152, 153985664, 153985408, 153984896,
        153985024, 153985536, 153985280, -1, 0, 0, 0, 0, 0, 0, 0, 0},
      lock = {locked = 0, ctx = 0x0, from_push = {slh_first = 0x0},
        to_pop = {slh_first = 0x0}, handoff = 0, sequence = 0, holder = 0x0},
      ret = -115}

    (gdb) p *s.target.root.bs.aio_context.thread_pool.head.lh_first
    $13 = {common = {aiocb_info = 0x561db7bd9750 <thread_pool_aiocb_info>,
        bs = 0x0, cb = 0x561db7950ad0 <thread_pool_co_cb>,
        opaque = 0x7fadd4697910, refcnt = 1},
      pool = 0x561db9228000, func = 0x561db78df850 <aio_worker>,
      arg = 0x561db9172f00, state = THREAD_DONE, ret = 0,
      reqs = {tqe_next = 0x0, tqe_prev = 0x0},
      all = {le_next = 0x0, le_prev = 0x561db9228098}}

*** This bug has been marked as a duplicate of bug 1513362 ***

We are facing the same issue in an OSP 10 deployment, where spawning a huge VM gets stuck. We would like to know how to get a custom build with the patch mentioned in this bugzilla.

We have downloaded a source RPM (.src.rpm) from http://ftp.redhat.com/pub/redhat/linux/enterprise/7Server/en/RHOS/SRPMS/, specifically qemu-kvm-rhev-2.9.0-16.el7_4.13.src.rpm. Since this is a source RPM, we have yet to build the binary RPM from it.
We followed the steps at https://wiki.centos.org/HowTos/RebuildSRPM on how to build source RPMs, including installing dependencies such as gcc and kernel-headers, but there are a ton of dependencies. We used the 'yum-builddep <src rpm>' command to install some of them, but other packages are still not available. These are the following:

• bluez-libs-devel
• brlapi-devel
• gperftools-devel
• libfdt-devel >= 1.4.3
• libiscsi-devel
• libseccomp-devel >= 2.3.0
• libssh2-devel
• lzo-devel
• pciutils-devel
• snappy-devel

Can you guide us on how to add these dependencies on RHEL OSP and let us know if we are missing any repositories?
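
For anyone re-running the reproduction loop from the description: the hang only showed up after a few thousand iterations, so it may help to let the loop detect a stuck iteration by itself. Below is a minimal sketch, assuming the coreutils `timeout` command is available. The function name `repeat_until_hang` and the time limit are hypothetical and were not used by anyone in this report.

```shell
# Hypothetical helper (not from this report): run a command repeatedly and
# stop at the first iteration that fails or does not exit within a limit.
# Usage: repeat_until_hang MAX_ITERATIONS LIMIT_SECONDS COMMAND [ARGS...]
repeat_until_hang() {
    local max limit i rc
    max=$1
    limit=$2
    shift 2
    i=1
    while [ "$i" -le "$max" ]; do
        # coreutils timeout sends SIGTERM when the limit expires and
        # then exits with status 124.
        timeout "$limit" "$@"
        rc=$?
        if [ "$rc" -eq 124 ]; then
            echo "iteration $i: no exit within ${limit}s (possible hang)"
            return 1
        elif [ "$rc" -ne 0 ]; then
            echo "iteration $i: command failed with status $rc"
            return 2
        fi
        i=$((i + 1))
    done
    echo "completed $max iterations without a hang"
    return 0
}
```

With the image from the description this could be called as `repeat_until_hang 5000 60 qemu-img convert -f qcow2 -O raw cirros-0.4.0-x86_64-disk.img cirros-0.4.0-x86_64-disk.raw`, where 5000 iterations and 60 seconds are arbitrary choices. Note that `timeout` terminates the stuck process, so this variant only reports which iteration hung; to inspect a live process with pstack or gdb as done above, the plain loop is still needed.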