Bug 1524770 - qemu-img convert hangs on converting qcow2 to raw
Summary: qemu-img convert hangs on converting qcow2 to raw
Keywords:
Status: CLOSED DUPLICATE of bug 1513362
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: pre-dev-freeze
Target Release: ---
Assignee: Kevin Wolf
QA Contact: Ping Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-12-12 02:00 UTC by KOSAL RAJ I
Modified: 2021-12-10 15:29 UTC
CC List: 21 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-21 18:28:07 UTC
Target Upstream Version:
Embargoed:



Description KOSAL RAJ I 2017-12-12 02:00:26 UTC
Description of problem:
On some of the hypervisors, converting a qcow2 image to raw as part of instance creation hangs, and the instance remains in the 'BUILD' state. Attempting to delete the instance leaves the instance/stack stuck in the 'deleting' state. The only workaround so far has been to restart openstack-nova-compute on the affected hypervisor.

Version-Release number of selected component (if applicable):
RHOSP 10

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 8 Gu Nini 2017-12-19 10:17:29 UTC
Ping,

Could you try to reproduce this bug on the latest RHEL 7.4.z versions?

Comment 11 Brian Fife 2017-12-21 02:24:21 UTC
curl -O  http://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img
i=1; while true; do echo $i; qemu-img convert -O raw cirros-0.4.0-x86_64-disk.img cirros-0.4.0-x86_64-disk.raw -f qcow2; i=$[$i+1]; done

The hang occurred on iteration 2604:

ps -ef | grep convert
root       24322    8602  0 21:13 pts/10   00:00:00 qemu-img convert -O raw cirros-0.4.0-x86_64-disk.img cirros-0.4.0-x86_64-disk.raw -f qcow2
[root@nspcloud-compute-43 ~]# pstack 24322
#0  0x00007ff16a943aff in ppoll () from /lib64/libc.so.6
#1  0x000056211520458b in qemu_poll_ns ()
#2  0x0000562115205378 in main_loop_wait ()
#3  0x000056211514efa3 in img_convert ()
#4  0x00005621151483a9 in main ()
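
The loop in this comment blocks forever once a conversion hangs. Below is a minimal sketch of a bounded variant, assuming coreutils `timeout` is installed. The function name `stress_until_hang`, the iteration cap, and the per-run time limit are illustrative assumptions and were not used in this report.

```shell
# stress_until_hang MAX LIMIT CMD...
# Run CMD up to MAX times, each under `timeout LIMIT` (seconds).
# Prints where the first hang or failure happened and returns non-zero,
# or returns 0 if every iteration finished in time.
stress_until_hang() {
    local max=$1 limit=$2 i=1 rc
    shift 2
    while [ "$i" -le "$max" ]; do
        # Command output is discarded; only the exit status matters here.
        timeout "$limit" "$@" >/dev/null 2>&1
        rc=$?
        if [ "$rc" -eq 124 ]; then
            # 124 is the status `timeout` reports when the limit expired.
            echo "hang at iteration $i"
            return 1
        elif [ "$rc" -ne 0 ]; then
            echo "failure (exit $rc) at iteration $i"
            return 2
        fi
        i=$((i + 1))
    done
    echo "no hang in $max iterations"
    return 0
}

# Example call (not executed here), mirroring the command in this comment:
#   stress_until_hang 5000 60 qemu-img convert -f qcow2 -O raw \
#       cirros-0.4.0-x86_64-disk.img cirros-0.4.0-x86_64-disk.raw
```

Note that `timeout` terminates the hung process, so this form is useful for counting iterations to failure but not for inspecting the stuck process with pstack or gdb afterwards.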

Comment 12 David Hill 2017-12-21 16:26:48 UTC
Hi guys,

We're hitting this exact same problem with:

qemu-img convert -f qcow2 -O qcow2 /var/lib/nova/instances/99bea639-a7b4-43b9-a83a-37fdf5388eda/disk /var/lib/nova/instances/46ff755f1780407bbce1939b6971730c.test 

If we attach gdb to the process, we see the following:

(gdb) bt
#0  0x00007fe62138aaff in ppoll () from /lib64/libc.so.6
#1  0x0000558b9315b58b in qemu_poll_ns ()
#2  0x0000558b9315c378 in main_loop_wait ()
#3  0x0000558b930a5fa3 in img_convert ()
#4  0x0000558b9309f3a9 in main ()
(gdb) 

If we run it with "strace -fffff qemu-img convert -f qcow2 -O qcow2 /var/lib/nova/instances/99bea639-a7b4-43b9-a83a-37fdf5388eda/disk /var/lib/nova/instances/46ff755f1780407bbce1939b6971730c.test > /root/strace.out 2>&1 &", it completes successfully.

Dave
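
Both reports above found the stuck process by hand before taking a backtrace. A minimal sketch for picking out long-running conversions automatically, assuming a procps-ng `ps` that supports the `etimes` (elapsed seconds) column. The function name `stale_converts` and the 600-second threshold are hypothetical.

```shell
# stale_converts LIMIT
# Reads "PID ELAPSED_SECONDS COMMAND ARGS..." lines on stdin and prints the
# PIDs of `qemu-img convert` processes that have run longer than LIMIT seconds.
stale_converts() {
    awk -v limit="$1" '
        $2 > limit && $3 ~ /(^|\/)qemu-img$/ && $4 == "convert" { print $1 }
    '
}

# Example pipeline (not executed here); the gdb invocation prints a
# backtrace of every thread and then detaches:
#   ps -eo pid=,etimes=,args= | stale_converts 600 |
#   while read -r pid; do
#       gdb -p "$pid" -batch -ex 'thread apply all bt'
#   done
```

Keeping the filter separate from the `ps` call makes it easy to check against saved process listings.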

Comment 13 Kevin Wolf 2017-12-21 18:28:07 UTC
After I had a chance to look at a core dump from David's customer, this seems to be a problem that we already have a fix for in qemu-kvm-rhev-2.9.0-16.el7_4.12.

What led me to this conclusion is that we have a single active coroutine in convert_do_copy(), and the only request in it is stuck while we have a ThreadPoolElement that already is in the THREAD_DONE state, but still in the list of thread pool requests. This means that the worker function has completed, but the callback never arrived. This is the same pattern as seen in bug 1513362.


(gdb) p *s
$1 = {src = 0x561db9196050, src_sectors = 0x561db9196060, src_num = 1, total_sectors = 429916160, allocated_sectors = 47303384, allocated_done = 30076272, sector_num = 153985792, 
  wr_offs = 153984768, status = BLK_DATA, sector_next_status = 153985792, target = 0x561db91ea3c0, has_zero_init = true, compressed = false, target_has_backing = false, 
  wr_in_order = true, min_sparse = 8, cluster_sectors = 128, buf_sectors = 4096, num_coroutines = 8, running_coroutines = 8, co = {0x561db996ab40, 0x561db996ac80, 0x561db996adc0, 
    0x561db996af00, 0x561db996b040, 0x561db996b180, 0x561db996b2c0, 0x561db996b400, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, wait_sector_num = {153985152, 153985664, 153985408, 153984896, 
    153985024, 153985536, 153985280, -1, 0, 0, 0, 0, 0, 0, 0, 0}, lock = {locked = 0, ctx = 0x0, from_push = {slh_first = 0x0}, to_pop = {slh_first = 0x0}, handoff = 0, sequence = 0, 
    holder = 0x0}, ret = -115}
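
For orientation, the counters in this dump can be turned into sizes and a progress figure. This is a rough sketch that assumes the sector counts are in 512-byte units. The helper names are illustrative, and the integer arithmetic truncates.

```shell
# Values copied from the `p *s` output above.
sector_size=512            # assumption: 512-byte sectors
total_sectors=429916160
allocated_sectors=47303384
allocated_done=30076272

# Integer helpers; shell arithmetic truncates toward zero.
sectors_to_gib() { echo $(( $1 * sector_size / 1073741824 )); }
percent_done()   { echo $(( $1 * 100 / $2 )); }

echo "virtual size: $(sectors_to_gib "$total_sectors") GiB"
echo "allocated:    $(sectors_to_gib "$allocated_sectors") GiB"
echo "copied:       $(percent_done "$allocated_done" "$allocated_sectors")% of allocated data"
```

Under that assumption the image is 205 GiB virtual with about 22 GiB allocated, and the conversion stalled roughly 63% of the way through the allocated data. On Linux, errno 115 is EINPROGRESS, so `ret = -115` suggests the conversion had not yet recorded a final result.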


(gdb) p *s.target.root.bs.aio_context.thread_pool.head.lh_first
$13 = {common = {aiocb_info = 0x561db7bd9750 <thread_pool_aiocb_info>, bs = 0x0, cb = 0x561db7950ad0 <thread_pool_co_cb>, opaque = 0x7fadd4697910, refcnt = 1}, pool = 0x561db9228000, 
  func = 0x561db78df850 <aio_worker>, arg = 0x561db9172f00, state = THREAD_DONE, ret = 0, reqs = {tqe_next = 0x0, tqe_prev = 0x0}, all = {le_next = 0x0, le_prev = 0x561db9228098}}

*** This bug has been marked as a duplicate of bug 1513362 ***

Comment 14 shivapriya.o.hiremath 2018-02-02 21:57:55 UTC
We are facing the same issue in an OSP 10 deployment, where spawning a huge VM gets stuck. We would like to know how to get a custom build with the patch mentioned in this Bugzilla.

We have downloaded a source RPM (.src.rpm) from http://ftp.redhat.com/pub/redhat/linux/enterprise/7Server/en/RHOS/SRPMS/, specifically qemu-kvm-rhev-2.9.0-16.el7_4.13.src.rpm.

Since this is a source RPM, we have yet to build the binary RPMs from it. We followed the steps at https://wiki.centos.org/HowTos/RebuildSRPM for rebuilding source RPMs, including installing dependencies such as gcc and kernel-headers, but there are a large number of build dependencies.

We used the 'yum-builddep <src rpm>' command to install some of the dependencies, but the following packages are still not available:
•	bluez-libs-devel
•	brlapi-devel
•	gperftools-devel
•	libfdt-devel >= 1.4.3
•	libiscsi-devel
•	libseccomp-devel >= 2.3.0
•	libssh2-devel
•	lzo-devel
•	pciutils-devel
•	snappy-devel

Can you guide us on how to add these dependencies on RHEL OSP and let us know if we are missing any repositories?
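
A list like the one above can be extracted from the build output rather than collected by hand. This is a minimal sketch, assuming `rpmbuild` reports unmet BuildRequires as lines of the form "<package> [>= version] is needed by <source package>". The function name `missing_builddeps` is hypothetical.

```shell
# missing_builddeps
# Reads rpmbuild output on stdin and prints the unique names of packages
# reported as unmet build dependencies, one per line.
missing_builddeps() {
    awk '/ is needed by / { print $1 }' | sort -u
}

# Example call (not executed here):
#   rpmbuild --rebuild qemu-kvm-rhev-2.9.0-16.el7_4.13.src.rpm 2>&1 | missing_builddeps
```

Whether the listed -devel packages can then be installed depends on which repositories the system is subscribed to. On RHEL 7 many -devel packages are shipped in the Optional repository rather than the base one, but that is an assumption to check against the subscription, not something this report confirms.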

