1524770 – qemu-img convert hangs on converting qcow2 to raw


Keywords:
Status:	CLOSED DUPLICATE of bug 1513362
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	qemu-kvm-rhev
Sub Component:
Version:	7.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	pre-dev-freeze
Target Release:	---
Assignee:	Kevin Wolf
QA Contact:	Ping Li
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-12-12 02:00 UTC by KOSAL RAJ I
Modified:	2021-12-10 15:29 UTC
CC List:	21 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-12-21 18:28:07 UTC
Target Upstream Version:
Embargoed:

Description KOSAL RAJ I 2017-12-12 02:00:26 UTC

Description of problem:
On some of the hypervisors, converting a qcow2 image to raw as part of instance creation hangs, and the instance remains in the 'BUILD' state. Attempting to delete the instance leaves the instance/stack stuck in the 'deleting' state. The only workaround so far has been to restart openstack-nova-compute on the affected hypervisor.

Version-Release number of selected component (if applicable):
RHOSP 10

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 8 Gu Nini 2017-12-19 10:17:29 UTC

Ping,

Could you try to reproduce this bug on the latest RHEL 7.4.z versions?

Comment 11 Brian Fife 2017-12-21 02:24:21 UTC

curl -O http://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img
i=1; while true; do echo $i; qemu-img convert -f qcow2 -O raw cirros-0.4.0-x86_64-disk.img cirros-0.4.0-x86_64-disk.raw; i=$((i+1)); done

The hang occurred on iteration 2604:

ps -ef | grep convert
root       24322    8602  0 21:13 pts/10   00:00:00 qemu-img convert -O raw cirros-0.4.0-x86_64-disk.img cirros-0.4.0-x86_64-disk.raw -f qcow2
[root@nspcloud-compute-43 ~]# pstack 24322
#0  0x00007ff16a943aff in ppoll () from /lib64/libc.so.6
#1  0x000056211520458b in qemu_poll_ns ()
#2  0x0000562115205378 in main_loop_wait ()
#3  0x000056211514efa3 in img_convert ()
#4  0x00005621151483a9 in main ()
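
The reproducer loop above relies on someone noticing that the counter has stopped. A minimal sketch of an unattended variant follows; it is not part of the original report. The function name `run_until_hang` and the timeout value are assumptions made for illustration. A run that exceeds the timeout is treated as hung and is deliberately left running so that it can still be inspected with pstack or gdb.

```python
import subprocess
import sys


def run_until_hang(cmd, timeout_s, max_iterations):
    """Run cmd repeatedly until one run exceeds timeout_s seconds.

    Returns (iteration, process) for the first run that does not finish
    in time, or None if all max_iterations runs complete. The suspect
    process is NOT killed, so its stack can still be examined.
    """
    for i in range(1, max_iterations + 1):
        # Mirror the "echo $i" progress output of the shell loop.
        print(i, file=sys.stderr)
        proc = subprocess.Popen(cmd)
        try:
            returncode = proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # Assumption: a run that takes this long is hung, not slow.
            return i, proc
        if returncode != 0:
            raise RuntimeError(f"iteration {i} exited with status {returncode}")
    return None
```

For the cirros image used above, a call might look like `run_until_hang(["qemu-img", "convert", "-f", "qcow2", "-O", "raw", "cirros-0.4.0-x86_64-disk.img", "cirros-0.4.0-x86_64-disk.raw"], 60, 10000)`. The 60-second limit is a guess and would need to be well above the normal conversion time on the host. On a hit, the PID of the stuck process is available as `process.pid`.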

Comment 12 David Hill 2017-12-21 16:26:48 UTC

Hi guys,

    We're hitting this exact same problem with:

qemu-img convert -f qcow2 -O qcow2 /var/lib/nova/instances/99bea639-a7b4-43b9-a83a-37fdf5388eda/disk /var/lib/nova/instances/46ff755f1780407bbce1939b6971730c.test 

If we attach gdb to the process, we see the following:

(gdb) bt
#0  0x00007fe62138aaff in ppoll () from /lib64/libc.so.6
#1  0x0000558b9315b58b in qemu_poll_ns ()
#2  0x0000558b9315c378 in main_loop_wait ()
#3  0x0000558b930a5fa3 in img_convert ()
#4  0x0000558b9309f3a9 in main ()
(gdb) 

If we run it under strace with "strace -fffff qemu-img convert -f qcow2 -O qcow2 /var/lib/nova/instances/99bea639-a7b4-43b9-a83a-37fdf5388eda/disk /var/lib/nova/instances/46ff755f1780407bbce1939b6971730c.test > /root/strace.out 2>&1 &", it completes successfully.

Dave

Comment 13 Kevin Wolf 2017-12-21 18:28:07 UTC

After I had a chance to look at a core dump from David's customer, this seems to be a problem that we already have a fix for in qemu-kvm-rhev-2.9.0-16.el7_4.12.

What led me to this conclusion is that we have a single active coroutine in convert_do_copy(), and the only request in it is stuck while we have a ThreadPoolElement that already is in the THREAD_DONE state, but still in the list of thread pool requests. This means that the worker function has completed, but the callback never arrived. This is the same pattern as seen in bug 1513362.


(gdb) p *s
$1 = {src = 0x561db9196050, src_sectors = 0x561db9196060, src_num = 1, total_sectors = 429916160, allocated_sectors = 47303384, allocated_done = 30076272, sector_num = 153985792, 
  wr_offs = 153984768, status = BLK_DATA, sector_next_status = 153985792, target = 0x561db91ea3c0, has_zero_init = true, compressed = false, target_has_backing = false, 
  wr_in_order = true, min_sparse = 8, cluster_sectors = 128, buf_sectors = 4096, num_coroutines = 8, running_coroutines = 8, co = {0x561db996ab40, 0x561db996ac80, 0x561db996adc0, 
    0x561db996af00, 0x561db996b040, 0x561db996b180, 0x561db996b2c0, 0x561db996b400, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, wait_sector_num = {153985152, 153985664, 153985408, 153984896, 
    153985024, 153985536, 153985280, -1, 0, 0, 0, 0, 0, 0, 0, 0}, lock = {locked = 0, ctx = 0x0, from_push = {slh_first = 0x0}, to_pop = {slh_first = 0x0}, handoff = 0, sequence = 0, 
    holder = 0x0}, ret = -115}
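
As a reading aid for the `p *s` output above, the following sketch converts the counters into sizes and checks the coroutine bookkeeping. It assumes the sector counts use QEMU's 512-byte sector unit and that a `wait_sector_num` entry of -1 marks a coroutine that is not waiting; both are interpretations, not statements from the original comment. On Linux, errno 115 is EINPROGRESS, so `ret = -115` would be consistent with a conversion that was still in flight.

```python
SECTOR_SIZE = 512  # bytes per sector (assumed: QEMU's BDRV_SECTOR_SIZE)

# Values copied from the gdb output of "p *s".
total_sectors = 429916160
allocated_sectors = 47303384
allocated_done = 30076272
wr_offs = 153984768
cluster_sectors = 128
num_coroutines = 8
wait_sector_num = [153985152, 153985664, 153985408, 153984896,
                   153985024, 153985536, 153985280, -1]


def sectors_to_gib(sectors):
    """Convert a sector count to GiB."""
    return sectors * SECTOR_SIZE / 2**30


def progress_percent(done, total):
    """Share of allocated sectors already converted, in percent."""
    return 100.0 * done / total


# Coroutines that are blocked waiting for an earlier write to finish
# (assumption: -1 means "not waiting").
waiting = [n for n in wait_sector_num[:num_coroutines] if n != -1]

print(f"virtual size:  {sectors_to_gib(total_sectors):.1f} GiB")
print(f"allocated:     {sectors_to_gib(allocated_sectors):.1f} GiB")
print(f"converted:     {progress_percent(allocated_done, allocated_sectors):.1f} %")
print(f"waiting:       {len(waiting)} of {num_coroutines} coroutines")
print(f"next in line:  wr_offs + {min(waiting) - wr_offs} sectors")
```

Under these assumptions, seven of the eight coroutines are waiting, and the earliest of them waits exactly one cluster (128 sectors) past `wr_offs`. That fits a picture in which a single coroutine holds the in-order write position and everything else is queued behind it.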


(gdb) p *s.target.root.bs.aio_context.thread_pool.head.lh_first
$13 = {common = {aiocb_info = 0x561db7bd9750 <thread_pool_aiocb_info>, bs = 0x0, cb = 0x561db7950ad0 <thread_pool_co_cb>, opaque = 0x7fadd4697910, refcnt = 1}, pool = 0x561db9228000, 
  func = 0x561db78df850 <aio_worker>, arg = 0x561db9172f00, state = THREAD_DONE, ret = 0, reqs = {tqe_next = 0x0, tqe_prev = 0x0}, all = {le_next = 0x0, le_prev = 0x561db9228098}}
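
The pattern described above (worker finished, callback never delivered) can be illustrated with a toy model. This is not QEMU code: the class and method names are hypothetical, and only the state names are borrowed from the dump. In the model, a request stays on the pool's list until the completion side runs, so an element that is in THREAD_DONE but still listed is one whose completion has not been processed.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    THREAD_QUEUED = auto()
    THREAD_ACTIVE = auto()
    THREAD_DONE = auto()


@dataclass
class Element:
    name: str
    state: State = State.THREAD_QUEUED


class ToyPool:
    """Toy stand-in for a thread pool's request list (hypothetical)."""

    def __init__(self):
        self.all = []  # submitted requests whose completion has not run

    def submit(self, name):
        elem = Element(name)
        self.all.append(elem)
        return elem

    def worker_finished(self, elem):
        # Worker-thread side: the work itself is done.
        elem.state = State.THREAD_DONE

    def run_completions(self):
        # Main-loop side: deliver callbacks and unlink finished requests.
        done = [e for e in self.all if e.state is State.THREAD_DONE]
        for e in done:
            self.all.remove(e)
        return done

    def stuck(self):
        # Finished by the worker, but completion never processed.
        return [e for e in self.all if e.state is State.THREAD_DONE]
```

If `worker_finished()` has been called for a request but `run_completions()` is never invoked, `stuck()` keeps returning that request. This corresponds to the state seen in the core dump, where the only listed element is already in THREAD_DONE while the main loop sits in ppoll().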

*** This bug has been marked as a duplicate of bug 1513362 ***

Comment 14 shivapriya.o.hiremath 2018-02-02 21:57:55 UTC

We are facing the same issue in an OSP 10 deployment, where spawning a large VM gets stuck. We would like to know how to get a custom build with the patch mentioned in this Bugzilla.

We have downloaded a source RPM (.src.rpm) from http://ftp.redhat.com/pub/redhat/linux/enterprise/7Server/en/RHOS/SRPMS/, specifically qemu-kvm-rhev-2.9.0-16.el7_4.13.src.rpm.

Since this is a source RPM, we have yet to build a binary RPM from it. We followed the steps at https://wiki.centos.org/HowTos/RebuildSRPM for rebuilding source RPMs, including installing dependencies such as gcc and kernel-headers, but there are a large number of dependencies.

We have used the 'yum-builddep <src rpm>' command to install some of the dependencies, but other packages are not available. These are the following:
•	bluez-libs-devel
•	brlapi-devel
•	gperftools-devel
•	libfdt-devel >= 1.4.3
•	libiscsi-devel
•	libseccomp-devel >= 2.3.0
•	libssh2-devel
•	lzo-devel
•	pciutils-devel
•	snappy-devel

Can you guide us on how to add these dependencies on RHEL OSP and let us know if we are missing any repositories?
