Bug 1667530
| Summary: | qemu-img frozen | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Mircea Vutcovici <mircea.vutcovici> |
| Component: | qemu-kvm-rhev | Assignee: | Hanna Czenczek <hreitz> |
| Status: | CLOSED DUPLICATE | QA Contact: | nlevinki <nlevinki> |
| Severity: | urgent | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.5 | CC: | hreitz, knoel, kwolf, mircea.vutcovici, pkovacs, virt-maint |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-05-13 19:56:53 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Mircea Vutcovici
2019-01-18 17:47:36 UTC
Hi,

Can you tell me what filesystem the source image is on?

Unfortunately, I don't know OpenStack too well, so do the two paths you show (/var/lib/glance/images/9d3a37b7-2060-41fe-9fbe-063aae72d36e and /var/lib/cinder/conversion/tmpHSGOBO) point to the same image? (So is one a link to the other?)

Also, of course, the obligatory question: can you try with a newer version of qemu-img? (And, related, does the issue appear if you invoke qemu-img manually?)

Max

The filesystem is xfs. The contents of /var/lib/glance/images/9d3a37b7-2060-41fe-9fbe-063aae72d36e and /var/lib/cinder/conversion/tmpHSGOBO are the same. Just focus on /var/lib/cinder/conversion/tmpHSGOBO; the other one is in Glance and is not the source for qemu-img. Sorry for the confusion. The image from Glance is copied to the compute machine.
We have hit the same issue again.
We tried to reproduce the problem on the same machine while qemu-img was frozen: we ran it again on the exact same source image, with a newly created file in the same folder as the destination. The conversion of an 8 GB/50 GB image (on-disk/virtual size) took less than 2 minutes. Meanwhile, the frozen qemu-img process remained frozen.

See SF case 02304013, comment #29, for details; see also the image details in attachment #22.
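For context on what "frozen" means here, one way to tell a genuinely stuck conversion apart from a merely slow one is to sample the process's CPU time and write counters from /proc and check whether they change at all. The sketch below is a hypothetical, Linux-only helper written purely for illustration; it is not something that was used in this case, and the 10-second interval is arbitrary.

```c
/*
 * Hypothetical helper, not part of this case: sample a process's CPU time
 * and bytes written from /proc twice and report whether they changed.
 * A hung qemu-img like the one described above shows no change at all,
 * while a merely slow conversion still makes progress between samples.
 *
 * Build: gcc -o is_stuck is_stuck.c
 * Usage: ./is_stuck <pid>     (needs permission to read /proc/<pid>/io)
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct sample {
    unsigned long long utime, stime;   /* CPU ticks, /proc/<pid>/stat   */
    unsigned long long write_bytes;    /* bytes written, /proc/<pid>/io */
};

static int read_sample(int pid, struct sample *s)
{
    char path[64], buf[4096];
    FILE *f;

    /* utime and stime are fields 14 and 15 of /proc/<pid>/stat. */
    snprintf(path, sizeof(path), "/proc/%d/stat", pid);
    f = fopen(path, "r");
    if (!f) {
        return -1;
    }
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    char *p = strrchr(buf, ')');       /* skip past the "(comm)" field */
    if (!p) {
        return -1;
    }
    int field = 2;
    while (*p && field < 14) {
        if (*p == ' ') {
            field++;
        }
        p++;
    }
    if (sscanf(p, "%llu %llu", &s->utime, &s->stime) != 2) {
        return -1;
    }

    snprintf(path, sizeof(path), "/proc/%d/io", pid);
    f = fopen(path, "r");
    if (!f) {
        return -1;
    }
    s->write_bytes = 0;
    while (fgets(buf, sizeof(buf), f)) {
        sscanf(buf, "write_bytes: %llu", &s->write_bytes);
    }
    fclose(f);
    return 0;
}

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }
    int pid = atoi(argv[1]);
    struct sample a, b;

    if (read_sample(pid, &a) < 0) {
        return 1;
    }
    sleep(10);                         /* interval between the two samples */
    if (read_sample(pid, &b) < 0) {
        return 1;
    }

    printf("CPU ticks: +%llu, bytes written: +%llu\n",
           (b.utime + b.stime) - (a.utime + a.stime),
           b.write_bytes - a.write_bytes);
    if (b.utime + b.stime == a.utime + a.stime &&
        b.write_bytes == a.write_bytes) {
        printf("no CPU or write activity in 10 s: the process looks stuck\n");
    }
    return 0;
}
```

Tools such as pidstat give the same answer; the point is only that the hung process shows zero activity rather than slow progress.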
I've tried to recreate a qcow2 image with a qemu-img map output similar to the one shown in attachment 22 of the support case, and I couldn't reproduce the issue so far (but I'll keep trying). I don't think the filesystem is the issue, as the file isn't fragmented much and I don't know of any current issues with xfs. Maybe the strace log will help.
So, as far as I understand, the issue has never appeared when running qemu-img directly? That does not make things easier, unfortunately...

I don't know of any fixes related to this issue, but still, qemu 2.9.0 is two years old, so it may well be that this issue has been fixed in the meantime. 7.6 has 2.12.0, at least (which is less than one year old), so is there any possibility to at least test that?
Max
It's not a performance issue. It looks like a deadlock. The qemu-img process is not using any I/O or CPU at all. The rest of the machine is idle too (both CPU and disk I/O). The qemu-img process is waiting for changes on a couple of file descriptors using the ppoll() syscall:

ppoll([{fd=4<anon_inode:[signalfd]>, events=POLLIN}, {fd=6<anon_inode:[eventfd]>, events=POLLIN}, {fd=8<anon_inode:[eventfd]>, events=POLLIN}, {fd=9<anon_inode:[eventfd]>, events=POLLIN}], 4, NULL, NULL, 8

Hi,

A core dump sounds interesting; can you tell me where to find it? (I didn't see any such attachment to the support case, and I couldn't find any core dumps in the SOS report files.)

Also, do you know when in the conversion process the hang occurs? From the file with the image details it looks like it's at the end (because the raw image is shown to contain probably all of the converted data), but maybe the image in there is just the output from the manual conversion.

Thanks, Max

Hi Max! You can find a core dump in SF case 02299700. Please let me know if you have access to it. I do not know when the hang occurs. It is not happening very often, but when it does happen, the only workaround we have found so far is to kill the qemu-img process, which is idle (both CPU and I/O).

Hi,

Sorry, while I was able to download the core file, I did not really get any info from it. Pairing it up with a freshly built version of 2.9.0-10 did not work, nor did taking a 2.9.0-16 binary. Trying to manually dissect the core file did not go very far either...

How often does this issue occur? I have seen a couple of support cases that go in that direction, although some then seemed to turn out to be something else. I am asking because I evidently have a hard time reproducing it, and it does not seem to be that critical...

Max

I just had a look at the strace provided in SF 02304013: there are pwrite64() calls starting 5 seconds after invocation for the whole 1.5 minutes until the trace ends. So that doesn't seem to be a trace from a run where it hung.

Max

The strace was made with parameters similar to those of the command that hung, to see if I could reproduce the issue. I couldn't. The issue is intermittent, which is why I think it is a deadlock.

I found BZ 1524770 today, which looks very much related. It also has a reproducer (BZ 1524770 comment 11), which, however, hasn't worked for me so far (no hangs in more than 50k iterations). It too was reported for 2.9.0-10, and apparently there is a fix for that issue in 2.9.0-16.

Max

Is there anything preventing the customer from updating to the current release of 2.9.0, which is 2.9.0-16, where BZ 1524770 was fixed?

Max

I have finally been able to get an environment with 2.9.0-10 in it (and its debugging symbols), and from it I can see that there are in-flight requests to the target image:

(gdb) p $dst->filename
$18 = "/dev/dm-7", '\000' <repeats 4086 times>
(gdb) p $dst->drv
$19 = (BlockDriver *) 0x5622a47c8460 <bdrv_raw>
(gdb) p $dst->in_flight
$20 = 2
(gdb) p $dst->file->bs->in_flight
$21 = 1

BZ 1524770 comment 13 says the problem is that there is a request in the thread pool that is in THREAD_DONE state.
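For illustration, here is a heavily simplified sketch of that handshake (this is not QEMU's actual thread-pool code; the names and details are approximations): a worker thread marks its request done and must then wake the main loop, which is blocked in ppoll() on an eventfd just like the hung process above. If that wakeup never arrives, the caller-visible result stays at -EINPROGRESS even though the work has finished.

```c
/*
 * Simplified sketch of the completion handshake discussed above; this is
 * NOT QEMU source code, only an illustration of the pattern.
 * A worker thread finishes a request, marks it done, and notifies the main
 * loop through an eventfd (standing in for the completion bottom half).
 * The main loop sits in ppoll() on that eventfd, as in the strace of the
 * hung qemu-img, and only then copies the result out.  If the notification
 * were lost, ppoll() would block forever and "ret" would stay -EINPROGRESS,
 * which is exactly the state the gdb output shows.
 *
 * Build (Linux): gcc -pthread -o handshake handshake.c
 */
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

enum req_state { REQ_QUEUED, REQ_ACTIVE, REQ_DONE };  /* like THREAD_DONE */

struct request {
    enum req_state state;   /* set to REQ_DONE by the worker              */
    int worker_ret;         /* result produced by the worker thread       */
    int ret;                /* caller-visible result; stays -EINPROGRESS
                               until the completion step below has run    */
    int completion_fd;      /* stand-in for the completion bottom half    */
};

static void *worker(void *opaque)
{
    struct request *req = opaque;
    uint64_t one = 1;

    req->state = REQ_ACTIVE;
    req->worker_ret = 0;            /* the actual work would happen here  */
    req->state = REQ_DONE;

    /* The caller only learns about the completion through this wakeup.
     * If it were skipped or lost, the main loop below would block in
     * ppoll() forever, using no CPU and no I/O.                          */
    if (write(req->completion_fd, &one, sizeof(one)) < 0) {
        perror("eventfd write");
    }
    return NULL;
}

int main(void)
{
    struct request req = {
        .state = REQ_QUEUED,
        .ret = -EINPROGRESS,        /* -115, as seen in the gdb output    */
        .completion_fd = eventfd(0, 0),
    };
    pthread_t tid;
    pthread_create(&tid, NULL, worker, &req);

    /* Main loop: block until the completion notification arrives,
     * like the ppoll([... eventfd ...]) call in the hung process.        */
    struct pollfd pfd = { .fd = req.completion_fd, .events = POLLIN };
    ppoll(&pfd, 1, NULL, NULL);

    uint64_t cnt;
    if (read(req.completion_fd, &cnt, sizeof(cnt)) < 0) {
        perror("eventfd read");
    }
    req.ret = req.worker_ret;       /* completion: result becomes visible */

    printf("state=%d ret=%d\n", req.state, req.ret);
    pthread_join(tid, NULL);
    return 0;
}
```

In QEMU itself the wakeup goes through the completion bottom half inspected below rather than a bare eventfd write, but the failure has the same shape: the element says done, the result was never delivered, and the main loop keeps sleeping.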
The same is true here:

(gdb) p $dst.aio_context.thread_pool.head.lh_first.state
$37 = THREAD_DONE

BZ 1508886 (the original BZ) explains further that the associated ThreadPoolCo object still has ret == -115, which stands for EINPROGRESS:

(gdb) p ((ThreadPoolCo *)$dst.aio_context.thread_pool.head.lh_first.common.opaque).ret
$38 = -115

And that the completion BH is not scheduled:

(gdb) p $dst.aio_context.thread_pool.completion_bh.scheduled
$35 = false

(And this is the only request in the thread pool, by the way:

(gdb) p $dst.aio_context.thread_pool.request_list.tqh_first
$44 = (struct ThreadPoolElement *) 0x0
(gdb) p $dst.aio_context.thread_pool.head.lh_first.all.le_next
$45 = (struct ThreadPoolElement *) 0x0
)

So it all looks exactly like BZ 1524770 and BZ 1508886 (BZ 1513362 for 2.9.0). This problem was fixed in 2.9.0-16.el7_4.12, so I'll close this BZ (as far as I understand, only 2.9.0-10 was reported broken). Please reopen this BZ if newer qemu-img versions still have this problem.

Max

*** This bug has been marked as a duplicate of bug 1513362 ***

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days