Bug 1454582 - Qemu crashes when start guest with qcow2 nbd image
Summary: Qemu crashes when start guest with qcow2 nbd image
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Eric Blake
QA Contact: Suqin Huang
URL:
Whiteboard:
Keywords: Regression
Depends On:
Blocks:
Reported: 2017-05-23 05:56 UTC by yanqzhan@redhat.com
Modified: 2017-08-02 04:41 UTC (History)
Clone Of:
Last Closed: 2017-08-02 04:41:00 UTC


Attachments
all_threads_backtrace (12.30 KB, text/plain)
2017-05-23 05:56 UTC, yanqzhan@redhat.com


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2017:2392 | normal | SHIPPED_LIVE | Important: qemu-kvm-rhev security, bug fix, and enhancement update | 2017-08-01 20:04:36 UTC
Red Hat Bugzilla | 1458725 | None | CLOSED | Unable to migrate with nbd block | 2019-05-21 20:27 UTC

Internal Trackers: 1458725

Description yanqzhan@redhat.com 2017-05-23 05:56:30 UTC
Created attachment 1281335
all_threads_backtrace

Description of problem:
After starting a guest with a qcow2 NBD image, qemu crashes after a while and the guest goes down.

Version-Release number of selected component:
qemu-kvm-rhev-2.9.0-5.el7.x86_64
libvirt-3.2.0-5.el7.x86_64


How reproducible:
80%

Steps to Reproduce:
1. Set up an NBD server:
# mkdir /tmp/nbdSvr
# cd /tmp/nbdSvr
# wget http://10.66.*.*/libvirt-CI-resources/RHEL-7.4-x86_64-latest.qcow2
# qemu-nbd -t -p 30001 --format=raw  /tmp/nbdSvr/RHEL-7.4-x86_64-latest.qcow2

2. On the client side, prepare a guest XML with the NBD image:
# cat V-nbd.xml|grep 'disk t' -A8
    <disk type='network' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source protocol='nbd'>
        <host name='{server_host}' port='30001'/>
      </source>
      <target dev='sda' bus='scsi'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

3. Create the guest:
# virsh create V-nbd.xml
Domain V created from V-nbd.xml

# virsh list
 Id    Name                           State
----------------------------------------------------
 31    V                              running


4. Wait for a while, then check the guest status, libvirtd.log and abrt-cli list:
# virsh list
 Id    Name                           State
----------------------------------------------------

# tail -f /var/log/libvirt/libvirtd.log|grep ' error '
2017-05-22 15:41:17.058+0000: 1253: error : qemuMonitorIO:697 : internal error: End of file from qemu monitor
2017-05-22 15:41:17.314+0000: 2809: info : virDBusCall:1558 : DBUS_METHOD_ERROR: 'org.freedesktop.machine1.Manager.TerminateMachine' on '/org/freedesktop/machine1' at 'org.freedesktop.machine1' error org.freedesktop.machine1.NoSuchMachine: No machine 'qemu-31-V' known

# abrt-cli list|head
id ffe2c176aabbfc45d74085e40fafcc1b8082d5ab
reason:         qemu-kvm killed by SIGABRT
time:           Mon 22 May 2017 11:41:15 PM CST
cmdline:        /usr/libexec/qemu-kvm -name guest=V,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-31-V/master-key.aes -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off,vmport=off,dump-guest-core=off -cpu Penryn,vme=on,x2apic=off,hypervisor=off -m 4096 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid c3a57d2a-7d75-493a-a699-786ac2fd54b3 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-31-V/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x7 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=nbd:{server_host}:30001,format=qcow2,if=none,id=drive-scsi0-0-0-0 -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 -netdev tap,fd=27,id=hostnet0,vhost=on,vhostfd=29 -device virtio-net-pci,host_mtu=1500,netdev=hostnet0,id=net0,mac=52:54:00:ad:6a:22,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-31-V/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on
package:        qemu-kvm-rhev-2.9.0-5.el7
uid:            107 (qemu)
Directory:      /var/spool/abrt/ccpp-2017-05-22-23:41:15-24171
Run 'abrt-cli report /var/spool/abrt/ccpp-2017-05-22-23:41:15-24171' for creating a case in Red Hat Customer Portal

(gdb) bt
#0  0x00007f71d80231f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f71d80248e8 in __GI_abort () at abort.c:90
#2  0x00007f71d801c266 in __assert_fail_base (fmt=0x7f71d816ee68 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x55bb65c9ff9e "s->in_flight < 16", file=file@entry=0x55bb65c9ff8b "block/nbd-client.c", line=line@entry=186, function=function@entry=0x55bb65ca0110 <__PRETTY_FUNCTION__.26302> "nbd_coroutine_start")
    at assert.c:92
#3  0x00007f71d801c312 in __GI___assert_fail (assertion=assertion@entry=0x55bb65c9ff9e "s->in_flight < 16", file=file@entry=0x55bb65c9ff8b "block/nbd-client.c", line=line@entry=186, function=function@entry=0x55bb65ca0110 <__PRETTY_FUNCTION__.26302> "nbd_coroutine_start") at assert.c:101
#4  0x000055bb65b614fa in nbd_coroutine_start (s=s@entry=0x55bb6831c400, request=0x7f70c04fe9c0)
    at block/nbd-client.c:186
#5  0x000055bb65b617fc in nbd_client_co_pwritev (bs=0x55bb68373400, offset=889225216, bytes=<optimized out>, qiov=0x7f70c04fecd0, flags=<optimized out>) at block/nbd-client.c:254
#6  0x000055bb65b5cc81 in bdrv_driver_pwritev (bs=bs@entry=0x55bb68373400, offset=offset@entry=889225216, bytes=bytes@entry=16384, qiov=qiov@entry=0x7f70c04fecd0, flags=flags@entry=0) at block/io.c:888
#7  0x000055bb65b5e036 in bdrv_aligned_pwritev (req=req@entry=0x7f70c04febb0, offset=offset@entry=889225216, bytes=bytes@entry=16384, align=align@entry=1, qiov=0x7f70c04fecd0, flags=flags@entry=0, child=0x55bb68251bd0)
    at block/io.c:1396
#8  0x000055bb65b5eb72 in bdrv_co_pwritev (child=0x55bb68251bd0, offset=889225216, bytes=16384, qiov=qiov@entry=0x7f70c04fecd0, flags=flags@entry=0) at block/io.c:1647
#9  0x000055bb65b34076 in qcow2_co_pwritev (bs=0x55bb68370000, offset=2456059904, bytes=16384, qiov=0x55bb6824ea00, flags=<optimized out>) at block/qcow2.c:1663
#10 0x000055bb65b5cc81 in bdrv_driver_pwritev (bs=bs@entry=0x55bb68370000, offset=offset@entry=2456059904, bytes=bytes@entry=16384, qiov=qiov@entry=0x55bb6824ea00, flags=flags@entry=0) at block/io.c:888
#11 0x000055bb65b5e036 in bdrv_aligned_pwritev (req=req@entry=0x7f70c04feec0, offset=offset@entry=2456059904, bytes=bytes@entry=16384, align=align@entry=1, qiov=0x55bb6824ea00, flags=flags@entry=0, child=0x55bb68252030)
    at block/io.c:1396
#12 0x000055bb65b5eb72 in bdrv_co_pwritev (child=0x55bb68252030, offset=offset@entry=2456059904, bytes=bytes@entry=16384, qiov=qiov@entry=0x55bb6824ea00, flags=0) at block/io.c:1647
#13 0x000055bb65b5052b in blk_co_pwritev (blk=0x55bb68298000, offset=2456059904, bytes=16384, qiov=0x55bb6824ea00, flags=<optimized out>) at block/block-backend.c:995
#14 0x000055bb65b505ba in blk_aio_write_entry (opaque=0x55bb68251c20) at block/block-backend.c:1186
#15 0x000055bb65be0f4a in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>)
    at util/coroutine-ucontext.c:79
#16 0x00007f71d8034d40 in __start_context () at /usr/lib64/libc-2.17.so
#17 0x00007ffd10eccb10 in  ()
#18 0x0000000000000000 in  ()
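
The abort in frame #4 comes from the assertion "s->in_flight < 16" in nbd_coroutine_start. As a rough illustration of what an invariant of that shape guards, here is a toy Python model. It is not QEMU's code: ToyNbdClient, submit_unguarded and submit_guarded are hypothetical names, and only the limit of 16 is taken from the assertion text above. The point is simply that a caller which starts a new request while every slot is busy trips the assertion, while a caller that first waits for a slot does not.

```python
# Toy model only: this is not QEMU's block/nbd-client.c.
MAX_IN_FLIGHT = 16  # taken from the failed assertion "s->in_flight < 16"


class ToyNbdClient:
    """Hypothetical stand-in for client state that counts in-flight requests."""

    def __init__(self):
        self.in_flight = 0

    def start_request(self):
        # Same shape as the assertion reported in frame #4 of the backtrace.
        assert self.in_flight < MAX_IN_FLIGHT, "s->in_flight < 16"
        self.in_flight += 1

    def finish_request(self):
        assert self.in_flight > 0
        self.in_flight -= 1


def submit_unguarded(client, count):
    """Start `count` requests without ever waiting for a free slot."""
    for _ in range(count):
        client.start_request()


def submit_guarded(client, count):
    """Start `count` requests, completing one first whenever all slots are busy.

    Returns how many requests had to be completed to make room.
    """
    completed = 0
    for _ in range(count):
        if client.in_flight == MAX_IN_FLIGHT:
            client.finish_request()  # stand-in for waiting on a server reply
            completed += 1
        client.start_request()
    return completed
```

In this model, submit_unguarded(ToyNbdClient(), 17) raises AssertionError on the 17th request, whereas submit_guarded never exceeds the limit. Whether the real crash follows this exact pattern is not established by the backtrace alone.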


Actual results:
As in step 4: after a guest is started with a qcow2 NBD image, qemu crashes after a while and the guest goes down.

Expected results:
Qemu should not crash, and the guest should stay in the running state.

Additional info:
Cannot reproduce if the NBD server is set up with "--format=qcow2" and the guest XML correspondingly uses "type='raw'".

Comment 2 Eric Blake 2017-05-26 19:29:54 UTC
Paolo, does this ring a bell with any of your recent changes to NBD use of coroutines?

Comment 3 Eric Blake 2017-05-26 19:49:59 UTC
While searching for upstream messages mentioning nbd-client.c since 2.9.0, I noticed Fam had a v1 patch that touched it:
https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01019.html

but it appears that the v3 version (which got merged at aa388ddc) solved that problem a different way. Still, it may be a coroutine race that we have to account for.

Comment 4 Eric Blake 2017-05-30 15:46:10 UTC
Vladimir also just posted a bunch of NBD cleanups; I'm not sure if any of them are relevant but I'm investigating:
https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg06755.html

Comment 6 Eric Blake 2017-05-31 21:21:32 UTC
And it looks like Paolo has the win for the patch that matters:
https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg07010.html

Comment 7 Paolo Bonzini 2017-06-01 10:44:06 UTC
Ok, I'll include that patch in my pull request.

Comment 8 Han Han 2017-06-05 08:54:20 UTC
This works well on qemu-kvm-rhev-2.6.0-28.el7_3.10.x86_64. Marked as regression.

Comment 10 Eric Blake 2017-06-11 03:23:52 UTC
Upstream should be fixed as of Paolo's v6 pull request
https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg01841.html

Comment 12 Miroslav Rezanina 2017-06-13 16:35:35 UTC
Fix included in qemu-kvm-rhev-2.9.0-10.el7
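
For anyone checking whether an installed build is at or after the one named above, here is a small Python sketch. It is a deliberate simplification and not RPM's own version-comparison algorithm: parse_build and has_fix are hypothetical helper names, and only the leading number of the release field is compared. It mainly shows why a plain string comparison is not enough ("2.9.0-5.el7" sorts after "2.9.0-10.el7" as a string).

```python
import re

FIXED_BUILD = "2.9.0-10.el7"  # build named in comment 12


def parse_build(version_release):
    """Split a 'version-release' string such as '2.9.0-10.el7' into comparable parts.

    Assumption: only the leading integer of the release field matters here;
    dist tags such as '.el7' are ignored.
    """
    version, release = version_release.split("-", 1)
    version_parts = tuple(int(part) for part in version.split("."))
    release_number = int(re.match(r"\d+", release).group())
    return version_parts, release_number


def has_fix(installed):
    """Return True if `installed` is at or after FIXED_BUILD."""
    return parse_build(installed) >= parse_build(FIXED_BUILD)
```

Note that this only answers "is the build at or after 2.9.0-10.el7". The 2.6.0 build mentioned in comment 8 predates the regression, so a False result there does not mean that build is affected.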

Comment 14 Suqin Huang 2017-06-15 04:56:42 UTC
The guest works well when running the following cases:
unattended_install, reboot (repeated 10 times), shutdown

Packages:

qemu-kvm-rhev-2.9.0-10.el7.x86_64

Comment 15 Suqin Huang 2017-06-15 04:57:32 UTC
Updating bug status to VERIFIED according to comment 14.

Comment 17 errata-xmlrpc 2017-08-02 04:41:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392

