Bug 1266936 - QEMU crashes during 'drive-mirror' with error "Co-routine re-entered recursively"
Summary: QEMU crashes during 'drive-mirror' with error "Co-routine re-entered recursi...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: qemu
Version: 22
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
Assignee: Fedora Virtualization Maintainers
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2015-09-28 15:17 UTC by Kashyap Chamarthy
Modified: 2015-10-23 17:25 UTC (History)
9 users (show)

Fixed In Version: qemu-2.3.1-7.fc22
Clone Of:
Environment:
Last Closed: 2015-10-23 17:25:05 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
libvirtd.log on source host (204.66 KB, text/plain)
2015-09-28 15:20 UTC, Kashyap Chamarthy
no flags Details

Description Kashyap Chamarthy 2015-09-28 15:17:03 UTC
Description of problem
----------------------

When performing live block migration with "virsh --copy-storage-inc" (this
internally uses QEMU QMP command 'drive-mirror'), guest crashes:

    $ virsh migrate --verbose \
        --copy-storage-inc --p2p \
        --live cvm2 qemu+ssh://root@devstack1/system
     error: internal error: guest unexpectedly quit


Relevant QEMU error, from /var/log/libvirt/qemu/cvm2.log:

    [. . .]
    Co-routine re-entered recursively
    2015-09-28 10:45:26.232+0000: shutting down


Version
-------

    libvirt-daemon-driver-qemu-1.2.13.1-2.fc22.x86_64
    qemu-system-x86-2.3.1-1.fc22.x86_64


How reproducible: Consistently.


Steps to Reproduce
------------------

(1) Setup two hosts such that password-less SSH via sudo works just fine:

(2) Since I'm testing a trusted network, I'm using the below insecure
    configs (for testing only!) in destination libvirtd.conf:
        
        [. . .]
        listen_tls = 0
        listen_tcp = 1
        auth_tcp = "none"
        [. . .]

(3) Perform live block migration, with non-shared storage:

    $ virsh migrate --verbose \
        --copy-storage-inc --p2p --live cvm2 \
        qemu+ssh://root@desthost/system

And, observe the failure on stderr, libvirtd.log & in the QEMU log of
the guest.


Actual results
--------------

Guest on sources shuts down, consequently live block migration with
non-shared storage fails:

    `virsh` stderr: "error: internal error: guest unexpectedly quit"


Expected results
----------------

'drive-mirror' should proceed successfully, resulting in a succesful
libvirt live block migration.


Contextual libvirtd.log
-----------------------

With log filters:
    log_filters="3:event 3:file 1:util 1:security 1:qemu 1:libvirt"
    log_outputs="1:file:/var/log/libvirt/libvirtd.log"

$ less /var/log/libvirt/libvirtd.log
[. . .]
2015-09-28 10:45:26.231+0000: 2537: error : qemuMonitorIO:662 : internal error: End of file from monitor
2015-09-28 10:45:26.231+0000: 2537: debug : qemuMonitorIO:717 : Error on monitor internal error: End of file from monitor
2015-09-28 10:45:26.231+0000: 2537: debug : qemuMonitorIO:738 : Triggering EOF callback
2015-09-28 10:45:26.231+0000: 2537: debug : qemuProcessHandleMonitorEOF:307 : Received EOF on 0x7f2be0003bb0 'cvm2'
2015-09-28 10:45:26.231+0000: 2537: debug : qemuProcessHandleMonitorEOF:325 : Monitor connection to 'cvm2' closed without SHUTDOWN event; assuming the domain crashed
[. . .]


Error info from /var/log/libvirt/qemu/cirrvm2.log
-------------------------------------------------

[. . .]
WARNING: Image format was not specified for 'nbd://devstack1:49153/drive-virtio-disk0' and probing guessed raw.
         Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
         Specify the 'raw' format explicitly to remove the restrictions.
Co-routine re-entered recursively
2015-09-28 10:45:26.232+0000: shutting down
(END)


Coredump info from systemd
--------------------------

$ coredumpctl 
TIME                            PID   UID   GID SIG PRESENT EXE
Mon 2015-09-28 06:34:52 EDT   12447   107   107   6 * /usr/bin/qemu-system-x86_64
Mon 2015-09-28 06:47:23 EDT    1049   107   107   6 * /usr/bin/qemu-system-x86_64

$ coredumpctl dump
           PID: 1049 (qemu-system-x86)
           UID: 107 (qemu)
           GID: 107 (qemu)
        Signal: 6 (ABRT)
     Timestamp: Mon 2015-09-28 06:45:22 EDT (10min ago)
  Command Line: /usr/bin/qemu-system-x86_64 -machine accel=kvm -name cvm2 -S -machine pc-i440fx-2.3,accel=kvm,usb=off -cpu Nehalem -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid d458e6a3-fc75-4a49-9617-054aab6d0052 -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/cvm2.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 -drive file=/var/lib/libvirt/images/cirros-0.3.3-x86_64-disk.img,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=26 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:69:44:5e,bus=pci.0,addr=0x2 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/cvm2.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on
    Executable: /usr/bin/qemu-system-x86_64
 Control Group: /machine.slice/machine-qemu\x2dcvm2.scope
          Unit: machine-qemu\x2dcvm2.scope
         Slice: machine.slice
       Boot ID: 2b163d1422fd48c2a82b59eab13e4f96
    Machine ID: 464493d4787243c0b9c25f5085d5dc5b
      Hostname: devstack2
      Coredump: /var/lib/systemd/coredump/core.qemu-system-x86.107.2b163d1422fd48c2a82b59eab13e4f96.1049.1443437122000000.xz
       Message: Process 1049 (qemu-system-x86) of user 107 dumped core.
                
                Stack trace of thread 1049:
                #0  0x00007f05ab9d99c8 raise (libc.so.6)
                #1  0x00007f05ab9db65a abort (libc.so.6)
                #2  0x000055b5bfcdead3 qemu_coroutine_enter (qemu-system-x86_64)
                #3  0x000055b5bfcdec7a qemu_co_queue_run_restart (qemu-system-x86_64)
                #4  0x000055b5bfcdea0f qemu_coroutine_enter (qemu-system-x86_64)
                #5  0x000055b5bfcd1e23 bdrv_co_aio_rw_vector (qemu-system-x86_64)
                #6  0x000055b5bfcd1e54 bdrv_aio_readv (qemu-system-x86_64)
                #7  0x000055b5bfd0e477 mirror_run (qemu-system-x86_64)
                #8  0x000055b5bfcdf3ba coroutine_trampoline (qemu-system-x86_64)
                #9  0x00007f05ab9edfa0 n/a (libc.so.6)
Refusing to dump core to tty.

[NB: The above refusal to dump core is because I didn't have debug-info
packages.]

Comment 1 Kashyap Chamarthy 2015-09-28 15:20:39 UTC
Created attachment 1077951 [details]
libvirtd.log on source host

Comment 2 Kashyap Chamarthy 2015-09-28 15:21:21 UTC
This is fixed (merged)upstream:

    http://git.qemu.org/?p=qemu.git;a=commit;h=e424aff 
    -- mirror: Fix coroutine reentrance

    https://lists.gnu.org/archive/html/qemu-devel/2015-08/msg01524.html

Comment 3 Kashyap Chamarthy 2015-09-30 01:46:21 UTC
I applied the patch in Comment #2 and prepared an F22 QEMU scratch build[1] to test.

I can confirm that this bug is fixed:

    $ virsh migrate --verbose \
        --copy-storage-inc --p2p \
        --live cvm1 qemu+ssh://root@devstack2/system
    Migration: [100 %]


Evidence from logs
------------------

SOURCE: Successful invocation of QMP 'drive-mirror' 

[. . .]
2015-09-29 18:47:44.006+0000: 8285: debug : qemuMonitorJSONCommandWithFd:290 : Send command '{"execute":"drive-mirror","arguments":{"device":"drive-virtio-disk0","target":"nbd:devstack2:49153:exportname=drive-virtio-disk0","speed":8796093022207,"sync":"top","mode":"existing"},"id":"libvirt-11"}' for write with FD -1
[. . .]


DEST host: Successful NBD server operations.

[. . .]
2015-09-29 18:47:43.603+0000: 22665: debug : qemuMonitorJSONCommandWithFd:290 : Send command '{"execute":"nbd-server-start","arguments":{"addr":{"type":"inet","data":{"host":"::","port":"49153"}}},"id":"libvirt-13"}' for write with FD -1
2015-09-29 18:47:43.605+0000: 22665: debug : qemuMonitorJSONCommandWithFd:290 : Send command '{"execute":"nbd-server-add","arguments":{"device":"drive-virtio-disk0","writable":true},"id":"libvirt-14"}' for write with FD -1
[. . .]
2015-09-29 18:47:45.049+0000: 22661: debug : qemuMonitorJSONCommandWithFd:290 : Send command '{"execute":"nbd-server-stop","id":"libvirt-16"}' for write with FD -1
[. . .]


[1] http://koji.fedoraproject.org/koji/taskinfo?taskID=11265946

Comment 4 Fedora Update System 2015-10-08 19:58:22 UTC
qemu-2.3.1-7.fc22 has been submitted as an update to Fedora 22. https://bodhi.fedoraproject.org/updates/FEDORA-2015-d5c1048b47

Comment 5 Fedora Update System 2015-10-09 13:54:10 UTC
qemu-2.3.1-7.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
$ su -c 'dnf --enablerepo=updates-testing update qemu'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-d5c1048b47

Comment 6 Fedora Update System 2015-10-23 17:24:28 UTC
qemu-2.3.1-7.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.