Bug 1266936

Summary: QEMU crashes during 'drive-mirror' with error "Co-routine re-entered recursively"
Product: [Fedora] Fedora Reporter: Kashyap Chamarthy <kchamart>
Component: qemuAssignee: Fedora Virtualization Maintainers <virt-maint>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: unspecified    
Version: 22CC: amit.shah, berrange, cfergeau, crobinso, dwmw2, itamar, pbonzini, rjones, virt-maint
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qemu-2.3.1-7.fc22 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-10-23 17:25:05 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
libvirtd.log on source host none

Description Kashyap Chamarthy 2015-09-28 15:17:03 UTC
Description of problem
----------------------

When performing live block migration with "virsh --copy-storage-inc" (this
internally uses QEMU QMP command 'drive-mirror'), guest crashes:

    $ virsh migrate --verbose \
        --copy-storage-inc --p2p \
        --live cvm2 qemu+ssh://root@devstack1/system
     error: internal error: guest unexpectedly quit


Relevant QEMU error, from /var/log/libvirt/qemu/cvm2.log:

    [. . .]
    Co-routine re-entered recursively
    2015-09-28 10:45:26.232+0000: shutting down


Version
-------

    libvirt-daemon-driver-qemu-1.2.13.1-2.fc22.x86_64
    qemu-system-x86-2.3.1-1.fc22.x86_64


How reproducible: Consistently.


Steps to Reproduce
------------------

(1) Setup two hosts such that password-less SSH via sudo works just fine:

(2) Since I'm testing a trusted network, I'm using the below insecure
    configs (for testing only!) in destination libvirtd.conf:
        
        [. . .]
        listen_tls = 0
        listen_tcp = 1
        auth_tcp = "none"
        [. . .]

(3) Perform live block migration, with non-shared storage:

    $ virsh migrate --verbose \
        --copy-storage-inc --p2p --live cvm2 \
        qemu+ssh://root@desthost/system

And, observe the failure on stderr, libvirtd.log & in the QEMU log of
the guest.


Actual results
--------------

Guest on sources shuts down, consequently live block migration with
non-shared storage fails:

    `virsh` stderr: "error: internal error: guest unexpectedly quit"


Expected results
----------------

'drive-mirror' should proceed successfully, resulting in a succesful
libvirt live block migration.


Contextual libvirtd.log
-----------------------

With log filters:
    log_filters="3:event 3:file 1:util 1:security 1:qemu 1:libvirt"
    log_outputs="1:file:/var/log/libvirt/libvirtd.log"

$ less /var/log/libvirt/libvirtd.log
[. . .]
2015-09-28 10:45:26.231+0000: 2537: error : qemuMonitorIO:662 : internal error: End of file from monitor
2015-09-28 10:45:26.231+0000: 2537: debug : qemuMonitorIO:717 : Error on monitor internal error: End of file from monitor
2015-09-28 10:45:26.231+0000: 2537: debug : qemuMonitorIO:738 : Triggering EOF callback
2015-09-28 10:45:26.231+0000: 2537: debug : qemuProcessHandleMonitorEOF:307 : Received EOF on 0x7f2be0003bb0 'cvm2'
2015-09-28 10:45:26.231+0000: 2537: debug : qemuProcessHandleMonitorEOF:325 : Monitor connection to 'cvm2' closed without SHUTDOWN event; assuming the domain crashed
[. . .]


Error info from /var/log/libvirt/qemu/cirrvm2.log
-------------------------------------------------

[. . .]
WARNING: Image format was not specified for 'nbd://devstack1:49153/drive-virtio-disk0' and probing guessed raw.
         Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
         Specify the 'raw' format explicitly to remove the restrictions.
Co-routine re-entered recursively
2015-09-28 10:45:26.232+0000: shutting down
(END)


Coredump info from systemd
--------------------------

$ coredumpctl 
TIME                            PID   UID   GID SIG PRESENT EXE
Mon 2015-09-28 06:34:52 EDT   12447   107   107   6 * /usr/bin/qemu-system-x86_64
Mon 2015-09-28 06:47:23 EDT    1049   107   107   6 * /usr/bin/qemu-system-x86_64

$ coredumpctl dump
           PID: 1049 (qemu-system-x86)
           UID: 107 (qemu)
           GID: 107 (qemu)
        Signal: 6 (ABRT)
     Timestamp: Mon 2015-09-28 06:45:22 EDT (10min ago)
  Command Line: /usr/bin/qemu-system-x86_64 -machine accel=kvm -name cvm2 -S -machine pc-i440fx-2.3,accel=kvm,usb=off -cpu Nehalem -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid d458e6a3-fc75-4a49-9617-054aab6d0052 -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/cvm2.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 -drive file=/var/lib/libvirt/images/cirros-0.3.3-x86_64-disk.img,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=26 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:69:44:5e,bus=pci.0,addr=0x2 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/cvm2.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on
    Executable: /usr/bin/qemu-system-x86_64
 Control Group: /machine.slice/machine-qemu\x2dcvm2.scope
          Unit: machine-qemu\x2dcvm2.scope
         Slice: machine.slice
       Boot ID: 2b163d1422fd48c2a82b59eab13e4f96
    Machine ID: 464493d4787243c0b9c25f5085d5dc5b
      Hostname: devstack2
      Coredump: /var/lib/systemd/coredump/core.qemu-system-x86.107.2b163d1422fd48c2a82b59eab13e4f96.1049.1443437122000000.xz
       Message: Process 1049 (qemu-system-x86) of user 107 dumped core.
                
                Stack trace of thread 1049:
                #0  0x00007f05ab9d99c8 raise (libc.so.6)
                #1  0x00007f05ab9db65a abort (libc.so.6)
                #2  0x000055b5bfcdead3 qemu_coroutine_enter (qemu-system-x86_64)
                #3  0x000055b5bfcdec7a qemu_co_queue_run_restart (qemu-system-x86_64)
                #4  0x000055b5bfcdea0f qemu_coroutine_enter (qemu-system-x86_64)
                #5  0x000055b5bfcd1e23 bdrv_co_aio_rw_vector (qemu-system-x86_64)
                #6  0x000055b5bfcd1e54 bdrv_aio_readv (qemu-system-x86_64)
                #7  0x000055b5bfd0e477 mirror_run (qemu-system-x86_64)
                #8  0x000055b5bfcdf3ba coroutine_trampoline (qemu-system-x86_64)
                #9  0x00007f05ab9edfa0 n/a (libc.so.6)
Refusing to dump core to tty.

[NB: The above refusal to dump core is because I didn't have debug-info
packages.]

Comment 1 Kashyap Chamarthy 2015-09-28 15:20:39 UTC
Created attachment 1077951 [details]
libvirtd.log on source host

Comment 2 Kashyap Chamarthy 2015-09-28 15:21:21 UTC
This is fixed (merged)upstream:

    http://git.qemu.org/?p=qemu.git;a=commit;h=e424aff 
    -- mirror: Fix coroutine reentrance

    https://lists.gnu.org/archive/html/qemu-devel/2015-08/msg01524.html

Comment 3 Kashyap Chamarthy 2015-09-30 01:46:21 UTC
I applied the patch in Comment #2 and prepared an F22 QEMU scratch build[1] to test.

I can confirm that this bug is fixed:

    $ virsh migrate --verbose \
        --copy-storage-inc --p2p \
        --live cvm1 qemu+ssh://root@devstack2/system
    Migration: [100 %]


Evidence from logs
------------------

SOURCE: Successful invocation of QMP 'drive-mirror' 

[. . .]
2015-09-29 18:47:44.006+0000: 8285: debug : qemuMonitorJSONCommandWithFd:290 : Send command '{"execute":"drive-mirror","arguments":{"device":"drive-virtio-disk0","target":"nbd:devstack2:49153:exportname=drive-virtio-disk0","speed":8796093022207,"sync":"top","mode":"existing"},"id":"libvirt-11"}' for write with FD -1
[. . .]


DEST host: Successful NBD server operations.

[. . .]
2015-09-29 18:47:43.603+0000: 22665: debug : qemuMonitorJSONCommandWithFd:290 : Send command '{"execute":"nbd-server-start","arguments":{"addr":{"type":"inet","data":{"host":"::","port":"49153"}}},"id":"libvirt-13"}' for write with FD -1
2015-09-29 18:47:43.605+0000: 22665: debug : qemuMonitorJSONCommandWithFd:290 : Send command '{"execute":"nbd-server-add","arguments":{"device":"drive-virtio-disk0","writable":true},"id":"libvirt-14"}' for write with FD -1
[. . .]
2015-09-29 18:47:45.049+0000: 22661: debug : qemuMonitorJSONCommandWithFd:290 : Send command '{"execute":"nbd-server-stop","id":"libvirt-16"}' for write with FD -1
[. . .]


[1] http://koji.fedoraproject.org/koji/taskinfo?taskID=11265946

Comment 4 Fedora Update System 2015-10-08 19:58:22 UTC
qemu-2.3.1-7.fc22 has been submitted as an update to Fedora 22. https://bodhi.fedoraproject.org/updates/FEDORA-2015-d5c1048b47

Comment 5 Fedora Update System 2015-10-09 13:54:10 UTC
qemu-2.3.1-7.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
$ su -c 'dnf --enablerepo=updates-testing update qemu'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-d5c1048b47

Comment 6 Fedora Update System 2015-10-23 17:24:28 UTC
qemu-2.3.1-7.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.