Bug 1338638 - Migration fails after ejecting the cdrom in the guest
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.3
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: John Snow
QA Contact: FuXiangChun
Reported: 2016-05-23 07:55 UTC by Dan Zheng
Modified: 2016-11-07 21:11 UTC

Fixed In Version: qemu-kvm-rhev-2.6.0-26.el7
Last Closed: 2016-11-07 21:11:36 UTC
Links
System: Red Hat Product Errata
ID: RHBA-2016:2673
Priority: normal
Status: SHIPPED_LIVE
Summary: qemu-kvm-rhev bug fix and enhancement update
Last Updated: 2016-11-08 01:06:13 UTC

Description Dan Zheng 2016-05-23 07:55:00 UTC
Description of problem:
Migration fails after ejecting the cdrom in the guest. This is a regression:
qemu-kvm-rhev 2.5.0-4.el7.x86_64 is not affected, while 2.6.0.1.el7.x86_64 also has this problem.

Version-Release number of selected component (if applicable):
kernel        3.10.0-327.el7.x86_64
qemu-kvm-rhev        2.6.0-2.el7.x86_64
libvirt        1.3.4-1.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Configure the guest with a cdrom disk without cache='none' and start the guest successfully.
2. Dump the guest XML:
# virsh dumpxml avocado-vt-vm-ci
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/libvirt/images2/virt_iso.img'/>
      <backingStore/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <alias name='ide0-1-0'/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>

3. Log in to the guest and eject the cdrom:
# eject -v /dev/cdrom
eject: device name is `/dev/sr0'
eject: /dev/sr0: not mounted
eject: /dev/sr0: is whole-disk device
eject: /dev/sr0: is removable device
eject: /dev/sr0: trying to eject using CD-ROM eject command
eject: CD-ROM eject command succeeded

4. Check the guest XML. It is the same as before ejecting, except that the target now reports tray='open'.
# virsh dumpxml avocado-vt-vm-ci
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/libvirt/images2/virt_iso.img'/>
      <backingStore/>
      <target dev='hdc' bus='ide' tray='open'/>
      <readonly/>
      <alias name='ide0-1-0'/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>

5. Try to migrate the guest:
# virsh migrate avocado-vt-vm-ci --live --verbose --unsafe qemu+ssh://10.66.4.167:22/system
root@10.66.4.167's password:
Migration: [ 97 %]error: internal error: early end of file from monitor, possible problem: warning: host doesn't support requested feature: CPUID.80000001H:ECX.abm [bit 5]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.sse4a [bit 6]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.abm [bit 5]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.sse4a [bit 6]
2016-05-20T10:06:14.489445Z qemu-kvm: load of migration failed: Input/output error

6. Check the XML again. The <source> element is no longer present:
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw' cache='none'/>
      <backingStore/>
      <target dev='hdc' bus='ide' tray='open'/>
      <readonly/>
      <alias name='ide0-1-0'/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>

7. Destroy the guest, configure it with cache='none', and start it again successfully.
8. Dump the guest XML:
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/var/lib/libvirt/images2/virt_iso.img'/>
      <backingStore/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <alias name='ide0-1-0'/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>
9. Repeat steps 3 to 5. Migration fails with the message:
error: internal error: process exited while connecting to monitor: 2016-05-23T07:40:03.494254Z qemu-kvm: -drive if=none,id=drive-ide0-1-0,readonly=on,cache=none: Must specify either driver or file

10. Dump the guest XML:

    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw' cache='none'/>
      <backingStore/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <alias name='ide0-1-0'/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>
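
The XML in steps 6 and 10 shows a cdrom disk that has lost its <source> element. A minimal sketch of how that state could be spotted before attempting a migration, assuming the `virsh dumpxml` output has been saved as text. The function name `cdroms_without_media` is hypothetical and is not part of libvirt; the XML literal is copied from step 10.

```python
import xml.etree.ElementTree as ET

# Disk definition as reported in step 10 (no <source> element).
DISK_AFTER_FAILED_MIGRATION = """
<disk type='file' device='cdrom'>
  <driver name='qemu' type='raw' cache='none'/>
  <backingStore/>
  <target dev='hdc' bus='ide'/>
  <readonly/>
  <alias name='ide0-1-0'/>
  <address type='drive' controller='0' bus='1' target='0' unit='0'/>
</disk>
"""


def cdroms_without_media(xml_text):
    """Return the target dev names of cdrom disks that have no usable <source>."""
    root = ET.fromstring(xml_text)
    # Accept either a bare <disk> element or a full <domain> document.
    disks = [root] if root.tag == "disk" else list(root.iter("disk"))
    empty = []
    for disk in disks:
        if disk.get("device") != "cdrom":
            continue
        source = disk.find("source")
        # A missing <source>, or one without attributes, means no medium is defined.
        if source is None or not source.attrib:
            target = disk.find("target")
            empty.append(target.get("dev") if target is not None else None)
    return empty


print(cdroms_without_media(DISK_AFTER_FAILED_MIGRATION))  # prints ['hdc']
```

This only inspects the XML; it does not explain why the source was dropped.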

Actual results:
The migration command fails. The guest is still running on the source host, and no guest exists on the target host.

Expected results:
The migration command should succeed, leaving the guest shut off on the source host and running on the target host.

Additional info:
In step 5 (without cache='none'), the qemu log on the target host shows:
...
***-drive if=none,id=drive-ide0-1-0,readonly=on -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0***

warning: host doesn't support requested feature: CPUID.80000001H:ECX.abm [bit 5]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.sse4a [bit 6]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.abm [bit 5]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.sse4a [bit 6]
2016-05-20T10:06:14.489445Z qemu-kvm: load of migration failed: Input/output error


In step 9 (with cache='none'):
-drive if=none,id=drive-ide0-1-0,readonly=on,cache=none -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 
2016-05-23T07:40:03.494254Z qemu-kvm: -drive if=none,id=drive-ide0-1-0,readonly=on,cache=none: Must specify either driver or file

Comment 4 John Snow 2016-06-09 21:47:02 UTC
This looks like a fun one.

David: I'm trying a migrate like this:

jhuston@scv ((qemu-kvm-rhev-2.6.0-5.el7)) ~/s/q/b/git> 
./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 4096 -cpu host -M pc -smp 4 -qmp tcp::4444,server,nowait -monitor stdio -hda /media/ext/img/f24b.qcow2 -cdrom /media/ext/iso/Fedora-Workstation-Live-x86_64-24_Beta-1.6.iso 

and on the receiving end:

jhuston@scv ((qemu-kvm-rhev-2.6.0-5.el7)) ~/s/q/b/git> 
./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 4096 -cpu host -M pc -smp 4 -monitor stdio -hda /media/ext/img/f24b.qcow2 -cdrom /media/ext/iso/Fedora-Workstation-Live-x86_64-24_Beta-1.6.iso -incoming tcp:localhost:1234

And back on the source VM, via HMP:

"migrate tcp:localhost:1234"


Source VM:

(qemu) migrate tcp:127.0.0.1:1234
(qemu) 

[no further output/errors. VM remains active and responsive.]

Destination VM:

(qemu) qemu-system-x86_64: load of migration failed: Input/output error

[VM closes with no further output.]




David: Any suggestions for getting better output out of this to see what's going on?

Comment 5 Dr. David Alan Gilbert 2016-06-10 09:34:06 UTC
Oh that is fun.
Short answer: I think blk_flush_all() is returning -ENOMEDIUM (errno 123).

Longer version:

I turned on all of the tracing on the loading side and found that it was failing straight after loading the RAM - I'd expected it to have tried to load the CDROM device, but no it was failing sooner.

So then I turned on all the source side tracing, and it doesn't even get to trying to save the devices.

migration/migration.c migration_completion has:

        if (!ret) {
*******     ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
            if (ret >= 0) {
                ret = bdrv_inactivate_all();
            }

I added some printfs: vm_stop_force_state() is returning -123. It takes its first branch, which calls vm_stop(), which in turn calls do_vm_stop(), whose only non-zero return path is from

    ret = blk_flush_all();

    return ret;

You can debug this a bit more easily with just the source VM. If you do a:

  migrate "exec: cat > /dev/null"

then wait until it finishes and run 'info migrate', it shows failed for me.

Dave
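
The call chain described above can be modelled in a few lines. This is a toy sketch in Python, not QEMU code: the `Backend` class and every function name here are hypothetical stand-ins, and the only facts taken from this comment are the -123 (ENOMEDIUM) return value, that the stop path's only non-zero return comes from the flush-all call, and that migration completion only proceeds when the stop returns >= 0.

```python
ENOMEDIUM = 123  # errno value quoted in this comment (Linux)


class Backend:
    """Toy stand-in for a block backend; not a QEMU type."""

    def __init__(self, name, has_medium):
        self.name = name
        self.has_medium = has_medium

    def flush(self):
        # Assumption: a drive with no usable medium reports -ENOMEDIUM.
        return 0 if self.has_medium else -ENOMEDIUM


def flush_all(backends):
    """Flush every backend and return the first error seen, or 0."""
    result = 0
    for backend in backends:
        ret = backend.flush()
        if ret < 0 and result == 0:
            result = ret
    return result


def stop_vm(backends):
    """Model of the stop path: its only non-zero return is the flush result."""
    return flush_all(backends)


def complete_migration(backends):
    """Follow the shape of the migration_completion excerpt quoted above."""
    ret = stop_vm(backends)
    if ret >= 0:
        ret = 0  # stands in for the inactivate step succeeding
    return "completed" if ret >= 0 else "failed"


drives = [Backend("sysdisk", True), Backend("cdrom", False)]
print(stop_vm(drives), complete_migration(drives))  # prints -123 failed
```

In this model a single drive without a medium is enough to fail the whole stop, which matches the symptom: the source never gets as far as saving device state.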

Comment 6 John Snow 2016-06-10 18:42:05 UTC
Thanks for the assist, David!

Looks like this (upstream) commit in the 2.6 development timeframe introduced the regression:

commit fe1a9cbc339bb54d20f1ca4c1e8788d16944d5cf
Author: Max Reitz <mreitz@redhat.com>
Date:   Wed Mar 16 19:54:40 2016 +0100

    block: Move some bdrv_*_all() functions to BB
    
    Move bdrv_commit_all() and bdrv_flush_all() to the BlockBackend level.
    
    Signed-off-by: Max Reitz <mreitz@redhat.com>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Comment 7 yduan 2016-09-12 07:03:59 UTC
Hit this issue.

Version-Release number of selected component (if applicable):
kernel        3.10.0-505.el7.x86_64
qemu-kvm-rhev        2.6.0-24.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot the guest with a cdrom on the source host, and boot the guest with '-incoming' on the destination host.

2. Open the tray of the cdrom in the guest:
(qemu) info block
drive_syscd (#block110): /mnt/RHEL-7.3-20160901.1-Server-x86_64-dvd1.iso (raw, read-only)
    Removable device: locked, tray closed
    Cache mode:       writeback, direct

drive_sysdisk (#block356): /mnt/sysdisk (qcow2)
    Cache mode:       writeback, direct
(qemu) eject drive_syscd 
Device 'drive_syscd' is locked and force was not specified, wait for tray to open and try again
(qemu) info block
drive_syscd (#block110): /mnt/RHEL-7.3-20160901.1-Server-x86_64-dvd1.iso (raw, read-only)
    Removable device: not locked, tray open
    Cache mode:       writeback, direct

drive_sysdisk (#block356): /mnt/sysdisk (qcow2)
    Cache mode:       writeback, direct

3. Start live migration:
(qemu) migrate -d tcp:$dst_host_ip:5800
{"execute": "migrate","arguments":{"uri": "tcp:$dst_host_ip:5800"}}

Actual results:
(qemu) qemu-kvm: load of migration failed: Input/output error
red_channel_client_disconnect_dummy: rcc=0x7fcc44e82000 (channel=0x7fcc46dd8aa0 type=5 id=0)
snd_channel_put: SndChannel=0x7fcc47a04000 freed
red_channel_client_disconnect_dummy: rcc=0x7fcc44df3000 (channel=0x7fcc46dd8940 type=6 id=0)
snd_channel_put: SndChannel=0x7fcc45934000 freed
red_channel_client_disconnect: rcc=0x7fcc45eb4000 (channel=0x7fcc45788600 type=2 id=0)
qemu-kvm: network script /etc/ifdown_script failed with status 256
red_channel_client_disconnect: rcc=0x7fcc44dee000 (channel=0x7fcc45777b80 type=4 id=0)

Comment 8 John Snow 2016-09-16 01:12:03 UTC
Fix under review upstream: https://lists.nongnu.org/archive/html/qemu-devel/2016-09/msg03745.html

Comment 9 Miroslav Rezanina 2016-09-20 12:28:17 UTC
Fix included in qemu-kvm-rhev-2.6.0-26.el7

Comment 11 yduan 2016-09-23 08:43:05 UTC
Reproduced with qemu-kvm-rhev-2.6.0-2.el7.x86_64.

The steps are exactly the same as in comment 0.

In step 5, it reports:
# virsh migrate bug --live --verbose --unsafe qemu+ssh://10.73.72.58:22/system
root@10.73.72.58's password: 
Migration: [ 94 %]error: internal error: qemu unexpectedly closed the monitor: main_channel_link: add main channel client
inputs_connect: inputs channel client create
red_dispatcher_set_cursor_peer: 
red_channel_client_disconnect: rcc=0x7f631dd2c000 (channel=0x7f631c1e4600 type=2 id=0)
2016-09-23T08:21:09.881350Z qemu-kvm: load of migration failed: Input/output error

***************************************************************************

With qemu-kvm-rhev-2.6.0-26.el7.x86_64, migration succeeds without any error. The steps are exactly the same as in comment 0.

It was also reproduced and verified with the steps in comment 7.

So this issue has been fixed.

Comment 13 errata-xmlrpc 2016-11-07 21:11:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2673.html

