Bug 1338638
Summary: | Migration fails after ejecting the cdrom in the guest | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Dan Zheng <dzheng> |
Component: | qemu-kvm-rhev | Assignee: | John Snow <jsnow> |
Status: | CLOSED ERRATA | QA Contact: | FuXiangChun <xfu> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 7.3 | CC: | chayang, dgilbert, dyuan, dzheng, fjin, gsun, huding, juzhang, knoel, mrezanin, mzhan, ngu, qizhu, virt-maint, xfu, yduan, zpeng |
Target Milestone: | rc | Keywords: | Regression |
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | qemu-kvm-rhev-2.6.0-26.el7 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2016-11-07 21:11:36 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Dan Zheng
2016-05-23 07:55:00 UTC
This looks like a fun one.

David: I'm trying a migrate like this:

    jhuston@scv ((qemu-kvm-rhev-2.6.0-5.el7)) ~/s/q/b/git> ./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 4096 -cpu host -M pc -smp 4 -qmp tcp::4444,server,nowait -monitor stdio -hda /media/ext/img/f24b.qcow2 -cdrom /media/ext/iso/Fedora-Workstation-Live-x86_64-24_Beta-1.6.iso

and on the receiving end:

    jhuston@scv ((qemu-kvm-rhev-2.6.0-5.el7)) ~/s/q/b/git> ./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 4096 -cpu host -M pc -smp 4 -monitor stdio -hda /media/ext/img/f24b.qcow2 -cdrom /media/ext/iso/Fedora-Workstation-Live-x86_64-24_Beta-1.6.iso -incoming tcp:localhost:1234

And back on the source VM, via HMP: "migrate tcp:localhost:1234"

Source VM:

    (qemu) migrate tcp:127.0.0.1:1234
    (qemu)
    [no further output/errors. VM remains active and responsive.]

Destination VM:

    (qemu) qemu-system-x86_64: load of migration failed: Input/output error
    [VM closes with no further output.]

David: Any suggestions for getting better output out of this to see what's going on?

Oh that is fun.

short answer: I think blk_flush_all is returning ENOMEDIUM (123)

Longer version: I turned on all of the tracing on the loading side and found that it was failing straight after loading the RAM - I'd expected it to have tried to load the CDROM device, but no it was failing sooner. So then I turned on all the source side tracing, and it doesn't even get to trying to save the devices.
migration/migration.c migration_completion has:

        if (!ret) {
*******     ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
            if (ret >= 0) {
                ret = bdrv_inactivate_all();
            }

I added some printf's and vm_stop_force_state is returning -123. Then I found it's taking the first branch through that, calling vm_stop, which calls do_vm_stop, where the only non-zero return path is from:

        ret = blk_flush_all();
        return ret;

You can debug this a bit easier with just the source VM; if you do a:

    migrate "exec: cat > /dev/null"

and wait until it finishes and do an 'info migrate', it shows failed for me.

Dave

Thanks for the assist, David! Looks like this (upstream) commit in the 2.5 timeframe introduced the regression:

    commit fe1a9cbc339bb54d20f1ca4c1e8788d16944d5cf
    Author: Max Reitz <mreitz>
    Date: Wed Mar 16 19:54:40 2016 +0100

    block: Move some bdrv_*_all() functions to BB

    Move bdrv_commit_all() and bdrv_flush_all() to the BlockBackend level.

    Signed-off-by: Max Reitz <mreitz>
    Signed-off-by: Kevin Wolf <kwolf>

Hit this issue.

Version-Release number of selected component (if applicable):
kernel 3.10.0-505.el7.x86_64
qemu-kvm-rhev 2.6.0-24.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot the guest with a cdrom on the src host, and boot a guest with '-incoming' on the dst host.
2. Open the tray of the cdrom in the guest:

    (qemu) info block
    drive_syscd (#block110): /mnt/RHEL-7.3-20160901.1-Server-x86_64-dvd1.iso (raw, read-only)
        Removable device: locked, tray closed
        Cache mode: writeback, direct

    drive_sysdisk (#block356): /mnt/sysdisk (qcow2)
        Cache mode: writeback, direct

    (qemu) eject drive_syscd
    Device 'drive_syscd' is locked and force was not specified, wait for tray to open and try again

    (qemu) info block
    drive_syscd (#block110): /mnt/RHEL-7.3-20160901.1-Server-x86_64-dvd1.iso (raw, read-only)
        Removable device: not locked, tray open
        Cache mode: writeback, direct

    drive_sysdisk (#block356): /mnt/sysdisk (qcow2)
        Cache mode: writeback, direct

3. Start live migration.
    (qemu) migrate -d tcp:$dst_host_ip:5800
    {"execute": "migrate","arguments":{"uri": "tcp:$dst_host_ip:5800"}}

Actual results:

    (qemu) qemu-kvm: load of migration failed: Input/output error
    red_channel_client_disconnect_dummy: rcc=0x7fcc44e82000 (channel=0x7fcc46dd8aa0 type=5 id=0)
    snd_channel_put: SndChannel=0x7fcc47a04000 freed
    red_channel_client_disconnect_dummy: rcc=0x7fcc44df3000 (channel=0x7fcc46dd8940 type=6 id=0)
    snd_channel_put: SndChannel=0x7fcc45934000 freed
    red_channel_client_disconnect: rcc=0x7fcc45eb4000 (channel=0x7fcc45788600 type=2 id=0)
    qemu-kvm: network script /etc/ifdown_script failed with status 256
    red_channel_client_disconnect: rcc=0x7fcc44dee000 (channel=0x7fcc45777b80 type=4 id=0)

Fix under review upstream: https://lists.nongnu.org/archive/html/qemu-devel/2016-09/msg03745.html

Fix included in qemu-kvm-rhev-2.6.0-26.el7

Reproduced with qemu-kvm-rhev-2.6.0-2.el7.x86_64. Steps are exactly the same as in comment 0. In step 5, it prompts:

    # virsh migrate bug --live --verbose --unsafe qemu+ssh://10.73.72.58:22/system
    root.72.58's password:
    Migration: [ 94 %]error: internal error: qemu unexpectedly closed the monitor: main_channel_link: add main channel client
    inputs_connect: inputs channel client create
    red_dispatcher_set_cursor_peer:
    red_channel_client_disconnect: rcc=0x7f631dd2c000 (channel=0x7f631c1e4600 type=2 id=0)
    2016-09-23T08:21:09.881350Z qemu-kvm: load of migration failed: Input/output error

***************************************************************************

With qemu-kvm-rhev-2.6.0-26.el7.x86_64, migration succeeds without any error prompt. Steps are exactly the same as in comment 0.

It was also reproduced and verified as in comment 7, so this issue has been fixed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2673.html
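
For readers following the analysis in the description, the failure chain (blk_flush_all -> do_vm_stop -> vm_stop_force_state -> migration_completion) can be sketched as a toy model. This is not QEMU code: the function names mirror the QEMU functions named in the report, but the bodies, the `Drive` class, and the `medium_available` flag are hypothetical simplifications chosen only to show how a single -ENOMEDIUM from the flush step ends up failing the whole migration.

```python
# Toy model of the error propagation described in this report.
# NOT QEMU code: names mirror the QEMU functions mentioned above, but
# Drive, medium_available and all function bodies are hypothetical.
from dataclasses import dataclass

ENOMEDIUM = 123  # Linux errno value quoted in the report


@dataclass
class Drive:
    name: str
    medium_available: bool  # False models a removable drive with its tray open


def blk_flush_all(drives):
    """Flush every drive; remember the first error but keep going."""
    result = 0
    for drive in drives:
        ret = 0 if drive.medium_available else -ENOMEDIUM
        if ret < 0 and result == 0:
            result = ret
    return result


def do_vm_stop(drives):
    """In this model the only non-zero return path is the flush result."""
    return blk_flush_all(drives)


def vm_stop_force_state(drives):
    """Pass the stop result straight through to the caller."""
    return do_vm_stop(drives)


def migration_completion(drives):
    """Return the status string that 'info migrate' would report."""
    ret = vm_stop_force_state(drives)
    return "completed" if ret >= 0 else "failed"


if __name__ == "__main__":
    drives = [
        Drive("drive_syscd", medium_available=False),  # tray open
        Drive("drive_sysdisk", medium_available=True),
    ]
    print(vm_stop_force_state(drives))   # prints -123
    print(migration_completion(drives))  # prints failed
```

In this model, the read-only CD-ROM has nothing to flush, yet its open tray alone is enough to turn the flush result negative, which matches the -123 seen from vm_stop_force_state in the description.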