Bug 1151723
| Field | Value |
|---|---|
| Summary | migration will hang after use migrate with --graphicsuri and guest status will be locked |
| Product | Red Hat Enterprise Linux 7 |
| Reporter | Luyao Huang <lhuang> |
| Component | libvirt |
| Assignee | Jiri Denemark <jdenemar> |
| Status | CLOSED ERRATA |
| QA Contact | Virtualization Bugs <virt-bugs> |
| Severity | medium |
| Priority | medium |
| Version | 7.1 |
| CC | dgilbert, dyuan, fjin, hhuang, huding, jdenemar, juzhang, knoel, mzhan, quintela, rbalakri, virt-maint, xfu, zpeng |
| Target Milestone | rc |
| Hardware | x86_64 |
| OS | Linux |
| Fixed In Version | libvirt-2.0.0-2.el7 |
| Doc Type | Bug Fix |
| Last Closed | 2016-11-03 18:10:42 UTC |
| Type | Bug |
| Bug Blocks | 1288337 |

Description
Luyao Huang
2014-10-11 07:48:52 UTC
It's apparently something in qemu-kvm-rhev. It works with qemu-kvm-1.5.3-60.el7, doesn't work with qemu-kvm-rhev-2.1.2-3.el7 and doesn't work in qemu-kvm-rhev-2.1.2-14.el7 either.

So the difference between 1.5.3 and 2.1.2 is in the response to query-spice in case of invalid graphics URI. Normally, we send client_migrate_info and at the end of migration, we wait for query-spice to return migrated = true. However, if invalid graphics URI is passed to our migration APIs (i.e., something that does not start with spice://), we don't call client_migrate_info. But we still wait for query-spice (as long as spice is enabled for the domain, of course) to return migrated = true at the end of migration.

With qemu-kvm-1.5.3 SPICE_DISCONNECTED event is emitted and followed by SPICE_MIGRATE_COMPLETED. Once migration completes, query-spice returns:

    {
      "return": {
        "migrated": true,
        "enabled": true,
        "auth": "none",
        "port": 5900,
        "compiled-version": "0.12.4",
        "host": "0.0.0.0",
        "channels": [],
        "mouse-mode": "server"
      },
      "id": "libvirt-25"
    }

While with qemu-kvm-rhev-2.1.2 no SPICE related events are emitted and at the end of migration query-spice always returns (172.17.172.1 is the client):

    {
      "return": {
        "migrated": false,
        "enabled": true,
        "auth": "none",
        "port": 5900,
        "compiled-version": "0.12.4",
        "host": "0.0.0.0",
        "channels": [
          { "port": "51853", "family": "ipv4", "channel-type": 1, "connection-id": 2035481344, "host": "172.17.172.1", "channel-id": 0, "tls": false },
          { "port": "51854", "family": "ipv4", "channel-type": 2, "connection-id": 2035481344, "host": "172.17.172.1", "channel-id": 0, "tls": false },
          { "port": "51855", "family": "ipv4", "channel-type": 3, "connection-id": 2035481344, "host": "172.17.172.1", "channel-id": 0, "tls": false },
          { "port": "51856", "family": "ipv4", "channel-type": 4, "connection-id": 2035481344, "host": "172.17.172.1", "channel-id": 0, "tls": false }
        ],
        "mouse-mode": "server"
      },
      "id": "libvirt-238"
    }

and libvirt ends up in an endless loop waiting for migrated = true.

Perhaps we should not wait for spice to finish migration when we didn't call client_migrate_info, I don't know. But it still seems QEMU behaves strangely.

> Perhaps we should not wait for spice to finish migration when we didn't call
> client_migrate_info, I don't know.

Yes, you should not. BTW: no need to poll 'migrate', you can just wait for SPICE_MIGRATE_COMPLETED.

> But it still seems QEMU behaves strangely.

Why? Sending spice migration notification when no spice client migration happened in the first place is strange. Was fixed here:

============================= cut here =================================
commit a76a2f729aae21c45c7e9eef8d1d80e94d1cc930
Author: Gerd Hoffmann <kraxel>
Date:   Tue Apr 29 09:27:31 2014 +0200

    spice: fix libvirt snapshots

    Only notify spice-server about migration events in case we got
    target host information beforehand. So we kick the seamless spice
    client migration only in case a actual live migration happens, not
    when libvirt uses live-migration-to-file for snapshotting.

    Signed-off-by: Gerd Hoffmann <kraxel>

Fixed upstream by v1.3.2-48-gbd7c8a6:

commit bd7c8a693d4d5f036ac55990bf5785dd19774685
Author:     Jiri Denemark <jdenemar>
AuthorDate: Mon Feb 29 13:18:13 2016 +0100
Commit:     Jiri Denemark <jdenemar>
CommitDate: Tue Mar 1 15:59:00 2016 +0100

    qemu: Don't always wait for SPICE to finish migration

    When SPICE graphics is configured for a domain but we did not ask the
    client to switch to the destination, we should not wait for
    SPICE_MIGRATE_COMPLETED event (which will never come).

    https://bugzilla.redhat.com/show_bug.cgi?id=1151723

    Signed-off-by: Jiri Denemark <jdenemar>

This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune with any questions.

I can reproduce with build libvirt-1.2.8-5.el7.x86_64 and qemu-kvm-rhev-2.1.2-23.el7.x86_64.
Verify pass with build libvirt-1.3.3-1.el7.x86_64 and qemu-kvm-rhev-2.5.0-4.el7.x86_64.

Steps:
1. # virsh list --all
    Id    Name                           State
   ----------------------------------------------------
    8     rhel7.2-1030                   running

2. Connect to guest graphic:
   # remote-viewer spice://10.66.5.57:5900

3. # virsh migrate rhel7.2-1030 qemu+ssh://10.66.4.113/system --live --verbose --graphicsuri aakdkdie
   Migration: [100 %]
   [root@fjin-5-57 2.1.2-23]#

4. On source host:
   # virsh list
    Id    Name                           State
   ----------------------------------------------------

5. On target host:
   # virsh list
    Id    Name                           State
   ----------------------------------------------------
    6     rhel7.2-1030                   running

6. Connect to guest graphic:
   # remote-viewer spice://10.66.4.113:5900

7. In guest, do some operation, it can read and write.

8. Migrate back:
   # virsh migrate rhel7.2-1030 qemu+ssh://10.66.5.57/system --live --verbose --graphicsuri spice://10.66.5.57:5900
   Migration: [100 %]
   [root@fjin-4-113 ~]#

After doing more testing, I found that when the guest is persistent and is migrated with --graphicsuri {invalid_uri} after a successful migration, migrate will hang (waiting for spice migration to finish). Maybe wait_for_spice is not reset to false after the first successful migration, I guess.

Steps:
0. Guest rhel7.2-1030 is persistent on source host

1. On source:
   # virsh migrate rhel7.2-1030 qemu+ssh://10.66.4.113/system --live --verbose
   Migration: [100 %]
   [root@fjin-5-57 libvirt]

2. On target, migration back:
   # virsh migrate rhel7.2-1030 qemu+ssh://10.66.5.57/system --live --verbose
   Migration: [100 %]

3. On source, migrate with invalid graphicsuri:
   # virsh migrate rhel7.2-1030 qemu+ssh://10.66.4.113/system --live --verbose --graphicsuri 10.66.4.113
   Migration: [100 %] Migration: [100 %]
   (after several minutes, virsh still hangs)

Created attachment 1146190: libvirtd log on source host

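The guess above (a wait flag left set by an earlier migration) can be illustrated with a toy model. This is only a sketch in Python, not libvirt code: the class `MigrationJob`, its fields, and the helper `waits_after_invalid_uri` are hypothetical names, and the model assumes that per-domain job state survives between migrations of a persistent domain on the same host. The migrate-back step, which is driven from the other host, is left out.

```python
from dataclasses import dataclass


@dataclass
class MigrationJob:
    """Hypothetical stand-in for job state that outlives a single migration."""

    spice_migration: bool = False  # "the SPICE client was asked to switch hosts"
    reset_on_start: bool = True    # False models the suspected stale-flag behaviour

    def start(self, sends_client_migrate_info: bool) -> None:
        # A new migration should not inherit the previous attempt's flag.
        if self.reset_on_start:
            self.spice_migration = False
        if sends_client_migrate_info:
            self.spice_migration = True

    def must_wait_for_spice(self) -> bool:
        # Waiting only makes sense if the client was told to move.
        return self.spice_migration


def waits_after_invalid_uri(reset_on_start: bool) -> bool:
    """Replay steps 1 and 3 from the source host's point of view."""
    job = MigrationJob(reset_on_start=reset_on_start)
    job.start(sends_client_migrate_info=True)   # step 1: default graphics URI
    job.start(sends_client_migrate_info=False)  # step 3: invalid --graphicsuri
    return job.must_wait_for_spice()
```

Without the reset, the second migration would still wait for a SPICE completion that was never requested, which matches the hang seen in step 3; with the reset it would not wait.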
Indeed, libvirt doesn't properly reset job->spiceMigration and thus a migration with an incorrect graphics URI will get stuck in case the domain was migrated with a correct graphics URI before. This bug affects mainly persistent domains; transient domains are affected only if the first migration is cancelled.

Patches for this issue were sent for review upstream: https://www.redhat.com/archives/libvir-list/2016-July/msg00108.html

Unfortunately, there is a related bug 1352836, which needs to be taken into account when testing this bug with qemu-kvm-rhev-2.6.

This should be now fixed upstream by v2.0.0-59-ga16ea1a..v2.0.0-60-gf34b981:

commit a16ea1a0f3e6b9eb8be4be7a664af76e47bbceba
Author:     Jiri Denemark <jdenemar>
AuthorDate: Tue Jul 5 10:07:24 2016 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Fri Jul 8 13:35:17 2016 +0200

    qemu: Properly reset spiceMigration flag

    Otherwise migration during which we didn't send client_migrate_info QMP
    command will get stuck waiting for SPICE migration to finish if libvirtd
    sent the QMP command in a previous migration attempt.

    Broken by bd7c8a69.

    https://bugzilla.redhat.com/show_bug.cgi?id=1151723

    Signed-off-by: Jiri Denemark <jdenemar>

commit f34b981e403ce7abf41c0047e1b5610e1f5269db
Author:     Jiri Denemark <jdenemar>
AuthorDate: Wed Jun 29 15:01:17 2016 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Fri Jul 8 13:36:00 2016 +0200

    qemu: Drop useless SPICE migration code

    The spiceMigration flag will never be true if there is no SPICE graphics
    configured for the domain.

    https://bugzilla.redhat.com/show_bug.cgi?id=1151723

    Signed-off-by: Jiri Denemark <jdenemar>

Verify on build libvirt-2.0.0-2.el7.x86_64 and qemu-kvm-rhev-2.6.0-12.el7.x86_64

Scenario 1: migrate with invalid graphicsuri -> migrate back with default graphicsuri -> migrate with correct graphicsuri

1. Define&start a guest with spice graphic on host A.

2. Connect a spice client to the guest:
   # remote-viewer spice://hp-dl385g7-05.lab.eng.pek2.redhat.com:5900

3. Migrate the guest to host B with invalid graphicsuri:
   # virsh migrate rhel7.2 qemu+ssh://hp-dl385g7-06.lab.eng.pek2.redhat.com/system --live --verbose --graphicsuri abcdefg
   Migration: [100 %]
   Virsh didn't hang after migration is 100%, and the spice client disconnects.

4. After migration, connect a spice client to the guest again:
   # remote-viewer spice://hp-dl385g7-06.lab.eng.pek2.redhat.com:5900

5. Migrate the guest back with default graphicsuri:
   # virsh migrate rhel7.2 qemu+tcp://hp-dl385g7-05.lab.eng.pek2.redhat.com/system --live --verbose
   Migration: [100 %]
   The spice migration finishes successfully.

6. Migrate the guest to host B again with correct graphicsuri:
   # virsh migrate rhel7.2 qemu+ssh://hp-dl385g7-06.lab.eng.pek2.redhat.com/system --live --verbose --graphicsuri spice://hp-dl385g7-06.lab.eng.pek2.redhat.com:5900
   Migration: [100 %]
   The spice migration finishes successfully.

Scenario 2: prepare a persistent guest, migrate with default graphicsuri -> migrate back with default graphicsuri -> migrate with invalid graphicsuri again

1. Define&start a guest with spice graphic on host A.

2. Connect a spice client to the guest:
   # remote-viewer spice://hp-dl385g7-05.lab.eng.pek2.redhat.com:5900

3. Migrate the guest to host B with default graphicsuri:
   # virsh migrate rhel7.2 qemu+ssh://hp-dl385g7-06.lab.eng.pek2.redhat.com/system --live --verbose
   Migration: [100 %]

4. Migrate back:
   # virsh migrate rhel7.2 qemu+ssh://hp-dl385g7-05.lab.eng.pek2.redhat.com/system --live --verbose
   Migration: [100 %]

5. Migrate the guest to host B with invalid graphicsuri:
   # virsh migrate rhel7.2 qemu+ssh://hp-dl385g7-06.lab.eng.pek2.redhat.com/system --live --verbose --graphicsuri abcdefg
   Migration: [100 %]
   Virsh didn't hang after migration is 100%, and the spice client disconnects.

With qemu-kvm-rhev-2.6.0-12.el7.x86_64, I can't reproduce the issue described in comment 12 (and reported in Bug 1352836).

I also hit another problem: if the first migration is cancelled immediately after issuing the command and the guest is then migrated again, the second migration gets stuck after memory is 100% transferred.

   # virsh migrate rhel7.2 qemu+tcp://hp-dl385g7-06.lab.eng.pek2.redhat.com/system --live --verbose --p2p
   ^Cerror: operation aborted: migration out: canceled by client

Connect a spice client to the guest:

   # remote-viewer spice://hp-dl385g7-05.lab.eng.pek2.redhat.com:5900

   # virsh migrate rhel7.2 qemu+tcp://hp-dl385g7-06.lab.eng.pek2.redhat.com/system --live --verbose --p2p
   Migration: [100 %] Migration: [100 %] Migration: [100 %] Migration: [100 %]^C
   [root@hp-dl385g7-05 2.6.0-13]#

Created attachment 1178860: the second migration gets stuck if the first migration is cancelled immediately

I think it's the same issue as reported in bug 1352836, but in your case different steps were needed to reproduce it.

According to comment 15, moving this bug to verified. I will track the issue in comment 17 by adding a new test case.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2577.html
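
For anyone re-checking this against another QEMU build, the two query-spice replies quoted in the description can be told apart with a few lines of Python. This is a rough sketch rather than anything libvirt does internally: the helper names `spice_migration_done` and `connected_client_hosts` are made up for illustration, and the sample replies are abbreviated copies of the ones in the description (only the fields used here are kept).

```python
import json


def spice_migration_done(reply_text: str) -> bool:
    """Return the "migrated" field of a query-spice reply (False if absent)."""
    reply = json.loads(reply_text)
    return bool(reply["return"].get("migrated", False))


def connected_client_hosts(reply_text: str) -> set:
    """Return the set of client hosts that still hold SPICE channels."""
    reply = json.loads(reply_text)
    return {channel["host"] for channel in reply["return"].get("channels", [])}


# Abbreviated from the qemu-kvm-1.5.3 reply in the description.
REPLY_1_5_3 = (
    '{"return": {"migrated": true, "enabled": true, "channels": []},'
    ' "id": "libvirt-25"}'
)

# Abbreviated from the qemu-kvm-rhev-2.1.2 reply in the description.
REPLY_2_1_2 = (
    '{"return": {"migrated": false, "enabled": true, "channels": ['
    '{"port": "51853", "host": "172.17.172.1", "channel-type": 1},'
    '{"port": "51854", "host": "172.17.172.1", "channel-type": 2}]},'
    ' "id": "libvirt-238"}'
)
```

A reply like the second one, with "migrated" still false and the client still connected, is the state in which the unfixed libvirt kept waiting after a migration for which it had never sent client_migrate_info.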