Bug 1151723
| Summary: | migration will hang after use migrate with --graphicsuri and guest status will be locked | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Luyao Huang <lhuang> |
| Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 7.1 | CC: | dgilbert, dyuan, fjin, hhuang, huding, jdenemar, juzhang, knoel, mzhan, quintela, rbalakri, virt-maint, xfu, zpeng |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | libvirt-2.0.0-2.el7 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-11-03 18:10:42 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1288337 | | |
| Attachments: | | | |
It's apparently something in qemu-kvm-rhev. It works with qemu-kvm-1.5.3-60.el7 but doesn't work with qemu-kvm-rhev-2.1.2-3.el7, and it doesn't work with qemu-kvm-rhev-2.1.2-14.el7 either.

The difference between 1.5.3 and 2.1.2 is in the response to query-spice in
case of an invalid graphics URI. Normally, we send client_migrate_info, and at the
end of migration we wait for query-spice to return migrated = true. However,
if an invalid graphics URI is passed to our migration APIs (i.e., something that
does not start with spice://), we don't call client_migrate_info. But we still
wait for query-spice (as long as SPICE is enabled for the domain, of course)
to return migrated = true at the end of migration.
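The control flow described above can be sketched in Python to make the mismatch explicit. This is a simplified model, not libvirt's C code: `is_spice_uri` and `plan_migration_old` are hypothetical names, only the case where `--graphicsuri` is given explicitly is modelled, and the "starts with spice://" rule is taken from the comment. With an invalid URI the command is skipped, yet the wait is still planned.

```python
from urllib.parse import urlsplit


def is_spice_uri(graphics_uri):
    """Return True if the graphics URI uses the spice:// scheme."""
    return urlsplit(graphics_uri).scheme == "spice"


def plan_migration_old(graphics_uri, spice_enabled):
    """Hypothetical model of the pre-fix decisions for an explicit --graphicsuri.

    Returns a tuple (send_client_migrate_info, wait_for_spice_migrated).
    """
    # client_migrate_info is only sent when the URI starts with spice://
    send_cmd = spice_enabled and is_spice_uri(graphics_uri)
    # ...but the wait for query-spice "migrated": true only checks that
    # SPICE is enabled for the domain.
    wait = spice_enabled
    return send_cmd, wait


for uri in ("spice://10.66.5.57:5900", "vnc123", "aakdkdie"):
    send_cmd, wait = plan_migration_old(uri, spice_enabled=True)
    # Waiting without having sent client_migrate_info is the problematic case.
    print(f"{uri}: send={send_cmd} wait={wait} waits_unasked={wait and not send_cmd}")
```

Whether the unasked-for wait actually hangs depends on QEMU: as the rest of the comment shows, qemu-kvm-1.5.3 still reports migrated = true, while qemu-kvm-rhev-2.1.2 does not.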
With qemu-kvm-1.5.3, a SPICE_DISCONNECTED event is emitted, followed by
SPICE_MIGRATE_COMPLETED. Once migration completes, query-spice returns:
{
  "return": {
    "migrated": true,
    "enabled": true,
    "auth": "none",
    "port": 5900,
    "compiled-version": "0.12.4",
    "host": "0.0.0.0",
    "channels": [
    ],
    "mouse-mode": "server"
  },
  "id": "libvirt-25"
}
With qemu-kvm-rhev-2.1.2, by contrast, no SPICE-related events are emitted, and at the
end of migration query-spice always returns the following (172.17.172.1 is the client):
{
  "return": {
    "migrated": false,
    "enabled": true,
    "auth": "none",
    "port": 5900,
    "compiled-version": "0.12.4",
    "host": "0.0.0.0",
    "channels": [
      {
        "port": "51853",
        "family": "ipv4",
        "channel-type": 1,
        "connection-id": 2035481344,
        "host": "172.17.172.1",
        "channel-id": 0,
        "tls": false
      },
      {
        "port": "51854",
        "family": "ipv4",
        "channel-type": 2,
        "connection-id": 2035481344,
        "host": "172.17.172.1",
        "channel-id": 0,
        "tls": false
      },
      {
        "port": "51855",
        "family": "ipv4",
        "channel-type": 3,
        "connection-id": 2035481344,
        "host": "172.17.172.1",
        "channel-id": 0,
        "tls": false
      },
      {
        "port": "51856",
        "family": "ipv4",
        "channel-type": 4,
        "connection-id": 2035481344,
        "host": "172.17.172.1",
        "channel-id": 0,
        "tls": false
      }
    ],
    "mouse-mode": "server"
  },
  "id": "libvirt-238"
}
and libvirt ends up in an endless loop waiting for migrated = true.
Perhaps we should not wait for spice to finish migration when we didn't call
client_migrate_info, I don't know. But it still seems QEMU behaves strangely.
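To see why the loop never ends, the sketch below parses trimmed copies of the two query-spice replies quoted above (most fields and channels are dropped) and runs a polling loop over them. The names `spice_migrated` and `wait_for_spice` and the `max_polls` cap are illustrative assumptions; the loop described in the comment has no such cap, which is why it spins forever against qemu-kvm-rhev-2.1.2.

```python
import json

# Trimmed versions of the two query-spice replies quoted above.
REPLY_QEMU_1_5_3 = (
    '{"return": {"migrated": true, "enabled": true, "channels": []},'
    ' "id": "libvirt-25"}'
)
REPLY_QEMU_RHEV_2_1_2 = (
    '{"return": {"migrated": false, "enabled": true,'
    ' "channels": [{"host": "172.17.172.1", "channel-type": 1}]},'
    ' "id": "libvirt-238"}'
)


def spice_migrated(raw_reply):
    """Extract the 'migrated' flag from a raw query-spice reply."""
    return bool(json.loads(raw_reply)["return"].get("migrated", False))


def wait_for_spice(query, max_polls=5):
    """Poll query() until SPICE reports migrated, giving up after max_polls.

    Returns the number of polls needed, or None if the flag never became true.
    """
    for attempt in range(1, max_polls + 1):
        if spice_migrated(query()):
            return attempt
    return None


print(wait_for_spice(lambda: REPLY_QEMU_1_5_3))       # 1
print(wait_for_spice(lambda: REPLY_QEMU_RHEV_2_1_2))  # None
```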
> Perhaps we should not wait for spice to finish migration when we didn't call
> client_migrate_info, I don't know.

Yes, you should not. BTW: there is no need to poll 'migrate'; you can just wait for SPICE_MIGRATE_COMPLETED.

> But it still seems QEMU behaves strangely.

Why? Sending a SPICE migration notification when no SPICE client migration happened in the first place is strange. This was fixed here:

============================= cut here =================================
commit a76a2f729aae21c45c7e9eef8d1d80e94d1cc930
Author: Gerd Hoffmann <kraxel>
Date: Tue Apr 29 09:27:31 2014 +0200
spice: fix libvirt snapshots
Only notify spice-server about migration events in case we got
target host information beforehand. So we kick the seamless spice
client migration only in case a actual live migration happens, not
when libvirt uses live-migration-to-file for snapshotting.
Signed-off-by: Gerd Hoffmann <kraxel>

Fixed upstream by v1.3.2-48-gbd7c8a6:
commit bd7c8a693d4d5f036ac55990bf5785dd19774685
Author: Jiri Denemark <jdenemar>
AuthorDate: Mon Feb 29 13:18:13 2016 +0100
Commit: Jiri Denemark <jdenemar>
CommitDate: Tue Mar 1 15:59:00 2016 +0100
qemu: Don't always wait for SPICE to finish migration
When SPICE graphics is configured for a domain but we did not ask the
client to switch to the destination, we should not wait for
SPICE_MIGRATE_COMPLETED event (which will never come).
https://bugzilla.redhat.com/show_bug.cgi?id=1151723
Signed-off-by: Jiri Denemark <jdenemar>
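The rule from this commit can be sketched in Python, together with the suggestion made earlier in the thread to wait for the SPICE_MIGRATE_COMPLETED event rather than poll query-spice. `FakeQemu`, `migrate`, the host name `dst.example.com`, and the 0.5 s timeout are hypothetical and only there to keep the example self-contained. Following the QEMU commit a76a2f72 quoted in the thread, the fake emits the event only after `client_migrate_info` was received. Passing `old_behaviour=True` models the pre-fix code, which always waited.

```python
import threading


class FakeQemu:
    """Hypothetical QEMU stand-in, not a real API.

    It emits SPICE_MIGRATE_COMPLETED only if client_migrate_info was
    received before migration finished.
    """

    def __init__(self):
        self.got_client_info = False
        self.listeners = []

    def client_migrate_info(self, host, port):
        self.got_client_info = True

    def finish_migration(self):
        if self.got_client_info:
            # Deliver the event from another thread, like an async QMP event.
            threading.Timer(0.01, self._emit, ["SPICE_MIGRATE_COMPLETED"]).start()

    def _emit(self, name):
        for callback in self.listeners:
            callback(name)


def migrate(qemu, spice_target, old_behaviour=False, timeout=0.5):
    """Return True if migration, including any SPICE handoff, completes.

    spice_target is a (host, port) pair, or None if the graphics URI
    was not a spice:// URI.
    """
    spice_done = threading.Event()

    def on_event(name):
        if name == "SPICE_MIGRATE_COMPLETED":
            spice_done.set()

    qemu.listeners.append(on_event)

    sent_client_info = False
    if spice_target is not None:
        qemu.client_migrate_info(*spice_target)
        sent_client_info = True

    qemu.finish_migration()

    # Fixed rule: only wait if we asked the client to switch hosts.
    if not sent_client_info and not old_behaviour:
        return True
    # Event-driven wait instead of polling query-spice; the timeout stands
    # in for "forever" so the example terminates.
    return spice_done.wait(timeout)


print(migrate(FakeQemu(), ("dst.example.com", 5900)))  # True
print(migrate(FakeQemu(), None))                       # True
print(migrate(FakeQemu(), None, old_behaviour=True))   # False
```

The last call shows the original hang in miniature: the code waits for an event that QEMU never sends because it was never given a target.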
This bug was accidentally moved from POST to MODIFIED via an error in automation; please see mmccune with any questions.

I can reproduce this with build libvirt-1.2.8-5.el7.x86_64 and qemu-kvm-rhev-2.1.2-23.el7.x86_64.

Verification passed with build libvirt-1.3.3-1.el7.x86_64 and qemu-kvm-rhev-2.5.0-4.el7.x86_64.
Steps:
1.
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 8     rhel7.2-1030                   running
2. Connect to the guest graphics:
# remote-viewer spice://10.66.5.57:5900
3.
# virsh migrate rhel7.2-1030 qemu+ssh://10.66.4.113/system --live --verbose --graphicsuri aakdkdie
Migration: [100 %]
[root@fjin-5-57 2.1.2-23]#
4. On the source host:
# virsh list
 Id    Name                           State
----------------------------------------------------
5. On the target host:
# virsh list
 Id    Name                           State
----------------------------------------------------
 6     rhel7.2-1030                   running
6. Connect to the guest graphics:
# remote-viewer spice://10.66.4.113:5900
7. In the guest, do some operations; it can read and write.
8. Migrate back:
# virsh migrate rhel7.2-1030 qemu+ssh://10.66.5.57/system --live --verbose --graphicsuri spice://10.66.5.57:5900
Migration: [100 %]
[root@fjin-4-113 ~]#

After more testing, I found that when the guest is persistent and is migrated with --graphicsuri {invalid_uri} after a successful migration, the migration hangs (waiting for SPICE migration to finish).
My guess is that wait_for_spice is not reset to false after the first successful migration.
Steps:
0. Guest rhel7.2-1030 is persistent on the source host.
1. On the source:
# virsh migrate rhel7.2-1030 qemu+ssh://10.66.4.113/system --live --verbose
Migration: [100 %]
[root@fjin-5-57 libvirt]
2. On the target, migrate back:
# virsh migrate rhel7.2-1030 qemu+ssh://10.66.5.57/system --live --verbose
Migration: [100 %]
3. On the source, migrate with an invalid graphicsuri:
# virsh migrate rhel7.2-1030 qemu+ssh://10.66.4.113/system --live --verbose --graphicsuri 10.66.4.113
Migration: [100 %]
Migration: [100 %]
(after several minutes, virsh still hangs)
Created attachment 1146190
libvirtd log on source host
Indeed, libvirt doesn't properly reset job->spiceMigration, and thus a migration with an incorrect graphics URI will get stuck if the domain was previously migrated with a correct graphics URI. This bug mainly affects persistent domains; transient domains are affected only if the first migration is cancelled.

Patches for this issue were sent for review upstream: https://www.redhat.com/archives/libvir-list/2016-July/msg00108.html

Unfortunately, there is a related bug 1352836, which needs to be taken into account when testing this bug with qemu-kvm-rhev-2.6.

This should now be fixed upstream by v2.0.0-59-ga16ea1a..v2.0.0-60-gf34b981:
commit a16ea1a0f3e6b9eb8be4be7a664af76e47bbceba
Author: Jiri Denemark <jdenemar>
AuthorDate: Tue Jul 5 10:07:24 2016 +0200
Commit: Jiri Denemark <jdenemar>
CommitDate: Fri Jul 8 13:35:17 2016 +0200
qemu: Properly reset spiceMigration flag
Otherwise migration during which we didn't send client_migrate_info QMP
command will get stuck waiting for SPICE migration to finish if libvirtd
sent the QMP command in a previous migration attempt.
Broken by bd7c8a69.
https://bugzilla.redhat.com/show_bug.cgi?id=1151723
Signed-off-by: Jiri Denemark <jdenemar>
commit f34b981e403ce7abf41c0047e1b5610e1f5269db
Author: Jiri Denemark <jdenemar>
AuthorDate: Wed Jun 29 15:01:17 2016 +0200
Commit: Jiri Denemark <jdenemar>
CommitDate: Fri Jul 8 13:36:00 2016 +0200
qemu: Drop useless SPICE migration code
The spiceMigration flag will never be true if there is no SPICE graphics
configured for the domain.
https://bugzilla.redhat.com/show_bug.cgi?id=1151723
Signed-off-by: Jiri Denemark <jdenemar>
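The "Properly reset spiceMigration flag" commit above fixes state that outlives a single migration. The sketch below models that with a hypothetical `DomainJob` object that is reused across migrations, as happens for a persistent domain that stays defined on the source host. Only the flag's role comes from the discussion; the class, the function names, and the choice to clear the flag at the start of each job are assumptions, since this report does not say exactly where libvirt resets it.

```python
from dataclasses import dataclass


@dataclass
class DomainJob:
    """Hypothetical per-domain job state that outlives a single migration."""
    spice_migration: bool = False  # stands in for job->spiceMigration


def migrate_out(job, sends_client_info, reset_flag):
    """Run one outgoing migration on `job`.

    Returns True if the migration would wait for SPICE_MIGRATE_COMPLETED
    without having sent client_migrate_info, i.e. it would hang.
    """
    if reset_flag:
        # The fix (as modelled here): start every migration job with a clean flag.
        job.spice_migration = False
    if sends_client_info:
        # The flag is set when client_migrate_info is sent to QEMU.
        job.spice_migration = True
    return job.spice_migration and not sends_client_info


def persistent_domain_scenario(reset_flag):
    job = DomainJob()  # the same object is reused, as for a persistent domain
    migrate_out(job, sends_client_info=True, reset_flag=reset_flag)  # 1st: OK
    return migrate_out(job, sends_client_info=False, reset_flag=reset_flag)


print(persistent_domain_scenario(reset_flag=False))  # True: would hang
print(persistent_domain_scenario(reset_flag=True))   # False: flag was reset
```

This also mirrors why transient domains were mostly unaffected: their job state normally disappears with the domain after a successful migration, so a stale flag survives only if the first migration is cancelled.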
Verified on build libvirt-2.0.0-2.el7.x86_64 and qemu-kvm-rhev-2.6.0-12.el7.x86_64.

Scenario 1: migrate with an invalid graphicsuri -> migrate back with the default graphicsuri -> migrate with a correct graphicsuri
1. Define and start a guest with SPICE graphics on host A:
2. Connect a SPICE client to the guest:
# remote-viewer spice://hp-dl385g7-05.lab.eng.pek2.redhat.com:5900
3. Migrate the guest to host B with an invalid graphicsuri:
# virsh migrate rhel7.2 qemu+ssh://hp-dl385g7-06.lab.eng.pek2.redhat.com/system --live --verbose --graphicsuri abcdefg
Migration: [100 %]
virsh did not hang after migration reached 100%, and the SPICE client disconnected.
4. After migration, connect a SPICE client to the guest again:
# remote-viewer spice://hp-dl385g7-06.lab.eng.pek2.redhat.com:5900
5. Migrate the guest back with the default graphicsuri:
# virsh migrate rhel7.2 qemu+tcp://hp-dl385g7-05.lab.eng.pek2.redhat.com/system --live --verbose
Migration: [100 %]
The SPICE migration finishes successfully.
6. Migrate the guest to host B again with a correct graphicsuri:
# virsh migrate rhel7.2 qemu+ssh://hp-dl385g7-06.lab.eng.pek2.redhat.com/system --live --verbose --graphicsuri spice://hp-dl385g7-06.lab.eng.pek2.redhat.com:5900
Migration: [100 %]
The SPICE migration finishes successfully.
Scenario 2: prepare a persistent guest, migrate with the default graphicsuri -> migrate back with the default graphicsuri -> migrate with an invalid graphicsuri again
1. Define and start a guest with SPICE graphics on host A:
2. Connect a SPICE client to the guest:
# remote-viewer spice://hp-dl385g7-05.lab.eng.pek2.redhat.com:5900
3. Migrate the guest to host B with the default graphicsuri:
# virsh migrate rhel7.2 qemu+ssh://hp-dl385g7-06.lab.eng.pek2.redhat.com/system --live --verbose
Migration: [100 %]
4. Migrate back:
# virsh migrate rhel7.2 qemu+ssh://hp-dl385g7-05.lab.eng.pek2.redhat.com/system --live --verbose
Migration: [100 %]
5. Migrate the guest to host B with an invalid graphicsuri:
# virsh migrate rhel7.2 qemu+ssh://hp-dl385g7-06.lab.eng.pek2.redhat.com/system --live --verbose --graphicsuri abcdefg
Migration: [100 %]
virsh did not hang after migration reached 100%, and the SPICE client disconnected.

With qemu-kvm-rhev-2.6.0-12.el7.x86_64, I can't reproduce the issue described in comment 12 (and reported in bug 1352836). However, I hit another problem: if the first migration is cancelled immediately after issuing the command and the guest is then migrated again, the second migration gets stuck after memory is 100% transferred.
# virsh migrate rhel7.2 qemu+tcp://hp-dl385g7-06.lab.eng.pek2.redhat.com/system --live --verbose --p2p
^Cerror: operation aborted: migration out: canceled by client
Connect a SPICE client to the guest:
# remote-viewer spice://hp-dl385g7-05.lab.eng.pek2.redhat.com:5900
# virsh migrate rhel7.2 qemu+tcp://hp-dl385g7-06.lab.eng.pek2.redhat.com/system --live --verbose --p2p
Migration: [100 %]
Migration: [100 %]
Migration: [100 %]
Migration: [100 %]^C
[root@hp-dl385g7-05 2.6.0-13]#

Created attachment 1178860
the second migration gets stuck if the first migration is cancelled immediately
I think it's the same issue as reported in bug 1352836, but in your case different steps were needed to reproduce it.

Based on comment 15, moving this bug to VERIFIED. I will track the issue in comment 17 by adding a new test case.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2577.html
Description of problem:
Migration hangs after using migrate --graphicsuri with an invalid URI, and the guest's state becomes locked. This issue was only found with guests that use SPICE.

Version-Release number of selected component (if applicable):
libvirt-1.2.8-5.el7.x86_64
qemu-img-rhev-2.1.2-3.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare a guest that can be migrated successfully, and prepare a migration environment:
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 4     test3                          running
2.
# virsh dumpxml test3
<graphics type='spice' autoport='yes' listen='127.0.0.1'>
  <listen type='address' address='127.0.0.1'/>
</graphics>
3. Use an invalid graphics URI (I used vnc123):
# time virsh migrate test3 --graphicsuri vnc123 qemu+ssh://10.66.70.127/system --live --verbose
root@10.66.70.127's password:
Migration: [100 %]
Migration: [100 %]
Migration: [100 %]
Migration: [100 %]^C^C   <------ hang

real    3m53.970s
user    0m0.146s
sys     0m0.097s
4. On the source:
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 4     test3                          paused
5.
# virsh resume test3
error: Failed to resume domain test3
error: Timed out during operation: cannot acquire state change lock
6.
# virsh destroy test3
Domain test3 destroyed

Actual results:
The migrate command hangs. After pressing Ctrl+C, the source guest's state is locked, and the destination guest is in the running state but cannot be used (cannot read or write).

Expected results:
An error is reported before migration starts.

Additional info:
Log from libvirtd.log:
2014-10-11 07:43:15.601+0000: 15140: error : qemuDomainMigrateGraphicsRelocate:2143 : invalid argument: unknown graphics type (null)
2014-10-11 07:43:15.601+0000: 15140: warning : qemuMigrationRun:3555 : unable to provide data for graphics client relocation
2014-10-11 07:43:18.067+0000: 15140: warning : qemuMigrationCancelDriveMirror:1632 : Unable to stop block job on drive-ide0-0-0