Bug 1151723 - migration will hang after use migrate with --graphicsuri and guest status will be locked
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Jiri Denemark
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 1288337
Reported: 2014-10-11 07:48 UTC by Luyao Huang
Modified: 2016-11-03 18:10 UTC (History)

Fixed In Version: libvirt-2.0.0-2.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-03 18:10:42 UTC
Target Upstream Version:
Embargoed:


Attachments
libvirtd log on source host (264.31 KB, application/x-gzip), 2016-04-12 03:50 UTC, Fangge Jin
the second migration gets stuck if the first migration is cancelled immediately (411.49 KB, application/x-gzip), 2016-07-12 11:08 UTC, Fangge Jin


Links
System: Red Hat Product Errata
ID: RHSA-2016:2577
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: libvirt security, bug fix, and enhancement update
Last Updated: 2016-11-03 12:07:06 UTC

Description Luyao Huang 2014-10-11 07:48:52 UTC
Description of problem:
Migration hangs when virsh migrate is run with an invalid URI in --graphicsuri, and the guest's state stays locked afterwards.
I have only seen this issue with guests that use SPICE.

Version-Release number of selected component (if applicable):
libvirt-1.2.8-5.el7.x86_64
qemu-img-rhev-2.1.2-3.el7.x86_64


How reproducible:
100%

Steps to Reproduce:

1. Prepare a guest that can be migrated successfully and set up a migration environment
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 4     test3                          running

2. # virsh dumpxml test3
    <graphics type='spice' autoport='yes' listen='127.0.0.1'>
      <listen type='address' address='127.0.0.1'/>
    </graphics>


3. Use an invalid graphics URI (I used vnc123):
# time virsh migrate test3 --graphicsuri vnc123 qemu+ssh://10.66.70.127/system --live --verbose
root@10.66.70.127's password: 
Migration: [100 %]
Migration: [100 %]
Migration: [100 %]

Migration: [100 %]^C^C                         <------hang

real	3m53.970s
user	0m0.146s
sys	0m0.097s

4.on source:

# virsh list --all
 Id    Name                           State
----------------------------------------------------
 4     test3                          paused

5.# virsh resume test3
error: Failed to resume domain test3
error: Timed out during operation: cannot acquire state change lock

6.# virsh destroy test3
Domain test3 destroyed



Actual results:

The migrate command hangs. After interrupting it with Ctrl+C:
- the source guest's state is locked
- the destination guest is in the running state but cannot be used (cannot read or write)

Expected results:
An error is reported before the migration starts.

Additional info:

log from libvirtd.log:
2014-10-11 07:43:15.601+0000: 15140: error : qemuDomainMigrateGraphicsRelocate:2143 : invalid argument: unknown graphics type (null)
2014-10-11 07:43:15.601+0000: 15140: warning : qemuMigrationRun:3555 : unable to provide data for graphics client relocation
2014-10-11 07:43:18.067+0000: 15140: warning : qemuMigrationCancelDriveMirror:1632 : Unable to stop block job on drive-ide0-0-0
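
The expected result above (rejecting the URI up front) can be illustrated with a minimal sketch. This is not libvirt code: check_graphics_uri and SUPPORTED_SCHEMES are hypothetical names, and the sketch assumes that only URIs with a spice scheme are usable for client relocation, which matches the "unknown graphics type (null)" message logged for a URI without a scheme.

```python
from urllib.parse import urlparse

# Hypothetical helper for illustration only; not part of libvirt or virsh.
# Assumption: only SPICE client relocation is of interest here.
SUPPORTED_SCHEMES = {"spice"}


def check_graphics_uri(uri: str) -> str:
    """Return the graphics type named by the URI scheme, or raise ValueError."""
    scheme = urlparse(uri).scheme
    if not scheme:
        # Corresponds to the "(null)" graphics type seen in libvirtd.log.
        raise ValueError(f"graphics URI {uri!r} has no scheme")
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported graphics type {scheme!r} in {uri!r}")
    return scheme
```

With this check, strings such as vnc123 would be refused before any migration work starts, while spice://host:port would be accepted.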

Comment 1 Jiri Denemark 2014-11-27 16:12:36 UTC
It's apparently something in qemu-kvm-rhev. It works with qemu-kvm-1.5.3-60.el7, doesn't work with qemu-kvm-rhev-2.1.2-3.el7 and doesn't work in qemu-kvm-rhev-2.1.2-14.el7 either.

Comment 2 Jiri Denemark 2014-12-02 12:29:15 UTC
So the difference between 1.5.3 and 2.1.2 is in the response to query-spice in
case of an invalid graphics URI. Normally, we send client_migrate_info and at the
end of migration, we wait for query-spice to return migrated = true. However,
if an invalid graphics URI is passed to our migration APIs (i.e., something that
does not start with spice://), we don't call client_migrate_info. But we still
wait for query-spice (as long as spice is enabled for the domain, of course)
to return migrated = true at the end of migration.

With qemu-kvm-1.5.3, a SPICE_DISCONNECTED event is emitted, followed by
SPICE_MIGRATE_COMPLETED. Once migration completes, query-spice returns:

    {
      "return": {
        "migrated": true,
        "enabled": true,
        "auth": "none",
        "port": 5900,
        "compiled-version": "0.12.4",
        "host": "0.0.0.0",
        "channels": [

        ],
        "mouse-mode": "server"
      },
      "id": "libvirt-25"
    }


With qemu-kvm-rhev-2.1.2, however, no SPICE-related events are emitted, and at
the end of migration query-spice always returns (172.17.172.1 is the client):

    {
      "return": {
        "migrated": false,
        "enabled": true,
        "auth": "none",
        "port": 5900,
        "compiled-version": "0.12.4",
        "host": "0.0.0.0",
        "channels": [
          {
            "port": "51853",
            "family": "ipv4",
            "channel-type": 1,
            "connection-id": 2035481344,
            "host": "172.17.172.1",
            "channel-id": 0,
            "tls": false
          },
          {
            "port": "51854",
            "family": "ipv4",
            "channel-type": 2,
            "connection-id": 2035481344,
            "host": "172.17.172.1",
            "channel-id": 0,
            "tls": false
          },
          {
            "port": "51855",
            "family": "ipv4",
            "channel-type": 3,
            "connection-id": 2035481344,
            "host": "172.17.172.1",
            "channel-id": 0,
            "tls": false
          },
          {
            "port": "51856",
            "family": "ipv4",
            "channel-type": 4,
            "connection-id": 2035481344,
            "host": "172.17.172.1",
            "channel-id": 0,
            "tls": false
          }
        ],
        "mouse-mode": "server"
      },
      "id": "libvirt-238"
    }

and libvirt ends up in an endless loop waiting for migrated = true.

Perhaps we should not wait for spice to finish migration when we didn't call
client_migrate_info, I don't know. But it still seems QEMU behaves strangely.
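
The condition discussed in this comment can be sketched as follows. This is only an illustration in Python, not the libvirt implementation: spice_migrated and must_wait_for_spice are hypothetical helper names, and the two reply strings are abbreviated versions of the query-spice replies quoted above.

```python
import json


def spice_migrated(reply_text: str) -> bool:
    """Extract the "migrated" field from a query-spice reply given as JSON text."""
    reply = json.loads(reply_text)
    return bool(reply["return"].get("migrated", False))


def must_wait_for_spice(spice_enabled: bool, sent_client_migrate_info: bool) -> bool:
    """Wait for SPICE only if the client was actually asked to switch hosts."""
    return spice_enabled and sent_client_migrate_info


# Abbreviated forms of the replies shown in this comment.
REPLY_1_5_3 = '{"return": {"migrated": true, "enabled": true}, "id": "libvirt-25"}'
REPLY_2_1_2 = '{"return": {"migrated": false, "enabled": true}, "id": "libvirt-238"}'
```

Under this rule, a migration with an invalid graphics URI (no client_migrate_info sent) would skip the wait entirely, so the reply from 2.1.2 with migrated = false could no longer cause an endless loop.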

Comment 5 Gerd Hoffmann 2015-09-09 12:38:48 UTC
> Perhaps we should not wait for spice to finish migration when we didn't call
> client_migrate_info, I don't know.

Yes, you should not.

BTW: no need to poll 'migrate', you can just wait for SPICE_MIGRATE_COMPLETED.

> But it still seems QEMU behaves strangely.

Why?  Sending spice migration notification when no spice client migration happened in the first place is strange.  Was fixed here:

============================= cut here =================================

commit a76a2f729aae21c45c7e9eef8d1d80e94d1cc930
Author: Gerd Hoffmann <kraxel>
Date:   Tue Apr 29 09:27:31 2014 +0200

    spice: fix libvirt snapshots
    
    Only notify spice-server about migration events in case we got
    target host information beforehand.  So we kick the seamless spice
    client migration only in case a actual live migration happens, not
    when libvirt uses live-migration-to-file for snapshotting.
    
    Signed-off-by: Gerd Hoffmann <kraxel>
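
The suggestion to wait for SPICE_MIGRATE_COMPLETED instead of polling could look roughly like the sketch below. It is a simplified model under stated assumptions: SpiceMigrationWaiter and its methods are hypothetical names, the event is delivered by calling handle_event from whatever code reads QMP events, and a timeout is added here only so that the example cannot block forever.

```python
import threading


class SpiceMigrationWaiter:
    """Hypothetical model of event-driven waiting; not libvirt or QEMU code."""

    def __init__(self) -> None:
        self._completed = threading.Event()

    def handle_event(self, name: str) -> None:
        # Called by the (assumed) QMP event reader for every event name.
        if name == "SPICE_MIGRATE_COMPLETED":
            self._completed.set()

    def wait(self, expect_event: bool, timeout: float) -> bool:
        """Return True if there is nothing to wait for or the event arrived."""
        if not expect_event:
            # No client_migrate_info was sent, so the event will never come.
            return True
        return self._completed.wait(timeout)
```

The expect_event argument carries the point made at the top of this comment: when no SPICE client migration was requested, there is nothing to wait for.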

Comment 6 Jiri Denemark 2016-03-01 15:45:11 UTC
Fixed upstream by v1.3.2-48-gbd7c8a6:

commit bd7c8a693d4d5f036ac55990bf5785dd19774685
Author:     Jiri Denemark <jdenemar>
AuthorDate: Mon Feb 29 13:18:13 2016 +0100
Commit:     Jiri Denemark <jdenemar>
CommitDate: Tue Mar 1 15:59:00 2016 +0100

    qemu: Don't always wait for SPICE to finish migration
    
    When SPICE graphics is configured for a domain but we did not ask the
    client to switch to the destination, we should not wait for
    SPICE_MIGRATE_COMPLETED event (which will never come).
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1151723
    
    Signed-off-by: Jiri Denemark <jdenemar>

Comment 7 Mike McCune 2016-03-28 22:45:31 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation. Please contact mmccune with any questions.

Comment 9 Fangge Jin 2016-04-11 10:12:43 UTC
I can reproduce this with builds libvirt-1.2.8-5.el7.x86_64 and qemu-kvm-rhev-2.1.2-23.el7.x86_64.

Verification passes with builds libvirt-1.3.3-1.el7.x86_64 and qemu-kvm-rhev-2.5.0-4.el7.x86_64.

Steps:
1.# virsh list --all
 Id    Name                           State
----------------------------------------------------
 8     rhel7.2-1030                   running
2.Connect to guest graphic:
# remote-viewer spice://10.66.5.57:5900
3.# virsh migrate rhel7.2-1030 qemu+ssh://10.66.4.113/system --live --verbose --graphicsuri aakdkdie
Migration: [100 %]
[root@fjin-5-57 2.1.2-23]#
4.On source host:
# virsh list
 Id    Name                           State
----------------------------------------------------
5.On target host:
# virsh list
 Id    Name                           State
----------------------------------------------------
 6     rhel7.2-1030                   running

6.Connect to guest graphic:
# remote-viewer spice://10.66.4.113:5900
7.In the guest, perform some operations; reads and writes work.
8.Migrate back:
# virsh migrate rhel7.2-1030 qemu+ssh://10.66.5.57/system --live --verbose --graphicsuri spice://10.66.5.57:5900
Migration: [100 %]
[root@fjin-4-113 ~]#

Comment 10 Fangge Jin 2016-04-12 03:48:30 UTC
After more testing, I found that when the guest is persistent and a migration with --graphicsuri {invalid_uri} follows a successful migration, the migration hangs (waiting for SPICE migration to finish).
My guess is that wait_for_spice is not reset to false after the first successful migration.


Steps:
0. Guest rhel7.2-1030 is persistent on source host
1.On source
# virsh migrate rhel7.2-1030 qemu+ssh://10.66.4.113/system --live --verbose 
Migration: [100 %]
[root@fjin-5-57 libvirt]

2.On target, migrate back:
# virsh migrate rhel7.2-1030 qemu+ssh://10.66.5.57/system --live --verbose 
Migration: [100 %]

3.On source, migrate with invalid graphicsuri
# virsh migrate rhel7.2-1030 qemu+ssh://10.66.4.113/system --live --verbose --graphicsuri 10.66.4.113
Migration: [100 %]
Migration: [100 %]
(after several minutes, virsh still hangs)

Comment 11 Fangge Jin 2016-04-12 03:50:13 UTC
Created attachment 1146190 [details]
libvirtd log on source host

Comment 12 Jiri Denemark 2016-07-05 08:52:28 UTC
Indeed, libvirt doesn't properly reset job->spiceMigration and thus a migration with an incorrect graphics URI will get stuck in case the domain was migrated with a correct graphics URI before. This bug affects mainly persistent domains; transient domains are affected only if the first migration is cancelled. Patches for this issue were sent for review upstream:

  https://www.redhat.com/archives/libvir-list/2016-July/msg00108.html

Unfortunately, there is a related bug 1352836, which needs to be taken into account when testing this bug with qemu-kvm-rhev-2.6.

Comment 13 Jiri Denemark 2016-07-08 11:46:36 UTC
This should now be fixed upstream by v2.0.0-59-ga16ea1a..v2.0.0-60-gf34b981:

commit a16ea1a0f3e6b9eb8be4be7a664af76e47bbceba
Author:     Jiri Denemark <jdenemar>
AuthorDate: Tue Jul 5 10:07:24 2016 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Fri Jul 8 13:35:17 2016 +0200

    qemu: Properly reset spiceMigration flag

    Otherwise migration during which we didn't send client_migrate_info QMP
    command will get stuck waiting for SPICE migration to finish if libvirtd
    sent the QMP command in a previous migration attempt.

    Broken by bd7c8a69.

    https://bugzilla.redhat.com/show_bug.cgi?id=1151723

    Signed-off-by: Jiri Denemark <jdenemar>

commit f34b981e403ce7abf41c0047e1b5610e1f5269db
Author:     Jiri Denemark <jdenemar>
AuthorDate: Wed Jun 29 15:01:17 2016 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Fri Jul 8 13:36:00 2016 +0200

    qemu: Drop useless SPICE migration code

    The spiceMigration flag will never be true if there is no SPICE graphics
    configured for the domain.

    https://bugzilla.redhat.com/show_bug.cgi?id=1151723

    Signed-off-by: Jiri Denemark <jdenemar>
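
The stale-flag problem fixed by the first commit above can be modelled with a small sketch. It is an illustration only: MigrationJob and begin_migration are hypothetical names, spice_migration merely mirrors the role of job->spiceMigration, and treating a missing graphics URI as "use the default and send client_migrate_info" is an assumption of this sketch. The reset_first parameter exists only to show the behaviour with and without the reset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MigrationJob:
    # Mirrors the role of job->spiceMigration; hypothetical model, not libvirt code.
    spice_migration: bool = False


def begin_migration(job: MigrationJob, graphics_uri: Optional[str],
                    reset_first: bool = True) -> bool:
    """Return whether this attempt should wait for SPICE migration to finish."""
    if reset_first:
        # Clear state left over from a previous migration attempt.
        job.spice_migration = False
    if graphics_uri is None or graphics_uri.startswith("spice://"):
        # Stands in for sending client_migrate_info to QEMU.
        job.spice_migration = True
    return job.spice_migration
```

Reusing one job object for a default migration followed by a migration with an invalid URI returns False for the second attempt when reset_first is true. With reset_first=False the second attempt still returns True, which corresponds to the hang described in comment 10.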

Comment 15 Fangge Jin 2016-07-12 06:49:42 UTC
Verified on builds libvirt-2.0.0-2.el7.x86_64 and qemu-kvm-rhev-2.6.0-12.el7.x86_64

Scenario 1: migrate with invalid graphicsuri -> migrate back with default graphicsuri -> migrate with correct graphicsuri

1.Define&start a guest with spice graphic on host A:

2.Connect a spice client to the guest:
# remote-viewer spice://hp-dl385g7-05.lab.eng.pek2.redhat.com:5900

3.Migrate the guest to host B with invalid graphicsuri
# virsh migrate rhel7.2 qemu+ssh://hp-dl385g7-06.lab.eng.pek2.redhat.com/system --live --verbose --graphicsuri abcdefg
Migration: [100 %]

virsh did not hang after the migration reached 100%, and the spice client disconnected.

4.After migration, connect a spice client to the guest again:
# remote-viewer spice://hp-dl385g7-06.lab.eng.pek2.redhat.com:5900

5.Migrate the guest back with default graphicsuri
# virsh migrate rhel7.2 qemu+tcp://hp-dl385g7-05.lab.eng.pek2.redhat.com/system --live --verbose 
Migration: [100 %]

The spice migration finishes successfully.

6.Migrate the guest to host B again with correct graphicsuri
# virsh migrate rhel7.2 qemu+ssh://hp-dl385g7-06.lab.eng.pek2.redhat.com/system --live --verbose --graphicsuri spice://hp-dl385g7-06.lab.eng.pek2.redhat.com:5900
Migration: [100 %]

The spice migration finishes successfully.



Scenario 2: prepare a persistent guest, migrate with default graphicsuri -> migrate back with default graphicsuri -> migrate with invalid graphicsuri again

1.Define&start a guest with spice graphic on host A:

2.Connect a spice client to the guest:
# remote-viewer spice://hp-dl385g7-05.lab.eng.pek2.redhat.com:5900

3.Migrate the guest to host B with default graphicsuri
# virsh migrate rhel7.2 qemu+ssh://hp-dl385g7-06.lab.eng.pek2.redhat.com/system --live --verbose 
Migration: [100 %]

4.Migrate back:
# virsh migrate rhel7.2 qemu+ssh://hp-dl385g7-05.lab.eng.pek2.redhat.com/system --live --verbose 
Migration: [100 %]

5.Migrate the guest to host B with invalid graphicsuri:
# virsh migrate rhel7.2 qemu+ssh://hp-dl385g7-06.lab.eng.pek2.redhat.com/system --live --verbose --graphicsuri abcdefg
Migration: [100 %]

virsh did not hang after the migration reached 100%, and the spice client disconnected.

Comment 16 Fangge Jin 2016-07-12 06:53:07 UTC
With qemu-kvm-rhev-2.6.0-12.el7.x86_64, I can't reproduce the issue described in comment 12 (and reported in bug 1352836).

Comment 17 Fangge Jin 2016-07-12 11:04:52 UTC
I also hit another problem: if the first migration is cancelled immediately after issuing the command and the migration is then started again, the second migration gets stuck after memory is 100% transferred.


# virsh migrate rhel7.2 qemu+tcp://hp-dl385g7-06.lab.eng.pek2.redhat.com/system --live --verbose  --p2p
^Cerror: operation aborted: migration out: canceled by client

Connect a spice client to the guest:
# remote-viewer spice://hp-dl385g7-05.lab.eng.pek2.redhat.com:5900

# virsh migrate rhel7.2 qemu+tcp://hp-dl385g7-06.lab.eng.pek2.redhat.com/system --live --verbose  --p2p
Migration: [100 %]
Migration: [100 %]
Migration: [100 %]
Migration: [100 %]^C
[root@hp-dl385g7-05 2.6.0-13]#

Comment 18 Fangge Jin 2016-07-12 11:08:55 UTC
Created attachment 1178860 [details]
the second migration gets stuck if the first migration is cancelled immediately

Comment 19 Jiri Denemark 2016-07-19 07:57:28 UTC
I think it's the same issue as reported in bug 1352836, but in your case different steps were needed to reproduce it.

Comment 20 Fangge Jin 2016-08-08 08:32:24 UTC
Based on comment 15, I am moving this bug to VERIFIED. I will track the issue from comment 17 by adding a new test case.

Comment 22 errata-xmlrpc 2016-11-03 18:10:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2577.html

