Bug 1376083 - Spice migration can't finish when migrating a guest with spice listening on port '0' from RHEL7.2 to RHEL7.3
Summary: Spice migration can't finish when migrating a guest with spice listening on port '...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.3
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Jiri Denemark
QA Contact: Fangge Jin
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-09-14 15:44 UTC by Fangge Jin
Modified: 2017-06-15 09:07 UTC
CC List: 8 users

Fixed In Version: libvirt-2.5.0-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-06-15 09:07:45 UTC
Target Upstream Version:
Embargoed:


Attachments
log and domain xml (355.17 KB, application/x-gzip)
2016-09-14 17:12 UTC, Fangge Jin
libvirtd log for comment 7 (271.68 KB, application/x-gzip)
2017-05-31 10:41 UTC, Fangge Jin
backtrace of libvirtd (12.51 KB, text/plain)
2017-05-31 11:51 UTC, Fangge Jin


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2017:1846 0 normal SHIPPED_LIVE libvirt bug fix and enhancement update 2017-08-01 18:02:50 UTC

Description Fangge Jin 2016-09-14 15:44:24 UTC
Description of problem:
Start a guest on a RHEL7.2 host with spice listening on port '0', then migrate it to a RHEL7.3 host with virsh. The virsh command hangs after memory migration reaches 100% because the spice migration apparently never finishes.

# virsh dumpxml rhel7.3-0817
...
    <graphics type='spice' autoport='no' listen='0.0.0.0'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
...
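
Note: with autoport='no' and no explicit port attribute, the effective SPICE port here is 0. According to the commit message quoted in comment 5 below, an old libvirt represents such a device in the graphics migration cookie roughly as follows (shown only as an illustration, not a verbatim log excerpt):

    <graphics type='spice' port='0' listen='0.0.0.0' tlsPort='-1'/>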

# virsh migrate rhel7.3-0817 qemu+ssh://10.66.4.152/system --live --verbose
Migration: [100 %]
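
For reference, a thread dump like the one below can typically be captured from the hung daemon with gstack (shipped with gdb on RHEL); the exact invocation is an assumption, not taken verbatim from this report:

# gstack $(pidof libvirtd) > libvirtd-gstack.txt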

Part of gstack output of libvirtd process:
Thread 13 (Thread 0x7f8ddd672700 (LWP 29728)):
#0  0x00007f8dec2ff6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f8deec913c6 in virCondWait (c=c@entry=0x7f8d7026ea38, m=m@entry=0x7f8d7026ea10) at util/virthread.c:154
#2  0x00007f8deecae132 in virDomainObjWait (vm=vm@entry=0x7f8d7026ea00) at conf/domain_conf.c:2674
#3  0x00007f8dce35e343 in qemuMigrationWaitForSpice (vm=0x7f8d7026ea00) at qemu/qemu_migration.c:2469
#4  qemuMigrationConfirmPhase (driver=driver@entry=0x7f8d701af860, conn=conn@entry=0x7f8dc400f6d0, vm=0x7f8d7026ea00, cookiein=cookiein@entry=0x7f8dc4018c30 "<qemu-migration>\n  <name>rhel7.3-0817</name>\n  <uuid>f16dd5c5-f0ca-40b2-82a2-38089cc0e12b</uuid>\n  <hostname>fjin-4-152</hostname>\n  <hostuuid>4f379b00-5c66-11e2-a064-10604b782a74</hostuuid>\n  <statis"..., cookieinlen=cookieinlen@entry=1095, flags=flags@entry=257, retcode=retcode@entry=0) at qemu/qemu_migration.c:3809
#5  0x00007f8dce360bec in qemuMigrationConfirm (conn=0x7f8dc400f6d0, vm=0x7f8d7026ea00, cookiein=cookiein@entry=0x7f8dc4018c30 "<qemu-migration>\n  <name>rhel7.3-0817</name>\n  <uuid>f16dd5c5-f0ca-40b2-82a2-38089cc0e12b</uuid>\n  <hostname>fjin-4-152</hostname>\n  <hostuuid>4f379b00-5c66-11e2-a064-10604b782a74</hostuuid>\n  <statis"..., cookieinlen=cookieinlen@entry=1095, flags=flags@entry=257, cancelled=cancelled@entry=0) at qemu/qemu_migration.c:3878
#6  0x00007f8dce38cf4f in qemuDomainMigrateConfirm3Params (domain=0x7f8dc4012530, params=<optimized out>, nparams=<optimized out>, cookiein=0x7f8dc4018c30 "<qemu-migration>\n  <name>rhel7.3-0817</name>\n  <uuid>f16dd5c5-f0ca-40b2-82a2-38089cc0e12b</uuid>\n  <hostname>fjin-4-152</hostname>\n  <hostuuid>4f379b00-5c66-11e2-a064-10604b782a74</hostuuid>\n  <statis"..., cookieinlen=1095, flags=257, cancelled=0) at qemu/qemu_driver.c:12830
#7  0x00007f8deed2fc57 in virDomainMigrateConfirm3Params (domain=domain@entry=0x7f8dc4012530, params=params@entry=0x7f8dc40128b0, nparams=3, cookiein=0x7f8dc4018c30 "<qemu-migration>\n  <name>rhel7.3-0817</name>\n  <uuid>f16dd5c5-f0ca-40b2-82a2-38089cc0e12b</uuid>\n  <hostname>fjin-4-152</hostname>\n  <hostuuid>4f379b00-5c66-11e2-a064-10604b782a74</hostuuid>\n  <statis"..., cookieinlen=1095, flags=257, cancelled=0) at libvirt-domain.c:5329
#8  0x00007f8def965a06 in remoteDispatchDomainMigrateConfirm3Params (server=<optimized out>, msg=0x7f8df15ae8b0, args=0x7f8dc4011960, rerr=0x7f8ddd671c30, client=<optimized out>) at remote.c:5672
#9  remoteDispatchDomainMigrateConfirm3ParamsHelper (server=<optimized out>, client=<optimized out>, msg=0x7f8df15ae8b0, rerr=0x7f8ddd671c30, args=0x7f8dc4011960, ret=0x7f8dc40119b0) at remote_dispatch.h:6585
#10 0x00007f8deed9bd82 in virNetServerProgramDispatchCall (msg=0x7f8df15ae8b0, client=0x7f8df15a41d0, server=0x7f8df157f050, prog=0x7f8df159f2d0) at rpc/virnetserverprogram.c:437
#11 virNetServerProgramDispatch (prog=0x7f8df159f2d0, server=server@entry=0x7f8df157f050, client=0x7f8df15a41d0, msg=0x7f8df15ae8b0) at rpc/virnetserverprogram.c:307
#12 0x00007f8deed96ffd in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x7f8df157f050) at rpc/virnetserver.c:135
#13 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x7f8df157f050) at rpc/virnetserver.c:156
#14 0x00007f8deec91c35 in virThreadPoolWorker (opaque=opaque@entry=0x7f8df1573f60) at util/virthreadpool.c:145
#15 0x00007f8deec91158 in virThreadHelper (data=<optimized out>) at util/virthread.c:206
#16 0x00007f8dec2fbdc5 in start_thread () from /lib64/libpthread.so.0
#17 0x00007f8dec0291cd in clone () from /lib64/libc.so.6

Version-Release number of selected component (if applicable):
RHEL7.2:
libvirt-1.2.17-13.el7_2.5.x86_64
qemu-kvm-rhev-2.3.0-31.el7_2.22.x86_64
spice-server-0.12.4-15.el7_2.1.x86_64

RHEL7.3:
libvirt-2.0.0-8.el7.x86_64
qemu-kvm-rhev-2.6.0-25.el7.x86_64
spice-server-0.12.4-18.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Start the guest on the RHEL7.2 host
2. Migrate it to the RHEL7.3 host

Actual results:
Spice migration can't finish

Expected results:
Spice migration can finish and virsh exits after migration finishes.

Additional info:
1) Migration from RHEL7.2 to RHEL7.2 has no such problem
2) Migration from RHEL7.3 to RHEL7.3 has no such problem
3) When spice listens on a non-zero port, spice migration can finish from RHEL7.2 to RHEL7.3 (see the example configuration below)
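
For comparison, a graphics configuration with a fixed non-zero port looks roughly like this (illustrative only; the port number 5901 is an assumption, not taken from the report):

    <graphics type='spice' autoport='no' port='5901' listen='0.0.0.0'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>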

Comment 1 Fangge Jin 2016-09-14 16:54:05 UTC
Updated spice-server on the RHEL7.3 host to spice-server-0.12.4-19.el7.x86_64 and tested again; the problem still exists.

Comment 2 Fangge Jin 2016-09-14 17:12:38 UTC
Created attachment 1200914 [details]
log and domain xml

Comment 4 Jiri Denemark 2016-09-20 14:08:21 UTC
1) 7.3 -> 7.3 works because libvirtd is not (correctly) waiting for the SPICE_MIGRATE_COMPLETED event

2) in the 7.2 -> 7.2 case libvirtd is waiting for the event and QEMU sends it

3) 7.3 -> 7.2 is broken since libvirtd is waiting for the event, but QEMU doesn't send it

I wasn't able to reproduce the third case; QEMU was sending the event. I'm not sure why it didn't work for you, but we should fix libvirt to behave similarly to 1. That is, libvirt should not even try to do seamless SPICE migration.
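
To double-check whether QEMU actually emits the event during migration, one option (a sketch, assuming the domain name from comment 0 and a libvirt new enough to provide this virsh command) is to watch the domain's monitor events on the source while the migration runs:

# virsh qemu-monitor-event rhel7.3-0817 --event SPICE_MIGRATE_COMPLETED --loop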

Comment 5 Jiri Denemark 2016-09-21 12:19:07 UTC
Fixed upstream by

commit 2e164b451ee6a33e2d9ba33d7bf2121f3f4817a0
Refs: v2.2.0-141-g2e164b4
Author:     Jiri Denemark <jdenemar>
AuthorDate: Tue Sep 20 17:27:03 2016 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Wed Sep 21 14:11:26 2016 +0200

    qemu: Ignore graphics cookie if port == 0

    Old libvirt represents

        <graphics type='spice'>
          <listen type='none'/>
        </graphics>

    as

        <graphics type='spice' autoport='no'/>

    In this mode, QEMU doesn't listen for SPICE connection anywhere and
    clients have to use virDomainOpenGraphics* APIs to attach to the domain.
    That is, the client has to run on the same host where the domains runs
    and it's impossible to tell the client to reconnect to the destination
    QEMU during migration (unless there is some kind of proxy on the host).

    While current libvirt correctly ignores such graphics devices when
    creating graphics migration cookie, old libvirt just sends

        <graphics type='spice' port='0' listen='0.0.0.0' tlsPort='-1'/>

    in the cookie. After seeing this cookie, we would happily call the
    client_migrate_info QMP command and wait for the SPICE_MIGRATE_COMPLETED
    event, which is quite pointless since the client doesn't know where to
    connect anyway. We should just ignore such cookies.

    https://bugzilla.redhat.com/show_bug.cgi?id=1376083

    Signed-off-by: Jiri Denemark <jdenemar>
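
(Not part of the commit message above: one client that exercises the virDomainOpenGraphics* path when run on the same host as the domain is virt-viewer with its --attach option, e.g. "virt-viewer --attach <domain>"; mentioned here only as an illustration.)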

Comment 7 Fangge Jin 2017-05-31 09:09:18 UTC
Hi Jirka

I tried to verify this bug with the following builds, but the issue still exists:

Source host:
libvirt-1.2.17-13.el7_2.6.x86_64
qemu-kvm-rhev-2.3.0-31.el7_2.25.x86_64
spice-server-0.12.4-15.el7_2.2.x86_64

Target host:
libvirt-3.2.0-6.virtcov.el7.x86_64
qemu-kvm-rhev-2.9.0-6.el7.x86_64
spice-server-0.12.8-2.el7.x86_64

Comment 8 Jiri Denemark 2017-05-31 10:07:10 UTC
Well, the fix was on the source and you're still using an old, unfixed version there, so this is definitely not something the patch from comment 5 could fix. However, from its commit message I'd say the new libvirt on the target host should not even send a graphics cookie, but apparently it did. Could you be so kind as to share debug logs from both sides with us?
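
For completeness, debug logs can be enabled on both hosts by setting log filters in /etc/libvirt/libvirtd.conf and restarting libvirtd; the exact filter list here is only a suggestion, not something specified in this comment:

log_filters="1:qemu 1:libvirt 1:conf 1:security"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"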

Comment 9 Fangge Jin 2017-05-31 10:41:28 UTC
Created attachment 1283695 [details]
libvirtd log for comment 7

Comment 10 Jiri Denemark 2017-05-31 11:26:11 UTC
Hmm, this seems to be a different issue according to logs. There's no graphics cookie sent by the target libvirtd and there's no message in the log which would indicate the source is waiting for spice migration to finish. It seems the migration just hangs somewhere in qemuMigrationConfirmPhase just after parsing the migration cookie. Could you please attach gdb to the source daemon once it hangs and attach a complete backtrace of it?
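
One way to capture such a backtrace (a sketch, assuming gdb is installed and libvirtd is running as a single process) is:

# gdb -p $(pidof libvirtd) -batch -ex 'thread apply all bt full' > libvirtd-backtrace.txt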

Comment 11 Fangge Jin 2017-05-31 11:51:40 UTC
Created attachment 1283714 [details]
backtrace of libvirtd

Comment 12 Fangge Jin 2017-05-31 11:54:47 UTC
(In reply to Jiri Denemark from comment #10)
> Hmm, this seems to be a different issue according to logs. There's no
> graphics cookie sent by the target libvirtd and there's no message in the
> log which would indicate the source is waiting for spice migration to
> finish. It seems the migration just hangs somewhere in
> qemuMigrationConfirmPhase just after parsing the migration cookie. Could you
> please attach gdb to the source daemon once it hangs and attach a complete
> backtrace of it?

Maybe there is a misunderstanding between us? From the beginning, I have always been testing VM migration from RHEL 7.2 to a newer release.

Comment 13 Jiri Denemark 2017-05-31 12:26:41 UTC
OK so the source is really waiting for SPICE migration to finish according to the backtrace even though the new libvirt on the destination does not (correctly) send anything about graphics in the migration cookie.

And this bug is about fixing the source, so you can't really test it by using the old unfixed libvirt on the source. You could try to update the source libvirtd, but keep old qemu-kvm-rhev there.
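
For example (a sketch of that setup; the package versions are the ones already listed in comment 7), the source host could be updated like this while qemu-kvm-rhev stays at its RHEL7.2 build:

# yum update 'libvirt*'
# rpm -q libvirt qemu-kvm-rhev    # libvirt should now be newer, qemu-kvm-rhev unchanged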

Comment 14 Fangge Jin 2017-06-01 05:00:35 UTC
(In reply to Jiri Denemark from comment #13)
> OK so the source is really waiting for SPICE migration to finish according
> to the backtrace even though the new libvirt on the destination does not
> (correctly) send anything about graphics in the migration cookie.
> 
> And this bug is about fixing the source, so you can't really test it by
> using the old unfixed libvirt on the source. You could try to update the
> source libvirtd, but keep old qemu-kvm-rhev there.

Migration between newer libvirt versions, between older libvirt versions, or from a newer libvirt version to an older one always succeeds; this bug only occurs when migrating from an older libvirt version (RHEL7.2) to a newer one.
So I think that if I update the libvirt version before doing the migration, the migration could succeed even with a libvirt that predates the fix.

Comment 15 Jiri Denemark 2017-06-01 08:19:13 UTC
I don't really remember any details about this issue, but comment 4 suggests the QEMU version matters too, which is why I suggested updating just libvirt and using it with the older QEMU.

Comment 16 Fangge Jin 2017-06-01 12:19:31 UTC
(In reply to Jiri Denemark from comment #15)
> I don't really remember any details about this issue, but commit 4 suggests
> QEMU version matters too. Which is why I suggested updating just libvirt and
> using it with older QEMU.

After updating libvirt to the latest version, spice migration completes. But I still think this bug can't be verified, because we usually don't update the libvirt version when testing cross-version migration.

Comment 17 Fangge Jin 2017-06-07 07:06:39 UTC
Hi Jirka

Now, the test results are as below. It seems that the problem can only be fixed on the source side (which should not be possible); should we just close this bug as WONTFIX?

Direction      Result
7.2->7.3       Fail    ===> This one is what's reported in comment 0.
7.2->7.4       Fail

7.3->7.2       Succeed
7.4->7.2       Succeed

7.2->7.2       Succeed
7.3->7.3       Succeed
7.4->7.4       Succeed

Comment 18 Jiri Denemark 2017-06-08 10:27:06 UTC
I think CURRENTRELEASE would be a better reason.

Comment 19 Fangge Jin 2017-06-13 05:50:12 UTC
(In reply to Jiri Denemark from comment #18)
> I think CURRENTRELEASE would be a better reason.

OK, I have no objection. Will you do it? Or should I do it?

