Bug 725009 - spice-client: semi-seamless migration support
Summary: spice-client: semi-seamless migration support
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: spice-client
Version: 6.1
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Assignee: Yonit Halperin
QA Contact: Desktop QE
URL:
Whiteboard:
Duplicates: 738311
Depends On: 737921
Blocks: 727602 738262 738266 738268 738270 749151
 
Reported: 2011-07-22 15:00 UTC by Marian Krcmarik
Modified: 2013-01-10 00:07 UTC
CC List: 27 users

Fixed In Version: spice-client-0.8.2-7.el6
Doc Type: Bug Fix
Doc Text:
The SPICE client failed to connect to the SPICE server on the target host after a virtual machine had been migrated to a remote machine. This happened when the migration of the virtual machine took longer than the expiration time of the SPICE ticket that was set on the target host. Without a valid password, the SPICE server refused the connection from the SPICE client and the SPICE session had to be closed. To prevent this problem, support for SPICE semi-seamless migration has been added. Other components such as spice-protocol, spice-server and qemu-kvm have also been modified to support this feature. SPICE now allows the SPICE client to connect to the SPICE server on the target host at the very start of the virtual machine migration, just before the migrate monitor command is given to the qemu-kvm application. Because the client now connects while the ticket on the destination is still valid, the SPICE client remains open when the virtual machine migration completes.
Clone Of:
Cloned to: 727602 737921 738262 738266 738268 738270 749151
Environment:
Last Closed: 2011-12-06 15:22:50 UTC


Attachments
async client_migrate_info patch (3.41 KB, patch)
2011-09-13 14:47 UTC, Gerd Hoffmann


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2011:1518 normal SHIPPED_LIVE libcacard and spice-client bug fix and enhancement update 2011-12-06 00:50:43 UTC

Description Marian Krcmarik 2011-07-22 15:00:46 UTC
Description of problem:
The Spice client window is closed during migration because no Spice password is set, even though ticketing is enabled on the target host's qemu process. See this message from the qemu log:
reds_handle_ticket: Ticketing is enabled, but no password is set. please set a ticket first
I am not sure which component to blame, but migration through libvirt with a Spice password set on the source host works fine: the Spice client is not closed during migration and the same ticket is set on the target host.

From spice client log:
1311265157 INFO [3551:3551] Application::switch_host: host=XXXX.spice.lab.eng.brq.redhat.com port=5906 sport=5907
1311265158 INFO [3551:3552] RedPeer::connect_unsecure: Trying X.X.X.X 5907
1311265158 INFO [3551:3552] RedPeer::connect_unsecure: Connected to X.X.X.X 5907
1311265158 WARN [3551:3552] RedChannel::run: connect failed 7
1311265158 INFO [3551:3551] main: Spice client terminated (exitcode = 3)

According to the exit code, authentication was not successful.

How reproducible:
Always

Version-Release number of selected component (if applicable):
RHEVM3.0 (ic130.1)
Guest: Any
HOSTS: 2.6.32-169.el6.x86_64 - qemu-kvm-0.12.1.2-2.171, libvirt-0.9.3-5, vdsm-4.9-81, spice-server-0.8.1-2
Client: RHEL6.2, spice-client-0.8.0-2

Steps to Reproduce:
1. Connect to a guest using the RHEVM User Portal.
2. Migrate the guest using the Admin Portal while the spice client session is open.
  
Actual results:
The Spice client closes.

Expected results:
The Spice client does not close and has the proper ticket set.

Additional info:

Comment 1 Dave Allan 2011-07-26 02:29:48 UTC
(In reply to comment #0)
> I am not really sure which component to blame but migration through libvirt
> with set Spice password on source host works fine and Spice client is not
> closed during migration and the same ticket is set on target host.

Given that, this seems to me like it's not a libvirt bug; am I reading that correctly that you're saying that the behavior is correct when you use libvirt alone?

Comment 2 Vivian Bian 2011-07-26 05:43:47 UTC
(In reply to comment #1)
> (In reply to comment #0)
> > I am not really sure which component to blame but migration through libvirt
> > with set Spice password on source host works fine and Spice client is not
> > closed during migration and the same ticket is set on target host.
> 
> Given that, this seems to me like it's not a libvirt bug; am I reading that
> correctly that you're saying that the behavior is correct when you use libvirt
> alone?

Tested with libvirt-0.9.3-7.el6.x86_64, without RHEVM.
Steps:
1. Configure the guest with a spice graphical framebuffer:
    <graphics type='spice' port='5900' tlsPort='5901' autoport='yes' listen='0.0.0.0' passwd='cccddd' connected='disconnect'/>

2. Start the guest.

3. Connect the spice client to the guest on the source machine:
    spicec -h 10.66.4.220 -p 5900 -s 5901 --host-subject "C=IL,L=Raanana,O=Red Hat,CN=my server" --ca-file /etc/pki/libvirt-spice/ca-cert.pem --secure-channels main --enable-channels all -w cccddd

4. Migrate the guest to the remote machine.

5. Try to reconnect to the guest on the destination machine of the migration:
   spicec -h 10.66.4.241 -p 5900 -s 5901 --host-subject "C=IL,L=Raanana,O=Red Hat,CN=my server" --ca-file /etc/pki/libvirt-spice/ca-cert.pem --secure-channels main --enable-channels all -w cccddd

Actual results:
1. After step 4, the spice client connection on the source machine was not cut; the session was kept.

2. Connecting to the guest on the destination machine with the previous spice password from the source machine succeeded.

3. Ran virsh dumpxml --security-info guest | grep grap on the destination machine; the graphics XML section is the same as on the source machine:
    <graphics type='spice' port='5900' tlsPort='5901' autoport='yes' listen='0.0.0.0' passwd='cccddd' connected='disconnect'/>

Questions:
1. Marian, would you please provide the exact configuration of your guest:
   a. virsh dumpxml --security-info guest > guest.xml
   b. please paste /etc/libvirt/qemu.conf

   Please paste the above info from both the source machine and the destination machine.

2. Please provide /var/log/libvirt/libvirtd.log from the destination and source machines if there are error messages associated with this bug.

3. Did you copy the certificate files to the destination machine? I mean the files under /etc/pki/vdsm/xxxx

P.S. We in libvirt QE did not encounter this bug with libvirt 0.9.3-7. Please try the latest libvirt build. If you can still reproduce this bug, please update the comment with version info for libvirt, vdsm, spice-client and spice-server.

Comment 3 Vivian Bian 2011-07-26 05:56:38 UTC
(In reply to comment #2)

> tested with libvirt-0.9.3-7.el6.x86_64 without RHEVM 
libvirt-0.9.3-7.el6.x86_64
spice-server-0.8.1-2.el6.x86_64
spice-client-0.8.0-2.el6.x86_64

Comment 4 Marian Krcmarik 2011-07-26 18:53:29 UTC
(In reply to comment #1)
> (In reply to comment #0)
> > I am not really sure which component to blame but migration through libvirt
> > with set Spice password on source host works fine and Spice client is not
> > closed during migration and the same ticket is set on target host.
> 
> Given that, this seems to me like it's not a libvirt bug; am I reading that
> correctly that you're saying that the behavior is correct when you use libvirt
> alone?

Yes, by "migration through libvirt with set Spice password on source host works fine" I meant that migration through libvirt worked fine, and that's why I filed it against vdsm (even though I do not have deep knowledge of the vdsm+libvirt interaction).

Anyway, I did some more research and testing. After many tries I found out that I am hitting two different bugs (at least I believe so).

1. While investigating how VDSM sets a ticket, I noticed that VDSM calls something like:
setVmTicket(6f4ae0c9-e8c2-491e-a929-1cfd8d885344, f5TOvrKcjmTP, 120, disconnect), and I assume the number 120 is the time in seconds for which the ticket is valid, which IMHO looks like the problem. I assume libvirt takes this number and sets the attribute "passwdValidTo" to the current time + 120 seconds.
Once the user starts the migration after more than 120 seconds, libvirt most likely sees that the ticket is no longer valid, and therefore does not set the same ticket with a correct "passwdValidTo" attribute (current time + 120 seconds) on the target host's qemu process. When the spice client then reconnects to the target host with the same ticket, authentication cannot succeed and the client is terminated with an exit code that indicates an authentication problem.
This is just a theory, but my testing indicates it (I did not look at any source code).

2. A bug I reported separately: https://bugzilla.redhat.com/show_bug.cgi?id=725854
I believe the Spice server is not able to handle multiple migrations, so migrating to another host and back ends with a terminated spice client and a different exit code. I believe this has nothing to do with libvirt; the guest is migrated correctly with the correct spice ticket.

I used the latest RHEVM3.0 (ic134) build with libvirt-0.9.3-8, and for libvirt testing: libvirt-0.9.3-7, qemu-kvm-0.12.1.2-2.171.
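The expiry theory in point 1 of comment 4 amounts to a small calculation. A minimal sketch, assuming the 120-second lifetime from the setVmTicket call and a hypothetical helper name (this is not a real libvirt or vdsm API):

```python
TICKET_LIFETIME = 120  # seconds, the third argument of the setVmTicket call

def ticket_is_valid(set_at, now, lifetime=TICKET_LIFETIME):
    # libvirt presumably computes passwdValidTo = set_at + lifetime;
    # a connection (or ticket copy to the target) only works while
    # now < passwdValidTo.
    return now < set_at + lifetime

t0 = 0.0  # moment the ticket was set on the source host
# Migration starts 30 s after ticketing: the ticket is still valid,
# so libvirt would set it on the target host as well.
print(ticket_is_valid(t0, t0 + 30))    # True
# Migration starts 3 minutes after ticketing: the ticket has expired,
# so no valid ticket reaches the target and the reconnect fails.
print(ticket_is_valid(t0, t0 + 180))   # False
```

Under this assumption, any migration started more than 120 seconds after the ticket was set leaves the target without a valid password, matching the reds_handle_ticket error in the description.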

Comment 5 Ayal Baron 2011-07-26 20:31:40 UTC
(In reply to comment #4)
> (In reply to comment #1)
> > (In reply to comment #0)
> > > I am not really sure which component to blame but migration through libvirt
> > > with set Spice password on source host works fine and Spice client is not
> > > closed during migration and the same ticket is set on target host.
> > 
> > Given that, this seems to me like it's not a libvirt bug; am I reading that
> > correctly that you're saying that the behavior is correct when you use libvirt
> > alone?
> 
> Yes, by "migration through libvirt with set Spice password on source host works
> fine" I meant that migration through libvirt worked fine and that's why I filed
> it for vdsm (even though I do not have deep knowledge of interaction
> vdsm+libvirt).
> 
> Anyway I did some more researching and testing. I found out after many tries
> that I am hitting two different bugs (at least I do believe).
> 
> 1. During investigation how VDSM sets a ticket I noticed that VDSM calls sth
> like:
> setVmTicket(6f4ae0c9-e8c2-491e-a929-1cfd8d885344, f5TOvrKcjmTP, 120,
> disconnect) and I assume that the number 120 means time in seconds during which
> the ticket is valid 

This assumption is correct (120 is the "valid to" in seconds)

> and that imho looks like the problem. I assume libvirt
> takes this number and set attribute "passwdValidTo" to actual time + 120
> seconds. 
> Once user does the migration after more than 120 seconds, libvirt most likely
> sees that ticket is not valid anymore and that's why libvirt does not set the
> same ticket with correct "passwdValidTo" attribute (actual time + 120 seconds)
> on the target host qemu process. Once spice-client is reconnecting to the
> target host with the same ticket authentication cannot be successful and
> clients is terminated with code which means authentication problem.
> Just a theory but my testing indicates that (I did not look at any source
> code).
> 
> 2. Bug I reported separately:
> https://bugzilla.redhat.com/show_bug.cgi?id=725854
> I believe Spice server is not able to handle multiple migration so that
> migration to another host and back will end up in terminated spice client with
> different exit code. I believe nothing to do with libvirt, the guest is
> migrated correctly with correct spice ticket.
> 
> I used latest RHEVM3.0 (ic134) build with libvirt-0.9.3-8. and for libvirt
> testing: libvirt-0.9.3-7, qemu-kvm-0.12.1.2-2.171.

Comment 6 Dan Kenigsberg 2011-07-26 20:51:38 UTC
Marian, others, I might have misunderstood the bug; I'm afraid I still don't understand it.

You migrate the VM, while spice client is connected, and then everything works
fine. Right?

Now you disconnect spice client, and fail to reconnect with a long-expired
password? Why is this a bug?

Comment 7 Itamar Heim 2011-07-26 21:03:13 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > (In reply to comment #1)
> > > (In reply to comment #0)
> > > > I am not really sure which component to blame but migration through libvirt
> > > > with set Spice password on source host works fine and Spice client is not
> > > > closed during migration and the same ticket is set on target host.
> > > 
> > > Given that, this seems to me like it's not a libvirt bug; am I reading that
> > > correctly that you're saying that the behavior is correct when you use libvirt
> > > alone?
> > 
> > Yes, by "migration through libvirt with set Spice password on source host works
> > fine" I meant that migration through libvirt worked fine and that's why I filed
> > it for vdsm (even though I do not have deep knowledge of interaction
> > vdsm+libvirt).
> > 
> > Anyway I did some more researching and testing. I found out after many tries
> > that I am hitting two different bugs (at least I do believe).
> > 
> > 1. During investigation how VDSM sets a ticket I noticed that VDSM calls sth
> > like:
> > setVmTicket(6f4ae0c9-e8c2-491e-a929-1cfd8d885344, f5TOvrKcjmTP, 120,
> > disconnect) and I assume that the number 120 means time in seconds during which
> > the ticket is valid 
> 
> This assumption is correct (120 is the "valid to" in seconds)

but the assumption about the behavior is not. Spice live migration is supposed to deal with this use case (which is practically the only relevant one).

Comment 8 Marian Krcmarik 2011-07-26 21:26:09 UTC
(In reply to comment #6)
> Marian, Others, I might have misunderstood the bug. I'm afraid I still don't.
> 
> You migrate the VM, while spice client is connected, and then everything works
> fine. Right?
No, no, it does not. I say "reconnecting" because that is what the spice client does automatically during migration: it gets "an order" from the spice server to reconnect to a new host. From the user's perspective, though, it is not "reconnecting"; the user just observes a quick flash and performs no reconnection action.

> 
> Now you disconnect spice client, and fail to reconnect with a long-expired
> password? Why is this a bug?

Comment 9 Marian Krcmarik 2011-07-26 21:42:40 UTC
(In reply to comment #7)

> > 
> > This assumption is correct (120 is the "valid to" in seconds)
> 
> but assumption on behavior is not. spice live migration is supposed to deal
> with this use case (which is mostly the only relevant one).

It's true that this quick flash of the spice client during migration, caused by the automatic reconnection, is ugly, and the reconnection is causing problems. So should this bug be passed to spice-server/client for a design change? (I am trying to recall how this was handled in RHEL 5 qspice.)

Comment 19 Uri Lublin 2011-07-27 12:24:18 UTC
(In reply to comment #16)
> for how long? the live migration can take more than the lifespan of the ticket
> if it is set in the beginning of the live migration

I'm suggesting setting the lifespan of the ticket to be longer than any migration is expected to take, practically making sure the ticket never expires.

Comment 20 Dan Kenigsberg 2011-07-27 12:37:23 UTC
(In reply to comment #19)
And I am suggesting that libvirt do it just before the spice client is expected to reconnect (when migration is almost done).

Comment 21 Itamar Heim 2011-07-28 06:14:15 UTC
(In reply to comment #19)
> (In reply to comment #16)
> > for how long? the live migration can take more than the lifespan of the ticket
> > if it is set in the beginning of the live migration
> 
> I'm suggesting setting the lifespan of the ticket to be longer than any
> migration is expected to be, practically making sure this ticket never expires.

While I hope this will be fixed, just FYI: we have a customer whose live migrations take 20 hours (not a desktop workload, and it should be better on RHEL 6).
The point is that we can never tell what is 'expected'.

Comment 36 Daniel Berrangé 2011-08-10 09:15:18 UTC
> The spice side possible fix is discussed in comments 32 and 33. It involves
> changes both on the server and the client side (spicec and spicey).
> However, event if we connect immediately after migration start, isn't there
> still a race? Can't the ticket on the dest expire before the connection?

That "race" condition is no different from the potential race condition for a spice client connecting to QEMU at any time when password expiry is set.

E.g., consider starting a new VM from scratch (no migration involved):

 1. QEMU process is started
 2. Password is set with an expiry time "N"
 3. QEMU cpus are started
 4. Spice client connects

In this normal sequence, the time between steps 2 and 4 must not exceed "N".

IMHO, the goal for migration should be to allow for (almost) the same scenario.

So for migration, if we can get RHEL-6 QEMU to go through this sequence:

  1. QEMU process on dest is started with -incoming
  2. QEMU dest has password is set with an expiry time "N"
  3. libvirt/vdsm issues 'client_migrate_info spice hostname portnumber'
  4. libvirt/vdsm issues 'migrate' command to QEMU
  5. QEMU tells SPICE migration has started
  6. Spice tells the client migration has started, giving it new QEMU connection details
  7. Spice client connects to dest QEMU, but doesn't switch displays
  ...some time passes...
  8. Migration completes

In this sequence, the time between step 2 and step 7 must not exceed "N". This time gap will be approximately the same as the gap between steps 2 and 4 during the normal VM startup sequence.

So the "race" with migration will be no different from what a SPICE client experiences when connecting to a non-migrating VM. The real key is that we do not want an arbitrarily long delay between setting the password and a client being able to connect. By letting the client connect right at the start of migration, the delay will be minimal: a few seconds at most, which is easily handled by setting a short expiry time (15-30 seconds, perhaps).
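The timing constraint in the migration sequence above can be sketched with a toy timeline. The step offsets below are invented for illustration (not measurements), and the structure is only a model of the ordering libvirt/vdsm drives through the qemu monitor:

```python
EXPIRY_N = 30  # password expiry window in seconds (the "N" above)

# Toy timeline: (step, seconds after the destination password was set).
# Offsets are made-up assumptions for illustration only.
migration_steps = [
    ("password set on dest with expiry N", 0),   # step 2
    ("client_migrate_info issued", 2),           # step 3
    ("migrate command issued", 3),               # step 4
    ("spice client connects to dest", 5),        # step 7: migration start
    ("migration completes", 3600),               # step 8: may take hours
]

connect_at = dict(migration_steps)["spice client connects to dest"]
# Only the gap between setting the password (step 2) and the client
# connecting (step 7) must stay under N; the total migration duration
# is irrelevant to the ticket expiry.
print(connect_at < EXPIRY_N)  # True
```

The point of the model is that the migration at offset 3600 does not appear in the expiry check at all, which is exactly why the 20-hour migration from comment 21 stops being a problem.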

Comment 37 Yonit Halperin 2011-09-07 10:30:46 UTC
Daniel/Dave:
The target cannot accept connections during migration. Thus, connecting the spice client to the target must be done just before migration. We can do it upon client_migrate_info, but we need to hold the migration from starting until we receive an ack from the client that it connected to the target (or failed/timed out).
In order to hold the migration we can hold the qemu thread. However, the nicer solution would be to add client_migrate_info_async, and have libvirt tell qemu to migrate only after it has been signalled by spice that client_migrate_info_async has completed. Can you do it?

Comment 38 Jiri Denemark 2011-09-08 09:42:25 UTC
It depends on what qemu can do. There are several options:

1. fix qemu to accept connections during incoming migration; this is the best
   option and requires no work on libvirt side
2. make client_migrate_info command block until spice client confirms it 
   successfully connected to the destination qemu; no change in libvirt is needed
   for this option either
3. do what you suggest with client_migrate_info_async; this would require some 
   modification to libvirt

Both 2 and 3 are just hacks around the real issue in 1 and would only make sense if 1 is significantly harder to implement in qemu than 2 or 3. That said, I'd prefer implementing 1, but if the qemu guys prefer other ways we can do it in libvirt as well. I'd like to avoid option 3, since it would mean issuing a qemu monitor command and waiting for an event before proceeding with the migration. It's certainly doable, but it just doesn't seem right, as the command is synchronous in nature.

Anyway, most of the work needs to be done in qemu so this should also be discussed with qemu guys.

Comment 39 Yonit Halperin 2011-09-08 09:53:23 UTC
(In reply to comment #38)
> It depends on what qemu can do. There are several options:
> 
> 1. fix qemu to accept connections during incoming migration; this is the best
>    option and requires no work on libvirt side
> 2. make client_migrate_info command block until spice client confirms it 
>    successfully connected to the destination qemu; no change in libvirt is
> needed
>    for this option either
> 3. do what you suggest with client_migrate_info_async; this would require some 
>    modification to libvirt
> 
> Both 2 and 3 are just hacks around the real issue in 1 and would only make
> sense if 1 is significantly harder to implement in qemu than 2 or 3. That said,
> I'd prefer implementing 1 but if qemu guys prefer other ways we can do it in
> libvirt as well. I'd like to avoid option 3 since it would mean issuing qemu
> monitor command and waiting for an event before proceeding with the migration.
> It's certainly doable but it just doesn't seem right as the command is
> synchronous in its nature.
> 
> Anyway, most of the work needs to be done in qemu so this should also be
> discussed with qemu guys.

Dor, comments?

Comment 40 Dor Laor 2011-09-11 11:11:53 UTC
(In reply to comment #39)
> (In reply to comment #38)
> > It depends on what qemu can do. There are several options:
> > 
> > 1. fix qemu to accept connections during incoming migration; this is the best
> >    option and requires no work on libvirt side


The monitor is blocked on incoming connections, so if you want to set a password through the monitor it won't work.
Gerd, what's your opinion?


> > 2. make client_migrate_info command block until spice client confirms it 
> >    successfully connected to the destination qemu; no change in libvirt is
> > needed
> >    for this option either
> > 3. do what you suggest with client_migrate_info_async; this would require some 
> >    modification to libvirt
> > 
> > Both 2 and 3 are just hacks around the real issue in 1 and would only make
> > sense if 1 is significantly harder to implement in qemu than 2 or 3. That said,
> > I'd prefer implementing 1 but if qemu guys prefer other ways we can do it in
> > libvirt as well. I'd like to avoid option 3 since it would mean issuing qemu
> > monitor command and waiting for an event before proceeding with the migration.
> > It's certainly doable but it just doesn't seem right as the command is
> > synchronous in its nature.
> > 
> > Anyway, most of the work needs to be done in qemu so this should also be
> > discussed with qemu guys.
> 
> Dor, comments?

Comment 41 Jiri Denemark 2011-09-12 08:53:06 UTC
(In reply to comment #40)
> (In reply to comment #39)
> > (In reply to comment #38)
> > > It depends on what qemu can do. There are several options:
> > > 
> > > 1. fix qemu to accept connections during incoming migration; this is the
> > > best option and requires no work on libvirt side
> 
> The monitor is blocked on incoming connections so if you like to set a password
> through the monitor it won't work. Gerd, what's your opinion?

The issue here is a bit different. For seamless spice migration, spice client has to connect to destination qemu when migration starts and switch to displaying data from there once migration finishes. The problem is that destination qemu apparently doesn't accept connection from spice client when it is receiving migration data from source qemu. Relevant part of this BZ starts in comment 36.

Comment 42 Gerd Hoffmann 2011-09-12 08:59:25 UTC
I'd love to see (1), but I expect this isn't going to happen in RHEL 6.x; it may be possible in RHEL 7.

(2) is the second-best option.  We need async monitor commands for that, though.  Blocking the monitor until the connection is established should be OK; blocking the iothread that long is out of the question.  A quick grep through the RHEL 6 sources shows there seems to be some async command support and one user (balloon), so I think this should be doable.

(3) sucks big time because we would have to change the libvirt <=> qemu interface because of an implementation detail libvirt should not have to worry about.  Last resort if nothing else works.

Comment 43 Yonit Halperin 2011-09-12 09:22:58 UTC
(In reply to comment #42)
> I'd love to see (1), but I expect this isn't going to happen in RHEL 6.x, maybe
> possible in RHEL-7.
> 
> (2) is the second best option.  We need async monitor commands for that though.
>  Blocking the monitor until the connection is established should be ok,
> blocking the iothread that long is out of question.  A quick grep through the
> rhel6 sources shows there seems to be some async command command support and
> one user (balloon), so I think this should be doable.
If we are going to implement an async command, I don't think blocking the monitor is the right thing to do. We only want to prevent a migrate command, not all commands. Also, do you think upstream will accept such a solution? I still think that if (1) is not possible, adding an async command to libvirt is the best choice.

BTW, it doesn't look like do_balloon is actually asynchronous; it calls the completion callback immediately.
> 
> (3) sucks big time because we have to change the libvirt <=> qemu interface
> because of a implementation detail libvirt should not have to worry about. 
> Last ressort if nothing else works.

Comment 44 Yonit Halperin 2011-09-12 10:56:48 UTC
(In reply to comment #42)
> I'd love to see (1), but I expect this isn't going to happen in RHEL 6.x, maybe
> possible in RHEL-7.
> 
> (2) is the second best option.  We need async monitor commands for that though.
>  Blocking the monitor until the connection is established should be ok,
> blocking the iothread that long is out of question.  A quick grep through the
> rhel6 sources shows there seems to be some async command command support and
> one user (balloon), so I think this should be doable.
> 
> (3) sucks big time because we have to change the libvirt <=> qemu interface
> because of a implementation detail libvirt should not have to worry about. 
> Last ressort if nothing else works.

Another option is to change qemu migration to have functionality similar to what we had in RHEL 5: (1) notify on migration start before it actually starts, and (2) add support for asynchronous migration state notifiers, i.e., continue the migration only after the async migration notifiers have completed.
Can this be done?

Comment 45 Gerd Hoffmann 2011-09-12 11:32:55 UTC
Re #43: I don't think blocking the monitor is a problem there.  Under normal circumstances it should take a few seconds at most, and I doubt libvirt is going to send other commands in that situation; it just waits for the client_info command to finish so it can send the migrate command.

Re #44: notify-on-migration-start is probably too late.  Even if the source hasn't sent anything yet, I think the target qemu is already in a blocking state where it doesn't accept new connections any more.

Comment 46 Yonit Halperin 2011-09-12 11:41:12 UTC
(In reply to comment #45)
> Re #43: I don't think blocking the monitor is a problem there.  Unter normal
> circumstances it should be a few seconds at most, and I doubt libvirt is going
> to send other commands in that situation, it just waits for the client_info
> command to finish so it can send the migrate command.
> 
> Re #44: notify-on-migration-start is probably too late.  Even if the source
> didn't send anything yet I think the target qemu is in blocking state already
> where it doesn't accept new connections any more.
I meant changing the start notification (or adding an additional one), so that it is called before blocking the target.
I think that is the best solution. Connecting the client to the target upon client_migrate_info is just a workaround.

Comment 47 Jiri Denemark 2011-09-12 12:36:36 UTC
(In reply to comment #45)
> Re #43: I don't think blocking the monitor is a problem there.  Unter normal
> circumstances it should be a few seconds at most, and I doubt libvirt is going
> to send other commands in that situation, it just waits for the client_info
> command to finish so it can send the migrate command.

Actually, I also think blocking the monitor is not the best idea (unfortunately, since it would have been easier for libvirt :-)). In a normal situation, libvirt doesn't want to send anything until client_info finishes. But since the command has the side effect of the spice client connecting to the destination qemu, client_info can take a very long time if something doesn't work as expected, and in that case the libvirt user should be able to abort the operation (just like the migration itself can be aborted).

Comment 48 Gerd Hoffmann 2011-09-12 12:48:39 UTC
Re #46: Impossible to do inside qemu, as it needs coordination of the two qemu processes running on the source and target machines.  I see connecting in client_migrate_info as a workaround too, and I hope we can kill it as soon as qemu can handle incoming connections in parallel with incoming migration (and that switch can be fully transparent to libvirt).

Re #47: I think we can get away with a pretty aggressive timeout here.  If the connection isn't up and running within, say, 5 seconds, your network connection is too slow for spice anyway.
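The aggressive-timeout idea can be sketched as a simple polling loop. This is purely illustrative (the helper and callback are hypothetical, not actual qemu or spice code); it only shows the bounded-wait behavior being discussed:

```python
import time

CONNECT_TIMEOUT = 5.0  # the "pretty aggressive" timeout suggested above

def wait_for_client_connect(is_connected, timeout=CONNECT_TIMEOUT):
    """Poll until is_connected() reports the spice client reached the
    target, giving up after `timeout` seconds. Returns True on success,
    False if the caller should abort the migration step."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_connected():
            return True
        time.sleep(0.05)
    return False

# A client that connects immediately succeeds; one that never connects
# makes the caller give up after the timeout instead of blocking forever.
print(wait_for_client_connect(lambda: True))                 # True
print(wait_for_client_connect(lambda: False, timeout=0.2))   # False
```

This mirrors the tradeoff in comments 47-48: the wait is bounded, so blocking the monitor for its duration stays tolerable, while a client that cannot connect within a few seconds is treated as a failure.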

Comment 49 Yonit Halperin 2011-09-12 13:10:58 UTC
(In reply to comment #48)
> Re #46: Impossible to do inside qemu as it needs coordination of the two qemu
> running on source and target machine.  I see connecting in client_migrate_info
> as workaround too, and I hope we can kill that one as soon as qemu can handle
> incoming connections in parallel to incoming migration (and that switch can be
> fully transparent to libvirt).
> 
I don't understand: migration is triggered by a command on the src, so why can't we notify about the planned migration before qemu actually starts it?

> Re #47: I think we can get away here with a pretty agressive timeout.  If the
> connection isn't up'n'running within -- say -- 5 seconds your network
> connection is too slow for spice anyway.
Yes, a timeout is planned.

Comment 50 Gerd Hoffmann 2011-09-12 14:18:06 UTC
Re #49:  libvirt coordinates the whole thing.  client_migrate_info is sent to the source before libvirt asks the target to start receiving the migration; that's why you will be able to connect to the target qemu.  Then libvirt kicks off the migration process on both ends.  The target stops accepting new connections, and no notifier in the "migrate" command handling path on the source can do anything useful.

Comment 51 Yonit Halperin 2011-09-12 14:33:33 UTC
(In reply to comment #50)
> Re #49:  libvirt coordinates the whole thing.  client_migrate_info is sent to
> the source before libvirt asks the target to start receiving the migration,
> thats why you will be able to connect to the target qemu.  Then libvirt kicks
> the migration process on both ends.
What is kicked off by libvirt on the target side that is independent of the migrate command on the src side? Doesn't the migrate command on the src side kick off the migration on both ends?
> The target stops accepting new
> connections, and no notifier in the "migrate" command handling path on the
> source can do anything useful.

Comment 52 Jiri Denemark 2011-09-12 14:36:01 UTC
(In reply to comment #48)
> Re #47: I think we can get away here with a pretty agressive timeout.  If the
> connection isn't up'n'running within -- say -- 5 seconds your network
> connection is too slow for spice anyway.

OK, fair enough, we don't need to send any commands during that time then.

Comment 53 Uri Lublin 2011-09-12 15:27:51 UTC
(re #46 - #50)
On the destination, libvirt starts qemu-kvm with an '-incoming <params>' option (no change in libvirt). The (blocked) migration starts upon accept. Spice client connections are accepted before that.

Theoretically, on the source, qemu-kvm can call pre-migration notifiers before connecting to the migration port on the destination.

I think that's what Yonit meant in #46.

Comment 55 Yonit Halperin 2011-09-13 05:08:53 UTC
(In reply to comment #54)
> Indeed, with the successful accept() system call the blocking phase starts (for
> tcp:...).  I assumed there is a separate monitor command on the target side,
> but that isn't the case.
> 
> I still think doing it in the client_migrate_info is better:
> 
> (1) There is infrastructure present in qemu to run monitor commands async, i.e.
> without blocking the iothread.  That isn't the case for the notifiers.
> 
> (2) It looks fragile to me to assume it is save to build the spice connection
> before we start processing the "migrate" command on the source.  libvirt has
> support for tunneling the migration data, in that case the data pipe isn't a
> direct tcp connection between source and target.  Will the scheme still work
> then?
I think it will still work. Uri?
Async notifiers for migration, plus one notifier before migration starts and one notifier before starting the target, are what we are missing in order to be able to implement a really seamless migration.
If that is not possible for RHEL 6.2, then we need support for an asynchronous client_migrate_info. Gerd/Dor, can you send a patch for asynchronous commands that block the monitor (if you are still convinced it shouldn't involve libvirt)?

BTW - just to make this clear: since we don't have the above notifiers, we can't implement a truly seamless migration. The discussed bug is about connecting the spice client to the target host when migration starts, not when it completes.
When migration completes, the communication with the target starts from scratch.

Comment 56 Uri Lublin 2011-09-13 12:02:43 UTC
(In reply to comment #54)
> (2) ...
> libvirt has
> support for tunneling the migration data, in that case the data pipe isn't a
> direct tcp connection between source and target.  Will the scheme still work
> then?

There may be a problem with libvirt's tunnelled migration, e.g. if the destination qemu-kvm accepts a connection (such as a unix domain socket) before the migration starts.

This problem may happen with switch-host too.
For example, if libvirt on the destination connects to qemu-kvm before letting libvirt on the source know the port number to pass to the client_migrate_info command, I think the destination qemu-kvm might not accept a spice-client connection following client_migrate_info (as it is already in a "blocked" phase).

Both of the above are theoretical. We need to look at the libvirt code, or ask the libvirt team, to know for sure.

Comment 57 Yonit Halperin 2011-09-13 12:56:45 UTC
(In reply to comment #56)
> (In reply to comment #54)
> > (2) ...
> > libvirt has
> > support for tunneling the migration data, in that case the data pipe isn't a
> > direct tcp connection between source and target.  Will the scheme still work
> > then?
> 
> There may be a problem with libvirt's tunnelled migration, e.g. if the
> destination qemu-kvm accepts a connection (e.g. a unix domain socket) before
> the migration starts.
> 
> This problem may happen with switch-host too.
> For example, if libvirt on the destination connects to qemu-kvm before letting
> libvirt on the source know the port number to pass to migrate-info command.
> I think in such a case the destination qemu-kvm might not accept spice-client
> connection following a migrate-info command (as it is already in a "blocked
> phase").
> 
> Both of the above are theoretical. We need to look at libvirt code or ask the
> libvirt team to know.
Jiri/Daniel, can you please answer this? I.e., in the case of tunnelled migration, will it even be possible for the spice client to connect to the target qemu upon client_migrate_info?

Comment 58 Gerd Hoffmann 2011-09-13 14:47:37 UTC
Created attachment 522943 [details]
async client_migrate_info patch

Re #55: attached is a quick&dirty proof-of-concept patch which turns client_migrate_info into an async monitor command.  It simply waits 5 seconds before calling the completion callback, which should be long enough for the spice client to build the connection to the target.

Comment 59 Daniel Berrangé 2011-09-15 12:07:24 UTC
@Yonit, re comment #57:

Under a 'plain' libvirt migration, the handshake is controlled by the libvirt client application, talking to the source and dest libvirtds directly.

 1. Client->Source: Begin
 2. Client->Dest: Prepare
      - Launches QEMU -incoming
      - Connects to monitor
      - Sets passwords, etc
 3. Client->Source: Perform
      - Run client_migrate_info
      - Run migrate_setspeed, etc
      - Run migrate
      - Loop
          - Run 'info migrate' until complete/error/abort
 4. Client->Dest: Finish
      - If success, then
          - Start CPUs
      - Else
          - Kill QEMU
 5. Client->Source: Confirm
      - If success, then
          - Kill QEMU
      - Else
          - Start CPUs

With libvirt Peer-2-Peer migration, the client only talks to the source libvirtd. The source libvirt in turn talks to the dest libvirtd. The flow is thus


 1. Client->Source: Perform
     1.1: Source: Begin
     1.2. Source->Dest: Prepare
         - Launches QEMU -incoming
         - Connects to monitor
         - Sets passwords, etc
     1.3. Source: Perform
         - Run client_migrate_info
         - Run migrate_setspeed, etc
         - Run migrate
         - Loop
             - Run 'info migrate' until complete/error/abort
     1.4. Source->Dest: Finish
         - If success, then
             - Start CPUs
         - Else
             - Kill QEMU
     1.5. Source: Confirm
         - If success, then
             - Kill QEMU
         - Else
             - Start CPUs


The tunnelled migration is a variant of the peer-2-peer migration. All that changes is the middle step, where we spawn a background thread to handle tunnelling of the data:

     1.3. Source: Perform
         - Run client_migrate_info
         - Run migrate_setspeed, etc
         - Run migrate
         - Spawn tunnelling thread
         - Loop
             - Run 'info migrate' until complete/error/abort

The tunnelling thread simply does

       - Loop
           - Read N bytes from QEMU migration FD
            - Encode N bytes in libvirt RPC and send to dest libvirtd

So in summary, I don't see any problems with SPICE seamless migration that would be unique to tunnelled migration.

Comment 60 Jiri Denemark 2011-09-15 13:01:04 UTC
Actually, there is one difference between tunneled and non-tunneled migration. Normally, qemu is told to accept migration data on a TCP socket, while with tunneled migration qemu gets migration data on stdin. So the question is whether qemu is able to accept a spice connection when it's asked to receive migration data on stdin.

Comment 61 Alon Levy 2011-09-19 11:07:36 UTC
Regarding tunnelled migration: I tested it and it doesn't seem to be a problem. Can someone from libvirt verify that this is equivalent to what libvirt does:

qemu -spice disable-ticketing,port=7777 -incoming fd:0

spicec -h localhost -p 7777


Spice connection established successfully.

If, on the other hand, I press Enter in the qemu console, giving stdin some input, the spice connection fails.

Comment 62 Jiri Denemark 2011-09-19 11:47:39 UTC
Yeah, that should be equivalent to what libvirt does. It uses -incoming fd:N, exec:cat, or stdio, depending on what qemu supports (i.e., it should use fd:N for the current qemu-kvm in RHEL).

Comment 64 Yonit Halperin 2011-10-03 11:00:49 UTC
*** Bug 738311 has been marked as a duplicate of this bug. ***

Comment 66 David Jaša 2011-10-14 15:05:07 UTC
VERIFIED in 

hosts side:
qemu-kvm-0.12.1.2-2.195.el6.x86_64
libvirt-0.9.4-16.el6.x86_64
spice-server-0.8.2-4.el6.x86_64

client side:
spice-client 0.8.2-7

Comment 67 Marian Krcmarik 2011-10-26 10:20:50 UTC
The provided fix for this bug does not solve the issue I originally reported. As far as I understand, the fix addresses the situation where migration takes longer than the expiration time of the ticket, which is great to have fixed, but I believe it is not the original bug I reported.

I reverted the changes in vdsm from bug #727602 on my setup and the behaviour is the same; the result of migration on the destination host is:
reds_handle_ticket: Ticketing is enabled, but no password is set. please set a ticket first.

Steps to reproduce (with the change from #727602 reverted):
1. Open a spice session from any RHEVM portal.
2. Keep the spice session open and wait more than 2 minutes.
3. Migrate.

I guess that maybe "nobody" sets a new expiration time for the expired ticket at the beginning of migration (vdsm?). I can see (using virsh) that the same ticket expiration time is set on the destination libvirt as was set when connecting to the VM with the spice client on the source host. Once I do:
1. Open a spice session from the RHEVM portal (with the change from #727602 reverted in vdsm).
2. Wait 2 minutes.
3. Change the expiration time manually using virsh update-device on the particular domain and host to the current time + 2 minutes (with the connected='keep' attribute).
4. Migrate within 2 minutes.
then the spice session is kept.
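For illustration, step 3 corresponds to feeding virsh update-device an updated graphics device definition. A hedged sketch of such a fragment follows; the port, password, and timestamp are placeholders, while passwdValidTo and connected='keep' are the attributes doing the work (keep the existing client connected while refreshing the ticket):

```xml
<!-- illustrative fragment for 'virsh update-device <domain> graphics.xml';
     port, passwd and passwdValidTo values are placeholders -->
<graphics type='spice' port='5900' autoport='yes'
          passwd='examplepw' passwdValidTo='2011-10-26T10:30:00'
          connected='keep'/>
```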
Since this bug solved the issue of migration taking longer than the ticket expiration time, and that issue is verified and included in the erratum, I am cloning this bug for vdsm.

Comment 68 Yonit Halperin 2011-10-26 11:01:34 UTC
(In reply to comment #66)
> VERIFIED in 
> 
> hosts side:
> qemu-kvm-0.12.1.2-2.195.el6.x86_64
> libvirt-0.9.4-16.el6.x86_64
> spice-server-0.8.2-4.el6.x86_64
> 
> client side:
> spice-client 0.8.2-7

Hi,
what is the vdsm version? If it is >= vdsm-4.9-89, then ticketing is disabled. See comment #67.

Comment 69 David Jaša 2011-10-26 12:32:40 UTC
vdsm was -104 or -106 at the time of the verification; now it is -108. I realize I in fact verified semi-seamless migration, not the original issue reported by Marian.

Comment 70 Uri Lublin 2011-11-21 10:05:47 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
The SPICE client failed to connect to the SPICE server on the target host after a virtual machine had been migrated to a remote machine. This happened when the migration of the virtual machine took longer than the expiration time of the SPICE ticket that was set on the target host. Without a valid password, the SPICE server refused connection from the SPICE client and the SPICE session had to be closed. To prevent this problem, support for spice semi-seamless migration has been added. Other components such as spice-protocol, spice-server and qemu-kvm have also been modified to support this feature. SPICE now allows the SPICE client to connect to the SPICE server on the target host at the very start of the virtual machine migration, just before the migrate monitor command is given to the qemu-kvm application. With a valid ticket on the target host, the SPICE ticket on the destination no longer expires and the SPICE client now remains open when the virtual machine migration is done.

Comment 71 errata-xmlrpc 2011-12-06 15:22:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1518.html

