Bug 836133 - spice migration: prevent race with libvirt
spice migration: prevent race with libvirt
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm (Show other bugs)
6.4
Unspecified Unspecified
unspecified Severity unspecified
: rc
: ---
Assigned To: Yonit Halperin
Virtualization Bugs
: Reopened
Depends On: 867724
Blocks: 798682 836123 836135 846910 846911 894020
  Show dependency treegraph
 
Reported: 2012-06-28 03:44 EDT by Yonit Halperin
Modified: 2013-02-21 02:37 EST (History)
15 users (show)

See Also:
Fixed In Version: qemu-kvm-0.12.1.2-2.310.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 836135 (view as bug list)
Environment:
Last Closed: 2013-02-21 02:37:39 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
bad result: libvirtd.log (316.93 KB, text/plain)
2012-10-12 04:33 EDT, Qunfang Zhang
no flags Details
good result: libvirtd.log (175.02 KB, text/plain)
2012-10-12 04:34 EDT, Qunfang Zhang
no flags Details
qemu log in the src host (41.09 KB, text/plain)
2012-10-15 05:05 EDT, Qunfang Zhang
no flags Details
qemu log in the destination host. (28.22 KB, text/plain)
2012-10-15 05:39 EDT, Qunfang Zhang
no flags Details
New: libvirtd.log on source host. (211.57 KB, text/plain)
2012-10-16 06:18 EDT, Qunfang Zhang
no flags Details
New: libvirtd.log on destination host. (179.77 KB, text/plain)
2012-10-16 06:19 EDT, Qunfang Zhang
no flags Details
New: qemu.log on source host. (5.42 KB, text/plain)
2012-10-16 06:20 EDT, Qunfang Zhang
no flags Details
New: qemu.log on destnation host. (2.08 KB, text/plain)
2012-10-16 06:20 EDT, Qunfang Zhang
no flags Details
Verified pass result (libvirtd.log) (473.30 KB, text/plain)
2012-11-06 05:19 EST, Qunfang Zhang
no flags Details

  None (edit)
Description Yonit Halperin 2012-06-28 03:44:38 EDT
Description of problem:
In order to support seamless migration (bug 836123) Spice needs to migrate its state, from the migration src to the target, when migration is completed. However, when libvirt identifies that migration has completed, it closes the src qemu, preventing spice from completing its migration.
We want to add a QMP event for signaling libvirt when spice migration has completed. libvirt will need to wait for this event before closing the src.
(spice api already contains a migration_completion callback for qemu).

In order for spice to recognize whether the installed libvirt supports this qmp event, we will also need to add a "seamless-migration=on" parameter to spice arguments in qemu. Only if this parameter is set, spice will execute the seamless migration pathway.
Comment 1 Luiz Capitulino 2012-07-02 10:05:54 EDT
Has this already been proposed upstream?

The process is to propose it upstream and get it merged. Afterwards, we open a bz to backport it to RHEL. I'll close this as NOTABUG if this hasn't been done yet.
Comment 2 RHEL Product and Program Management 2012-07-10 04:27:16 EDT
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
Comment 3 RHEL Product and Program Management 2012-07-10 21:45:57 EDT
This request was erroneously removed from consideration in Red Hat Enterprise Linux 6.4, which is currently under development.  This request will be evaluated for inclusion in Red Hat Enterprise Linux 6.4.
Comment 4 Luiz Capitulino 2012-07-12 16:26:00 EDT
Closing as per comment 1. I think this bz is premature, as it's proposing one possible solution for a problem that has to be discussed upstream first.
Comment 5 Yonit Halperin 2012-07-15 01:03:26 EDT
(In reply to comment #4)
> Closing as per comment 1. I think this bz is premature, as it's proposing
> one possible solution for a problem that has to be discussed upstream first.

There has already been a discussion about it upstream, and this bug is based on it.
http://lists.gnu.org/archive/html/qemu-devel/2012-06/msg00815.html
Comment 6 Luiz Capitulino 2012-07-16 09:38:29 EDT
Yonit, now I understand the problem, although I'm not completely sure that the event is enough. For example, what happens if libvirt misses the event?

Let's do the following: I'll re-assign this bug for you. You can re-open it if you want, you post the patch(es) upstream and we discuss it there.
Comment 7 Dor Laor 2012-08-05 09:59:32 EDT
(In reply to comment #6)
> Yonit, now I understand the problem, although I'm not completely sure that
> the event is enough. For example, what happens if libvirt misses the event?

If libvirt was down (the only reason to miss the event), libvirt should query the migration status when it re-starts. The requires that the live migration info command should provide the exact details about the end of migration.
Anyway it's a rare case and worse case we'll have the client disconnect because libvirt was done, that's acceptable.

> Let's do the following: I'll re-assign this bug for you. You can re-open it
> if you want, you post the patch(es) upstream and we discuss it there.
Comment 8 Yonit Halperin 2012-08-05 10:14:17 EDT
(In reply to comment #7)
> (In reply to comment #6)
> > Yonit, now I understand the problem, although I'm not completely sure that
> > the event is enough. For example, what happens if libvirt misses the event?
> 
> If libvirt was down (the only reason to miss the event), libvirt should
> query the migration status when it re-starts. The requires that the live
> migration info command should provide the exact details about the end of
> migration.
> Anyway it's a rare case and worse case we'll have the client disconnect
> because libvirt was done, that's acceptable.
> 
There will be a problem if libvirt will endlessly wait for the above qmp event.
Michal suggested (https://bugzilla.redhat.com/show_bug.cgi?id=836135#c4) to add a state to the migration, "spice-migration-completed", instead of using qmp event.

> > Let's do the following: I'll re-assign this bug for you. You can re-open it
> > if you want, you post the patch(es) upstream and we discuss it there.
Comment 9 Dor Laor 2012-08-06 11:59:25 EDT
(In reply to comment #8)
> (In reply to comment #7)
> > (In reply to comment #6)
> > > Yonit, now I understand the problem, although I'm not completely sure that
> > > the event is enough. For example, what happens if libvirt misses the event?
> > 
> > If libvirt was down (the only reason to miss the event), libvirt should
> > query the migration status when it re-starts. The requires that the live
> > migration info command should provide the exact details about the end of
> > migration.
> > Anyway it's a rare case and worse case we'll have the client disconnect
> > because libvirt was done, that's acceptable.
> > 
> There will be a problem if libvirt will endlessly wait for the above qmp
> event.
> Michal suggested (https://bugzilla.redhat.com/show_bug.cgi?id=836135#c4) to
> add a state to the migration, "spice-migration-completed", instead of using
> qmp event.

It's pretty much the same case as we need both to publish it as an event base change and also add a info option in a synchronous way.

> 
> > > Let's do the following: I'll re-assign this bug for you. You can re-open it
> > > if you want, you post the patch(es) upstream and we discuss it there.
Comment 13 Qunfang Zhang 2012-10-12 04:30:13 EDT
Hello, Yonit
I verified this bug on the following packages, but find it may still have some problem, please help confirm it and correct me if I am wrong. Thanks. 

Packages:
[root@dhcp-8-207 ~]# uname -r
2.6.32-323.el6.x86_64
[root@dhcp-8-207 ~]# rpm -qa | grep qemu
qemu-kvm-0.12.1.2-2.320.el6.x86_64
qemu-kvm-debuginfo-0.12.1.2-2.320.el6.x86_64
qemu-guest-agent-0.12.1.2-2.316.el6.x86_64
qemu-kvm-tools-0.12.1.2-2.320.el6.x86_64
gpxe-roms-qemu-0.9.7-6.9.el6.noarch
qemu-img-0.12.1.2-2.320.el6.x86_64
[root@dhcp-8-207 ~]# rpm -qa | grep libvirt
libvirt-0.10.2-2.el6.x86_64
libvirt-python-0.10.2-1.el6.x86_64
libvirt-client-0.10.2-2.el6.x86_64
[root@dhcp-8-207 ~]# rpm -qa | grep spice
spice-gtk-0.14-2.el6.x86_64
spice-glib-0.14-2.el6.x86_64
spice-server-0.12.0-1.el6.x86_64
spice-gtk-python-0.14-2.el6.x86_64
spice-client-0.8.2-15.el6.x86_64

Steps:
1. Prepare a win7 guest image and place it on the nfs server. 
2. Boot the guest up with virt-manager and open the guest desktop.
3. 'ps' to check if guest is using spice+qxl and seamless-migration is 'on'
4. Migrate guest to another host.
# virsh migrate win7-64 qemu+ssh://10.66.6.67/system
5. Check the libvirt log under /var/log/libvirt/libvirtd.log.

Result:
Most of time, the result looks good (attachment will be upload).
after 'query-migrate' returns 'complete', the 'query-spice' command is called. and when 'query-spice' command returns: Migrate: true and QMP event "SPICE_MIGRATE_COMPLETED" is emitted, guest process quits.

But sometimes the result looks bad (attachment will be upload). 
After migration, there's no "SPICE_MIGRATE_COMPLETED" event emitted, and 'query-spice' command returns : migrated: false.  But guest still quit on the source host after finish migration. 

Hi, Yonit
Could you help have a look at the libvirtd log? Thanks a lot.

Note:
Command line:
[root@t2 libvirt]# ps ax | grep kvm
 1228 ?        S      0:00 [kvm-irqfd-clean]
20528 ?        Sl     0:57 /usr/libexec/qemu-kvm -name win7-64 -S -M rhel6.4.0 -enable-kvm -m 2048 -smp 2,sockets=2,cores=1,threads=1 -uuid 19fe9276-cf27-574b-5cfb-853a08165f9c -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/win7-64.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/libvirt/migrate/win7-64-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=22,id=hostnet0,vhost=on,vhostfd=23 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:28:8d:97,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -device usb-tablet,id=input0 -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -vga qxl -global qxl-vga.vram_size=67108864 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -incoming tcp:0.0.0.0:49154 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
Comment 14 Qunfang Zhang 2012-10-12 04:33:49 EDT
Created attachment 625852 [details]
bad result: libvirtd.log
Comment 15 Qunfang Zhang 2012-10-12 04:34:48 EDT
Created attachment 625853 [details]
good result: libvirtd.log
Comment 16 Qunfang Zhang 2012-10-15 05:05:07 EDT
Created attachment 627262 [details]
qemu log in the src host
Comment 17 Yonit Halperin 2012-10-15 05:31:40 EDT
Hi Qunfang,

In the qemu log you can find
"main_channel_client_handle_migrate_connected: client 0x7f4c51e6a640 connected: 0 seamless 0"

This means that the spice client didn't manage to connect to the destination upon client_migrate_info (maybe we can find more details for the reason in the qemu destination log).
The other interesting problem, that explains why you can't find "SPICE_MIGRATE_COMPLETED", is that sometimes libvirt shutdown the src qemu before this qmp event was emitted by spice: I can find in the log
"main_channel_marshall_migrate_switch:"
and then immediately 
"2012-10-12 06:41:09.453+0000: shutting down"
without a disconnection of spice client or a timeout that triggers the qmp event emission.
This is related to bug #836135. Michal, can you help?
Comment 18 Qunfang Zhang 2012-10-15 05:39:26 EDT
Created attachment 627272 [details]
qemu log in the destination host.
Comment 19 Michal Privoznik 2012-10-15 06:22:31 EDT
Pity we don't have timestamps in qemu log.

Anyway, if libvirt kills a qemu process it append a log message about it. And I cannot find anything in bad result log (comment #14). Maybe it was filtered out. Qunfang what was your log_filters= variable in /etc/libvirt/libvirtd.log? Or maybe it was just cut out during attaching into BZ. If you can please clear the variable out (that is no filter is applied) and reproduce, that would be helpful.

BTW libvirt really ignores the SPICE_MIGRATE_COMPLETED event. After we get 'status: completed' from 'query-migrate', we get into state where we ask 'query-spice' repeatedly until 'migrated:true'. Making libvirt listen to that event won't make any difference though as it would spare one 'query-spice' and 50ms wait at most.
Comment 20 Qunfang Zhang 2012-10-16 04:09:56 EDT
(In reply to comment #19)
> Pity we don't have timestamps in qemu log.
> 
> Anyway, if libvirt kills a qemu process it append a log message about it.
> And I cannot find anything in bad result log (comment #14). Maybe it was
> filtered out. Qunfang what was your log_filters= variable in
> /etc/libvirt/libvirtd.log? 
Hi, Michal
I add the following words to /etc/libvirt/libvirtd.conf
log_filters="1:qemu_monitor_json 1:qemu_driver"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"


>Or maybe it was just cut out during attaching
> into BZ. If you can please clear the variable out (that is no filter is
> applied) and reproduce, that would be helpful.
> 
> BTW libvirt really ignores the SPICE_MIGRATE_COMPLETED event. After we get
> 'status: completed' from 'query-migrate', we get into state where we ask
> 'query-spice' repeatedly until 'migrated:true'. Making libvirt listen to
> that event won't make any difference though as it would spare one
> 'query-spice' and 50ms wait at most.
Comment 21 Qunfang Zhang 2012-10-16 06:17:28 EDT
Retest again with the following packages:

qemu-kvm-0.12.1.2-2.327.el6.x86_64
libvirt-0.10.2-2.el6.x86_64
spice-server-0.12.0-1.el6.x86_64
spice-protocol-0.12.2-1.el6.noarch
spice-gtk-0.14-2.el6.x86_64
virt-viewer-0.5.2-11.el6.x86_64

Note: This time I'm using 'remote-viewer' to connect guest to make sure it using spice client.

Steps:
0. Configure libvirtd.conf, add two lines to the file.
# vim /etc/libvirt/libvirtd.conf 
log_filters="1:qemu_monitor_json 1:qemu_driver"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"

# iptables -F
# setenforce 1
# setsebool -P virt_use_nfs 1

And create a nfs server on host A, mount on both host A and B.
#mount $host_A:/var/lib/libvirt/images/ /var/lib/libvirt/migrate 


1. Boot a win7-64 guest on A (source host) with virt-manager, using spice+qxl.
2. Install qxl driver, virtio serial driver, etc inside win7-64 guest.
3. ps -ax | grep kvm to check the command line. It indeed uses spice+qxl and seamless-migration=on.
4. Connect the guest with remote-viewer.
#remote-viewer spice://$hostA_IP:5900
5. Migrate guest from host A to host B (destination host).
6. After finish migration, check if the guest works well. 
7. Migrate guest from host B back to host A.
8. Shutdown guest.

The remote-viewer is opening all the time. And qemu log tells me I'm using the spice client and seamless migration.
main_channel_client_handle_migrate_connected: client 0x7f093c9cdae0 connected: 1 seamless 1

Result:
From the libvirtd.log, I found guest process is closed when 'query-spice' returns: Migrated: false. And the qmp event 'SPICE_MIGRATE_COMPLETED' is not emitted.

Hi, Yonit
Could you help check my result? And I will update the libvirtd.log and qemu log on both src and dst host side.
Comment 22 Qunfang Zhang 2012-10-16 06:18:21 EDT
Created attachment 628080 [details]
New: libvirtd.log on source host.
Comment 23 Qunfang Zhang 2012-10-16 06:19:51 EDT
Created attachment 628081 [details]
New: libvirtd.log on destination host.
Comment 24 Qunfang Zhang 2012-10-16 06:20:18 EDT
Created attachment 628082 [details]
New: qemu.log on source host.
Comment 25 Qunfang Zhang 2012-10-16 06:20:54 EDT
Created attachment 628083 [details]
New: qemu.log on destnation host.
Comment 26 Yonit Halperin 2012-10-16 07:00:53 EDT
(In reply to comment #21)
> Retest again with the following packages:
> 
> qemu-kvm-0.12.1.2-2.327.el6.x86_64
> libvirt-0.10.2-2.el6.x86_64
> spice-server-0.12.0-1.el6.x86_64
> spice-protocol-0.12.2-1.el6.noarch
> spice-gtk-0.14-2.el6.x86_64
> virt-viewer-0.5.2-11.el6.x86_64
> 
> Note: This time I'm using 'remote-viewer' to connect guest to make sure it
> using spice client.
> 
> Steps:
> 0. Configure libvirtd.conf, add two lines to the file.
> # vim /etc/libvirt/libvirtd.conf 
> log_filters="1:qemu_monitor_json 1:qemu_driver"
> log_outputs="1:file:/var/log/libvirt/libvirtd.log"
> 
> # iptables -F
> # setenforce 1
> # setsebool -P virt_use_nfs 1
> 
> And create a nfs server on host A, mount on both host A and B.
> #mount $host_A:/var/lib/libvirt/images/ /var/lib/libvirt/migrate 
> 
> 
> 1. Boot a win7-64 guest on A (source host) with virt-manager, using
> spice+qxl.
> 2. Install qxl driver, virtio serial driver, etc inside win7-64 guest.
> 3. ps -ax | grep kvm to check the command line. It indeed uses spice+qxl and
> seamless-migration=on.
> 4. Connect the guest with remote-viewer.
> #remote-viewer spice://$hostA_IP:5900
> 5. Migrate guest from host A to host B (destination host).
> 6. After finish migration, check if the guest works well. 
> 7. Migrate guest from host B back to host A.
> 8. Shutdown guest.
> 
> The remote-viewer is opening all the time. And qemu log tells me I'm using
> the spice client and seamless migration.
> main_channel_client_handle_migrate_connected: client 0x7f093c9cdae0
> connected: 1 seamless 1
> 
> Result:
> From the libvirtd.log, I found guest process is closed when 'query-spice'
> returns: Migrated: false. And the qmp event 'SPICE_MIGRATE_COMPLETED' is not
> emitted.
> 
> Hi, Yonit
> Could you help check my result? And I will update the libvirtd.log and qemu
> log on both src and dst host side.
Michal, can you have a look? I don't see any "Migrated: true" in libvirt log
Comment 27 Qunfang Zhang 2012-10-17 23:27:29 EDT
Hi, Yonit and Michal

If we always keep remote-viewer opening, then will hit the problem I mentioned in Comment 21.
If we DO NOT open remote-viewer and then implement migration, then 'query-migrate' will return "Migrated: true".
So there may still exist problem, right? 

Thanks,
Qunfang
Comment 28 Yonit Halperin 2012-10-18 02:26:00 EDT
(In reply to comment #27)
> Hi, Yonit and Michal
> 
> If we always keep remote-viewer opening, then will hit the problem I
> mentioned in Comment 21.
> If we DO NOT open remote-viewer and then implement migration, then
> 'query-migrate' will return "Migrated: true".
> So there may still exist problem, right? 
> 
> Thanks,
> Qunfang
Yes, there is a problem: when a spice client is not opened, spice server will change the migration status to "true" immediately. But, when a spice client is up, the spice session data is migrated, and it takes longer (the data passes from the src ,via the client, to the destination). Only when this transfer ends, the migration status changes and the corresponding qmp event is emitted. libvirt should have waited for this change, and only then close the src qemu. According to what you have reported, libvirt didn't wait till the migration status had changed, and closed qemu before query-spice returned "Migrated: true".
Comment 29 Qunfang Zhang 2012-10-18 02:31:09 EDT
(In reply to comment #28)
> (In reply to comment #27)
> > Hi, Yonit and Michal
> > 
> > If we always keep remote-viewer opening, then will hit the problem I
> > mentioned in Comment 21.
> > If we DO NOT open remote-viewer and then implement migration, then
> > 'query-migrate' will return "Migrated: true".
> > So there may still exist problem, right? 
> > 
> > Thanks,
> > Qunfang
> Yes, there is a problem: when a spice client is not opened, spice server
> will change the migration status to "true" immediately. But, when a spice
> client is up, the spice session data is migrated, and it takes longer (the
> data passes from the src ,via the client, to the destination). Only when
> this transfer ends, the migration status changes and the corresponding qmp
> event is emitted. libvirt should have waited for this change, and only then
> close the src qemu. According to what you have reported, libvirt didn't wait
> till the migration status had changed, and closed qemu before query-spice
> returned "Migrated: true".

Yes, it is. Maybe the problem is not in the qemu side?  How about let's wait for the fix about this problem(even in other component)? And then I will verify this bug then to make sure everything work as expected
Comment 30 Michal Privoznik 2012-10-18 04:13:15 EDT
Yonit et all, it's libvirt bug. I've set it as dependency.
Comment 31 Qunfang Zhang 2012-11-06 05:16:35 EST
Re-test again with the latest libvirt packages after Bug 867724 is fixed, the test passed. 

Related Packages:
kernel-2.6.32-337.el6.x86_64
qemu-kvm-0.12.1.2-2.334.el6.x86_64
spice-server-0.12.0-2.el6.x86_64
libvirt-0.10.2-7.el6.x86_64

Steps:
Refer to comment 21.

Result:
1. After 'query-migrate' returns 'completed', start 'query-spice'.
2. After 'query-spice' returns 'migrated: true' and QMP emit a event 'SPICE_MIGRATE_COMPLETED', the qemu-kvm process will quit in source host.

Ping pong migration for several times, no issue. So this bug is verified pass.
Detail log will be updated.
Comment 32 Qunfang Zhang 2012-11-06 05:19:11 EST
Created attachment 639240 [details]
Verified pass result (libvirtd.log)
Comment 34 errata-xmlrpc 2013-02-21 02:37:39 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0527.html

Note You need to log in before you can comment on or make changes to this bug.