Bug 836133
Summary: | spice migration: prevent race with libvirt | ||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Yonit Halperin <yhalperi> | ||||||||||||||||||||
Component: | qemu-kvm | Assignee: | Yonit Halperin <yhalperi> | ||||||||||||||||||||
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> | ||||||||||||||||||||
Severity: | unspecified | Docs Contact: | |||||||||||||||||||||
Priority: | unspecified | ||||||||||||||||||||||
Version: | 6.4 | CC: | acathrow, areis, bsarathy, dblechte, dyasny, juzhang, kraxel, michen, minovotn, mkenneth, mkrcmari, qzhang, shu, tburke, virt-maint | ||||||||||||||||||||
Target Milestone: | rc | Keywords: | Reopened | ||||||||||||||||||||
Target Release: | --- | ||||||||||||||||||||||
Hardware: | Unspecified | ||||||||||||||||||||||
OS: | Unspecified | ||||||||||||||||||||||
Whiteboard: | |||||||||||||||||||||||
Fixed In Version: | qemu-kvm-0.12.1.2-2.310.el6 | Doc Type: | Bug Fix | ||||||||||||||||||||
Doc Text: | Story Points: | --- | |||||||||||||||||||||
Clone Of: | |||||||||||||||||||||||
: | 836135 (view as bug list) | Environment: | |||||||||||||||||||||
Last Closed: | 2013-02-21 07:37:39 UTC | Type: | Bug | ||||||||||||||||||||
Regression: | --- | Mount Type: | --- | ||||||||||||||||||||
Documentation: | --- | CRM: | |||||||||||||||||||||
Verified Versions: | Category: | --- | |||||||||||||||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||||||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||||||||||||||
Embargoed: | |||||||||||||||||||||||
Bug Depends On: | 867724 | ||||||||||||||||||||||
Bug Blocks: | 798682, 836123, 836135, 846910, 846911, 894020 | ||||||||||||||||||||||
Attachments: |
|
Description
Yonit Halperin
2012-06-28 07:44:38 UTC
Has this already been proposed upstream? The process is to propose it upstream and get it merged. Afterwards, we open a bz to backport it to RHEL. I'll close this as NOTABUG if this hasn't been done yet. This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux. This request was erroneously removed from consideration in Red Hat Enterprise Linux 6.4, which is currently under development. This request will be evaluated for inclusion in Red Hat Enterprise Linux 6.4. Closing as per comment 1. I think this bz is premature, as it's proposing one possible solution for a problem that has to be discussed upstream first. (In reply to comment #4) > Closing as per comment 1. I think this bz is premature, as it's proposing > one possible solution for a problem that has to be discussed upstream first. There has already been a discussion about it upstream, and this bug is based on it. http://lists.gnu.org/archive/html/qemu-devel/2012-06/msg00815.html Yonit, now I understand the problem, although I'm not completely sure that the event is enough. For example, what happens if libvirt misses the event? Let's do the following: I'll re-assign this bug for you. You can re-open it if you want, you post the patch(es) upstream and we discuss it there. (In reply to comment #6) > Yonit, now I understand the problem, although I'm not completely sure that > the event is enough. For example, what happens if libvirt misses the event? If libvirt was down (the only reason to miss the event), libvirt should query the migration status when it re-starts. The requires that the live migration info command should provide the exact details about the end of migration. Anyway it's a rare case and worse case we'll have the client disconnect because libvirt was done, that's acceptable. > Let's do the following: I'll re-assign this bug for you. You can re-open it > if you want, you post the patch(es) upstream and we discuss it there. (In reply to comment #7) > (In reply to comment #6) > > Yonit, now I understand the problem, although I'm not completely sure that > > the event is enough. For example, what happens if libvirt misses the event? > > If libvirt was down (the only reason to miss the event), libvirt should > query the migration status when it re-starts. The requires that the live > migration info command should provide the exact details about the end of > migration. > Anyway it's a rare case and worse case we'll have the client disconnect > because libvirt was done, that's acceptable. > There will be a problem if libvirt will endlessly wait for the above qmp event. Michal suggested (https://bugzilla.redhat.com/show_bug.cgi?id=836135#c4) to add a state to the migration, "spice-migration-completed", instead of using qmp event. > > Let's do the following: I'll re-assign this bug for you. You can re-open it > > if you want, you post the patch(es) upstream and we discuss it there. (In reply to comment #8) > (In reply to comment #7) > > (In reply to comment #6) > > > Yonit, now I understand the problem, although I'm not completely sure that > > > the event is enough. For example, what happens if libvirt misses the event? > > > > If libvirt was down (the only reason to miss the event), libvirt should > > query the migration status when it re-starts. The requires that the live > > migration info command should provide the exact details about the end of > > migration. > > Anyway it's a rare case and worse case we'll have the client disconnect > > because libvirt was done, that's acceptable. > > > There will be a problem if libvirt will endlessly wait for the above qmp > event. > Michal suggested (https://bugzilla.redhat.com/show_bug.cgi?id=836135#c4) to > add a state to the migration, "spice-migration-completed", instead of using > qmp event. It's pretty much the same case as we need both to publish it as an event base change and also add a info option in a synchronous way. > > > > Let's do the following: I'll re-assign this bug for you. You can re-open it > > > if you want, you post the patch(es) upstream and we discuss it there. Hello, Yonit I verified this bug on the following packages, but find it may still have some problem, please help confirm it and correct me if I am wrong. Thanks. Packages: [root@dhcp-8-207 ~]# uname -r 2.6.32-323.el6.x86_64 [root@dhcp-8-207 ~]# rpm -qa | grep qemu qemu-kvm-0.12.1.2-2.320.el6.x86_64 qemu-kvm-debuginfo-0.12.1.2-2.320.el6.x86_64 qemu-guest-agent-0.12.1.2-2.316.el6.x86_64 qemu-kvm-tools-0.12.1.2-2.320.el6.x86_64 gpxe-roms-qemu-0.9.7-6.9.el6.noarch qemu-img-0.12.1.2-2.320.el6.x86_64 [root@dhcp-8-207 ~]# rpm -qa | grep libvirt libvirt-0.10.2-2.el6.x86_64 libvirt-python-0.10.2-1.el6.x86_64 libvirt-client-0.10.2-2.el6.x86_64 [root@dhcp-8-207 ~]# rpm -qa | grep spice spice-gtk-0.14-2.el6.x86_64 spice-glib-0.14-2.el6.x86_64 spice-server-0.12.0-1.el6.x86_64 spice-gtk-python-0.14-2.el6.x86_64 spice-client-0.8.2-15.el6.x86_64 Steps: 1. Prepare a win7 guest image and place it on the nfs server. 2. Boot the guest up with virt-manager and open the guest desktop. 3. 'ps' to check if guest is using spice+qxl and seamless-migration is 'on' 4. Migrate guest to another host. # virsh migrate win7-64 qemu+ssh://10.66.6.67/system 5. Check the libvirt log under /var/log/libvirt/libvirtd.log. Result: Most of time, the result looks good (attachment will be upload). after 'query-migrate' returns 'complete', the 'query-spice' command is called. and when 'query-spice' command returns: Migrate: true and QMP event "SPICE_MIGRATE_COMPLETED" is emitted, guest process quits. But sometimes the result looks bad (attachment will be upload). After migration, there's no "SPICE_MIGRATE_COMPLETED" event emitted, and 'query-spice' command returns : migrated: false. But guest still quit on the source host after finish migration. Hi, Yonit Could you help have a look at the libvirtd log? Thanks a lot. Note: Command line: [root@t2 libvirt]# ps ax | grep kvm 1228 ? S 0:00 [kvm-irqfd-clean] 20528 ? Sl 0:57 /usr/libexec/qemu-kvm -name win7-64 -S -M rhel6.4.0 -enable-kvm -m 2048 -smp 2,sockets=2,cores=1,threads=1 -uuid 19fe9276-cf27-574b-5cfb-853a08165f9c -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/win7-64.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/libvirt/migrate/win7-64-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=22,id=hostnet0,vhost=on,vhostfd=23 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:28:8d:97,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -device usb-tablet,id=input0 -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -vga qxl -global qxl-vga.vram_size=67108864 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -incoming tcp:0.0.0.0:49154 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 Created attachment 625852 [details]
bad result: libvirtd.log
Created attachment 625853 [details]
good result: libvirtd.log
Created attachment 627262 [details]
qemu log in the src host
Hi Qunfang, In the qemu log you can find "main_channel_client_handle_migrate_connected: client 0x7f4c51e6a640 connected: 0 seamless 0" This means that the spice client didn't manage to connect to the destination upon client_migrate_info (maybe we can find more details for the reason in the qemu destination log). The other interesting problem, that explains why you can't find "SPICE_MIGRATE_COMPLETED", is that sometimes libvirt shutdown the src qemu before this qmp event was emitted by spice: I can find in the log "main_channel_marshall_migrate_switch:" and then immediately "2012-10-12 06:41:09.453+0000: shutting down" without a disconnection of spice client or a timeout that triggers the qmp event emission. This is related to bug #836135. Michal, can you help? Created attachment 627272 [details]
qemu log in the destination host.
Pity we don't have timestamps in qemu log. Anyway, if libvirt kills a qemu process it append a log message about it. And I cannot find anything in bad result log (comment #14). Maybe it was filtered out. Qunfang what was your log_filters= variable in /etc/libvirt/libvirtd.log? Or maybe it was just cut out during attaching into BZ. If you can please clear the variable out (that is no filter is applied) and reproduce, that would be helpful. BTW libvirt really ignores the SPICE_MIGRATE_COMPLETED event. After we get 'status: completed' from 'query-migrate', we get into state where we ask 'query-spice' repeatedly until 'migrated:true'. Making libvirt listen to that event won't make any difference though as it would spare one 'query-spice' and 50ms wait at most. (In reply to comment #19) > Pity we don't have timestamps in qemu log. > > Anyway, if libvirt kills a qemu process it append a log message about it. > And I cannot find anything in bad result log (comment #14). Maybe it was > filtered out. Qunfang what was your log_filters= variable in > /etc/libvirt/libvirtd.log? Hi, Michal I add the following words to /etc/libvirt/libvirtd.conf log_filters="1:qemu_monitor_json 1:qemu_driver" log_outputs="1:file:/var/log/libvirt/libvirtd.log" >Or maybe it was just cut out during attaching > into BZ. If you can please clear the variable out (that is no filter is > applied) and reproduce, that would be helpful. > > BTW libvirt really ignores the SPICE_MIGRATE_COMPLETED event. After we get > 'status: completed' from 'query-migrate', we get into state where we ask > 'query-spice' repeatedly until 'migrated:true'. Making libvirt listen to > that event won't make any difference though as it would spare one > 'query-spice' and 50ms wait at most. Retest again with the following packages: qemu-kvm-0.12.1.2-2.327.el6.x86_64 libvirt-0.10.2-2.el6.x86_64 spice-server-0.12.0-1.el6.x86_64 spice-protocol-0.12.2-1.el6.noarch spice-gtk-0.14-2.el6.x86_64 virt-viewer-0.5.2-11.el6.x86_64 Note: This time I'm using 'remote-viewer' to connect guest to make sure it using spice client. Steps: 0. Configure libvirtd.conf, add two lines to the file. # vim /etc/libvirt/libvirtd.conf log_filters="1:qemu_monitor_json 1:qemu_driver" log_outputs="1:file:/var/log/libvirt/libvirtd.log" # iptables -F # setenforce 1 # setsebool -P virt_use_nfs 1 And create a nfs server on host A, mount on both host A and B. #mount $host_A:/var/lib/libvirt/images/ /var/lib/libvirt/migrate 1. Boot a win7-64 guest on A (source host) with virt-manager, using spice+qxl. 2. Install qxl driver, virtio serial driver, etc inside win7-64 guest. 3. ps -ax | grep kvm to check the command line. It indeed uses spice+qxl and seamless-migration=on. 4. Connect the guest with remote-viewer. #remote-viewer spice://$hostA_IP:5900 5. Migrate guest from host A to host B (destination host). 6. After finish migration, check if the guest works well. 7. Migrate guest from host B back to host A. 8. Shutdown guest. The remote-viewer is opening all the time. And qemu log tells me I'm using the spice client and seamless migration. main_channel_client_handle_migrate_connected: client 0x7f093c9cdae0 connected: 1 seamless 1 Result: From the libvirtd.log, I found guest process is closed when 'query-spice' returns: Migrated: false. And the qmp event 'SPICE_MIGRATE_COMPLETED' is not emitted. Hi, Yonit Could you help check my result? And I will update the libvirtd.log and qemu log on both src and dst host side. Created attachment 628080 [details]
New: libvirtd.log on source host.
Created attachment 628081 [details]
New: libvirtd.log on destination host.
Created attachment 628082 [details]
New: qemu.log on source host.
Created attachment 628083 [details]
New: qemu.log on destnation host.
(In reply to comment #21) > Retest again with the following packages: > > qemu-kvm-0.12.1.2-2.327.el6.x86_64 > libvirt-0.10.2-2.el6.x86_64 > spice-server-0.12.0-1.el6.x86_64 > spice-protocol-0.12.2-1.el6.noarch > spice-gtk-0.14-2.el6.x86_64 > virt-viewer-0.5.2-11.el6.x86_64 > > Note: This time I'm using 'remote-viewer' to connect guest to make sure it > using spice client. > > Steps: > 0. Configure libvirtd.conf, add two lines to the file. > # vim /etc/libvirt/libvirtd.conf > log_filters="1:qemu_monitor_json 1:qemu_driver" > log_outputs="1:file:/var/log/libvirt/libvirtd.log" > > # iptables -F > # setenforce 1 > # setsebool -P virt_use_nfs 1 > > And create a nfs server on host A, mount on both host A and B. > #mount $host_A:/var/lib/libvirt/images/ /var/lib/libvirt/migrate > > > 1. Boot a win7-64 guest on A (source host) with virt-manager, using > spice+qxl. > 2. Install qxl driver, virtio serial driver, etc inside win7-64 guest. > 3. ps -ax | grep kvm to check the command line. It indeed uses spice+qxl and > seamless-migration=on. > 4. Connect the guest with remote-viewer. > #remote-viewer spice://$hostA_IP:5900 > 5. Migrate guest from host A to host B (destination host). > 6. After finish migration, check if the guest works well. > 7. Migrate guest from host B back to host A. > 8. Shutdown guest. > > The remote-viewer is opening all the time. And qemu log tells me I'm using > the spice client and seamless migration. > main_channel_client_handle_migrate_connected: client 0x7f093c9cdae0 > connected: 1 seamless 1 > > Result: > From the libvirtd.log, I found guest process is closed when 'query-spice' > returns: Migrated: false. And the qmp event 'SPICE_MIGRATE_COMPLETED' is not > emitted. > > Hi, Yonit > Could you help check my result? And I will update the libvirtd.log and qemu > log on both src and dst host side. Michal, can you have a look? I don't see any "Migrated: true" in libvirt log Hi, Yonit and Michal If we always keep remote-viewer opening, then will hit the problem I mentioned in Comment 21. If we DO NOT open remote-viewer and then implement migration, then 'query-migrate' will return "Migrated: true". So there may still exist problem, right? Thanks, Qunfang (In reply to comment #27) > Hi, Yonit and Michal > > If we always keep remote-viewer opening, then will hit the problem I > mentioned in Comment 21. > If we DO NOT open remote-viewer and then implement migration, then > 'query-migrate' will return "Migrated: true". > So there may still exist problem, right? > > Thanks, > Qunfang Yes, there is a problem: when a spice client is not opened, spice server will change the migration status to "true" immediately. But, when a spice client is up, the spice session data is migrated, and it takes longer (the data passes from the src ,via the client, to the destination). Only when this transfer ends, the migration status changes and the corresponding qmp event is emitted. libvirt should have waited for this change, and only then close the src qemu. According to what you have reported, libvirt didn't wait till the migration status had changed, and closed qemu before query-spice returned "Migrated: true". (In reply to comment #28) > (In reply to comment #27) > > Hi, Yonit and Michal > > > > If we always keep remote-viewer opening, then will hit the problem I > > mentioned in Comment 21. > > If we DO NOT open remote-viewer and then implement migration, then > > 'query-migrate' will return "Migrated: true". > > So there may still exist problem, right? > > > > Thanks, > > Qunfang > Yes, there is a problem: when a spice client is not opened, spice server > will change the migration status to "true" immediately. But, when a spice > client is up, the spice session data is migrated, and it takes longer (the > data passes from the src ,via the client, to the destination). Only when > this transfer ends, the migration status changes and the corresponding qmp > event is emitted. libvirt should have waited for this change, and only then > close the src qemu. According to what you have reported, libvirt didn't wait > till the migration status had changed, and closed qemu before query-spice > returned "Migrated: true". Yes, it is. Maybe the problem is not in the qemu side? How about let's wait for the fix about this problem(even in other component)? And then I will verify this bug then to make sure everything work as expected Yonit et all, it's libvirt bug. I've set it as dependency. Re-test again with the latest libvirt packages after Bug 867724 is fixed, the test passed. Related Packages: kernel-2.6.32-337.el6.x86_64 qemu-kvm-0.12.1.2-2.334.el6.x86_64 spice-server-0.12.0-2.el6.x86_64 libvirt-0.10.2-7.el6.x86_64 Steps: Refer to comment 21. Result: 1. After 'query-migrate' returns 'completed', start 'query-spice'. 2. After 'query-spice' returns 'migrated: true' and QMP emit a event 'SPICE_MIGRATE_COMPLETED', the qemu-kvm process will quit in source host. Ping pong migration for several times, no issue. So this bug is verified pass. Detail log will be updated. Created attachment 639240 [details]
Verified pass result (libvirtd.log)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-0527.html |