Bug 524229

Summary: Local migration of kvm guest fails in Fedora12 Alpha
Product: [Fedora] Fedora
Reporter: IBM Bug Proxy <bugproxy>
Component: qemu
Assignee: Glauber Costa <gcosta>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: high
Version: rawhide
CC: berrange, clalance, dougsland, dwmw2, ehabkost, gansalmon, gcosta, itamar, jaswinder, jforbes, kernel-maint, markmc, mtosatti, quintela, virt-maint
Keywords: Reopened
Hardware: x86_64
OS: All
Doc Type: Bug Fix
Last Closed: 2009-10-26 18:27:46 UTC
Bug Blocks: 498968
Attachments: /var/log/messages of the guest

Description IBM Bug Proxy 2009-09-18 13:20:47 UTC
=Comment: #0=================================================
SANTWANA SAMANTRAY <santwana.samantray.com> - 

Local migration of kvm guest fails in Fedora12-Alpha using qemu-kvm.

After the guest is migrated locally using qemu-kvm, it stops responding. Checking the status of the
guest with "info status" in the qemu monitor shows the status as "Paused".
Issuing "cont" to unpause the guest produces the BUG and call trace below in the
migrated guest.

Attachment: /var/log/messages of the guest

Call Trace:
Sep  7 14:52:40 rhel6 kernel: [<ffffffff8102907f>] ? kvm_mmu_op+0x30/0x55
Sep  7 14:52:40 rhel6 kernel: [<ffffffff810291cd>] ? kvm_deferred_mmu_op+0x46/0x94
Sep  7 14:52:40 rhel6 kernel: [<ffffffff8102927a>] ? kvm_mmu_write+0x33/0x3a
Sep  7 14:52:40 rhel6 kernel: [<ffffffff81029322>] ? kvm_set_pte_at+0x25/0x2a
Sep  7 14:52:40 rhel6 kernel: [<ffffffff810b49c6>] ? __do_fault+0x300/0x3d5
Sep  7 14:52:40 rhel6 kernel: [<ffffffff810b6a3e>] ? handle_mm_fault+0x349/0x7c5
Sep  7 14:52:40 rhel6 kernel: [<ffffffff813ad4b8>] ? do_page_fault+0x5b5/0x9e9
Sep  7 14:52:40 rhel6 kernel: [<ffffffff810bca82>] ? do_mmap_pgoff+0x304/0x367
Sep  7 14:52:40 rhel6 kernel: [<ffffffff8102a12a>] ? default_spin_lock_flags+0x9/0xf
Sep  7 14:52:40 rhel6 kernel: [<ffffffff813aa959>] ? trace_hardirqs_off_thunk+0x3a/0x6c
Sep  7 14:52:40 rhel6 kernel: [<ffffffff813ab015>] ? page_fault+0x25/0x30

Commands used for migration are as below:
qemu-kvm -no-hpet \
  -drive file=/var/lib/libvirt/images/rhel6.raw,if=ide,cache=writeback,index=0 \
  -smp 4 -cpu qemu64,+sse2 -m 2048 \
  -net nic,macaddr=00:21:9B:85:98:E8,model=virtio \
  -net tap,script=/home/qemu-ifup-latest \
  -vnc :1 -name rhel6_qemu

and
qemu-kvm -no-hpet \
  -drive file=/var/lib/libvirt/images/rhel6.raw,if=ide,cache=writeback,index=0 \
  -smp 4 -cpu qemu64,+sse2 -m 2048 \
  -net nic,macaddr=00:21:9B:85:98:E8,model=virtio \
  -net tap,script=/home/qemu-ifup-latest \
  -vnc :2 -name rhel6_qemu_migrate -incoming tcp:0:4564

=Comment: #3=================================================
SANTWANA SAMANTRAY <santwana.samantray.com> - 
I tried local migration with the latest version of qemu-kvm (qemu-0.10.92-4.fc12.x86_64). After the
local migration of the guest, the messages below appeared in the guest's dmesg output:

bad partial csum: csum=8448/34203 len=54
bad partial csum: csum=8448/34203 len=54
bad partial csum: csum=8448/34203 len=54

The guest was accessible for some time, but later it stopped responding to keystrokes and mouse input.
Even the ssh session to the guest was unresponsive. The message below appeared in /var/log/messages
of the guest just before it stopped responding.

gnome-session[1893]: WARNING: Detected that screensaver has left the bus

Comment 1 IBM Bug Proxy 2009-09-18 13:20:57 UTC
Created attachment 361658
/var/log/messages of the guest

Comment 2 Mark McLoughlin 2009-09-21 13:53:46 UTC
(In reply to comment #0)

> Sep  7 14:52:40 rhel6 kernel: [<ffffffff8102907f>] ? kvm_mmu_op+0x30/0x55

Okay, looks like a guest kernel PV MMU issue

2.6.29.4-1.el6.x86_64 is the kernel version

Santwana: could you try and reproduce with 2.6.30 from F11 updates or using 2.6.31 from F12?

Comment 3 Marcelo Tosatti 2009-10-01 16:07:50 UTC
Fedora 12-Alpha was using 2.6.31-rc5 which contains a known pvmmu bug. Should be fixed in 2.6.31.

Closing bug, please reopen if problems still persist with FC12.

Comment 4 IBM Bug Proxy 2009-10-07 05:30:40 UTC
------- Comment From santwana.samantray.com 2009-10-07 01:25 EDT-------
Hello Redhat,

I verified this issue with the F12 release (kernel version 2.6.31-33.fc12.x86_64) as host, and it is still reproducible. After local migration, the guest was accessible for some time, but later it stopped responding to keystrokes and mouse input.
Even the ssh session to the guest was unresponsive. Checking the status of the guest with "info status" in the qemu monitor shows "running", but the guest is still unresponsive.

Thanks,
Santwana

Comment 5 Marcelo Tosatti 2009-10-07 21:07:21 UTC
The bug is in the guest code, so you should upgrade the guest as well.

Comment 6 IBM Bug Proxy 2009-10-15 06:30:42 UTC
------- Comment From santwana.samantray.com 2009-10-15 02:29 EDT-------
Hello Redhat,

I verified this issue with the F12 release (kernel version 2.6.31-33.fc12.x86_64) as host and 2.6.31-27.el6.x86_64 as the guest kernel. After local migration, the guest did not respond to keystrokes or mouse input. "info status" in the qemu monitor shows "running", but the guest is still unresponsive.

Thanks,
Santwana

Comment 7 Mark McLoughlin 2009-10-16 13:48:33 UTC
AFAIK the fix which Marcelo thought resolves this was in 2.6.31-27.el6.x86_64, so we may be looking at a different problem

Comment 8 Marcelo Tosatti 2009-10-19 22:25:25 UTC
OK, can reproduce it. 

Migration seems stable with either "no-kvmclock" or the following commit:

commit 11ed4b344c0eb6f1c5d11a07c307e94174a13900
Author: Glauber Costa <glommer>
Date:   Fri Oct 16 15:27:38 2009 -0400

    properly save kvm system time msr registers
    
    Currently, the msrs involved in setting up pvclock are not saved over
    migration and/or save/restore. This patch puts their value in special
    fields in our CPUState, and deal with them using vmstate.
    
    kvm also has to account for it, by including them in the msr list
    for the ioctls.
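
To illustrate what the commit message above describes, here is a minimal, self-contained sketch of carrying the two kvmclock MSR values across a save/restore. It is not the actual qemu patch and does not use vmstate: `DemoCPUState`, `demo_wrmsr`, `demo_cpu_save` and `demo_cpu_load` are hypothetical names invented for this example, and the byte layout is an arbitrary assumption. Only the MSR indices (0x11 and 0x12) come from the original kvmclock interface. The idea is that if these values are not part of the migrated state, the destination presumably starts without the values the guest wrote, which would be consistent with the guest behaving correctly under "no-kvmclock".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MSR indices of the original kvmclock interface. */
#define MSR_KVM_WALL_CLOCK  0x11
#define MSR_KVM_SYSTEM_TIME 0x12

/* Hypothetical, simplified stand-in for the per-CPU state that is migrated. */
typedef struct {
    uint64_t system_time_msr; /* last value the guest wrote to MSR_KVM_SYSTEM_TIME */
    uint64_t wall_clock_msr;  /* last value the guest wrote to MSR_KVM_WALL_CLOCK */
} DemoCPUState;

/* Size of the serialized form: two 64-bit values. */
#define DEMO_STATE_SIZE (2 * sizeof(uint64_t))

/* Store a 64-bit value in little-endian byte order. */
static void put_le64(uint8_t *buf, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        buf[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Read back a 64-bit little-endian value. */
static uint64_t get_le64(const uint8_t *buf)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v |= (uint64_t)buf[i] << (8 * i);
    }
    return v;
}

/* Record a guest MSR write. Returns 0 for a tracked MSR, -1 otherwise. */
int demo_wrmsr(DemoCPUState *s, uint32_t index, uint64_t value)
{
    assert(s != NULL);
    switch (index) {
    case MSR_KVM_SYSTEM_TIME:
        s->system_time_msr = value;
        return 0;
    case MSR_KVM_WALL_CLOCK:
        s->wall_clock_msr = value;
        return 0;
    default:
        return -1;
    }
}

/* Source side: write both MSR values into buf. Returns bytes written, 0 if buf is too small. */
size_t demo_cpu_save(const DemoCPUState *s, uint8_t *buf, size_t len)
{
    assert(s != NULL && buf != NULL);
    if (len < DEMO_STATE_SIZE) {
        return 0;
    }
    put_le64(buf, s->system_time_msr);
    put_le64(buf + 8, s->wall_clock_msr);
    return DEMO_STATE_SIZE;
}

/* Destination side: restore both MSR values from buf. Returns 0 on success, -1 if buf is too short. */
int demo_cpu_load(DemoCPUState *s, const uint8_t *buf, size_t len)
{
    assert(s != NULL && buf != NULL);
    if (len < DEMO_STATE_SIZE) {
        return -1;
    }
    s->system_time_msr = get_le64(buf);
    s->wall_clock_msr = get_le64(buf + 8);
    return 0;
}
```

In this sketch, the source side would call `demo_cpu_save` and the destination `demo_cpu_load`. As the commit message notes, the real fix additionally has to include these MSRs in the list passed to the KVM ioctls, which this example does not model.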

Comment 9 Mark McLoughlin 2009-10-21 07:39:45 UTC
Glauber: should the fix Marcelo points out be backported to 0.11.0 for Fedora 12?

Comment 10 Glauber Costa 2009-10-21 11:53:57 UTC
Yes.

If we have vmstate in place, it should be quite easy. If not, I am backporting it to RHEL5, and we can use the same patch

Comment 11 Mark McLoughlin 2009-10-21 12:35:05 UTC
There's no vmstate in 0.11.0

Note, this is on F12VirtBlocker and it's less than two weeks to F12 GA freeze

Comment 12 Glauber Costa 2009-10-21 13:05:23 UTC
Ok, so we'll probably be able to use the same patch I'll write for RHEL. I'll work out something.

Comment 13 Justin M. Forbes 2009-10-26 18:27:46 UTC
This should be fixed with:
* Wed Oct 21 2009 Glauber Costa <gcosta> - 2:0.11.0-8
- Properly save kvm time registers (#524229)

Comment 14 IBM Bug Proxy 2009-10-27 08:50:42 UTC
------- Comment From santwana.samantray.com 2009-10-27 04:40 EDT-------
Hi,

I was able to reproduce this issue in the latest F12 rawhide (kernel version 2.6.31.5-96.fc12.x86_64). The guest becomes unresponsive after migration.
However, with the "no-kvmclock" option, the guest remains responsive after local migration.

Thanks,
Santwana

Comment 15 Justin M. Forbes 2009-10-27 12:36:47 UTC
The fix is in qemu-kvm-0.11.0-8.  Please make sure that you are using this version of qemu-kvm.

Comment 16 IBM Bug Proxy 2009-10-29 09:40:55 UTC
------- Comment From santwana.samantray.com 2009-10-29 05:33 EDT-------
Hello Redhat,

The qemu-kvm version in the latest F12 rawhide (kernel version 2.6.31.5-96.fc12.x86_64) is qemu-kvm-0.11.0-7.fc12.x86_64.
Can you give us a pointer for downloading "qemu-kvm-0.11.0-8", so that we can update the bug after verifying with that release?

Thanks,
Santwana

Comment 17 Mark McLoughlin 2009-10-29 11:41:29 UTC
Should be in rawhide since the 27th:

  http://www.redhat.com/archives/fedora-test-list/2009-October/msg00674.html

you can also download it from:

  http://koji.fedoraproject.org/koji/buildinfo?buildID=137730

Comment 18 IBM Bug Proxy 2009-10-29 13:00:50 UTC
------- Comment From santwana.samantray.com 2009-10-29 08:50 EDT-------
Hello Redhat,

Thanks for the link. It has been in rawhide since the 27th. After installing "qemu-kvm-0.11.0-8", the issue is resolved. We can close this issue.

Thanks for your support
Santwana

------- Comment From bnpoorni.com 2009-10-29 08:55 EDT-------
Closing as per the above comment...