Bug 694750

Summary: Qemu-kvm instance quitting very slow after ping-pong migration for long time
Product: Red Hat Enterprise Linux 6
Reporter: Mike Cao <bcao>
Component: qemu-kvm
Assignee: Rik van Riel <riel>
Status: CLOSED WORKSFORME
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Priority: medium
Version: 6.1
CC: bcao, gcosta, juzhang, michen, mkenneth, tburke, virt-maint
Target Milestone: rc
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-12-12 08:11:15 UTC
Bug Blocks: 766534

Description Mike Cao 2011-04-08 09:23:17 UTC
Description of problem:


Version-Release number of selected component (if applicable):
2.6.32-128.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.156.el6.x86_64


How reproducible:
sometimes

Steps to Reproduce:
1. Start the guest on the source host.
2. Start listening on the migration port.
3. Do ping-pong live migration; once a migration completes, quit the qemu-kvm process on the source host.
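
The steps above could be written down as a repeatable plan, roughly as sketched below. This is not a script from the report, only an illustration under assumptions: the host names, the port and the `<same guest options>` placeholder are hypothetical, and step 2 is taken to mean starting qemu-kvm on the other host with `-incoming tcp:0:PORT`. The monitor commands used (`migrate -d`, `info migrate`, `quit`) are standard qemu monitor commands.

```python
from typing import List, Tuple


def pingpong_plan(host_a: str, host_b: str, port: int,
                  rounds: int) -> List[Tuple[str, str]]:
    """Return (host, action) pairs for `rounds` one-way migrations.

    Actions starting with "shell:" are run on that host; actions starting
    with "hmp:" are typed into the qemu monitor of the guest on that host.
    Host names and port are hypothetical inputs, not values from the report.
    """
    plan = []
    src, dst = host_a, host_b
    for _ in range(rounds):
        # Step 2 (assumed meaning): the other host waits for the migration.
        plan.append(
            (dst, f"shell: qemu-kvm <same guest options> -incoming tcp:0:{port}"))
        # Step 3: start the migration in the background from the source.
        plan.append((src, f"hmp: migrate -d tcp:{dst}:{port}"))
        # Poll this until the monitor reports the migration as completed.
        plan.append((src, "hmp: info migrate"))
        # The reported symptom: this quit is slow, so it is the step to time.
        plan.append((src, "hmp: quit"))
        # Swap roles so the next round migrates back (the "pong").
        src, dst = dst, src
    return plan


if __name__ == "__main__":
    for host, action in pingpong_plan("hostA", "hostB", 4444, rounds=2):
        print(f"{host}: {action}")
```

Timing the `quit` step on the source after each round, and checking whether other shells on that host stay responsive in the meantime, is what would show the problem.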
  
Actual results:
After migration, the qemu monitor is slow to respond.
After issuing (qemu) quit, the guest quits very slowly.

Expected results:
After migration, the qemu monitor still works as normal.
After issuing "quit" in the qemu monitor, the qemu-kvm process quits quickly.

Additional info:

Comment 2 RHEL Program Management 2011-04-09 06:00:19 UTC
Since RHEL 6.1 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 3 Rik van Riel 2011-06-08 18:55:23 UTC
Here is the conversation that got me the bug.

Information wanted: how can I reproduce the bug?  What commands do I need to type and where? :)

--- byount is now known as byount|kcs
<quintela> https://bugzilla.redhat.com/show_bug.cgi?id=694750
<supybot> Bug 694750: medium, medium, rc, quintela, NEW, Qemu-kvm instance quitting very slow after ping-pong migration for long time
<quintela> I want to give this to you/andrea
<riel> strange bug
--> ddd (~ddutile.redhat.com) has joined #kvm
<riel> do you suspect the kernel is involved at all?  or just qemu-kvm weirdness?
--- twoerner is now known as twoerner_gone
<riel> How reproducible:
<riel> sometimes
<riel> and probably no customer impact :)
<quintela> riel: it is always reproducible for me
<quintela> if you migrate, source of migration gets unresponsive
<quintela> it is kernel issue
<riel> ohhhh
<quintela> whole host
<riel> so it takes just a single migration?
<quintela> my understanding is that we have not "undo" the migration log, and then we do it at a very bad way
<riel> for the entire source host to become unresponsive?
<quintela> but I don't know how to do that
<quintela> riel: yeap
<riel> interesting
<quintela> riel: you need a big guest (better 8/16GB guest)
<quintela> i.e. we have something exponential somewhere
<riel> and two big hosts in the lab
<quintela> for me, it happens that I do a migration, and then exit qemu on the source
<quintela> and it takes 10-15 seconds
<quintela> for that time, no remote shell answers
<riel> woah
<riel> not even in other ssh sessions to the host?
<quintela> riel: yeap, not even that
<quintela> and QE was able to reproduce, so it is not my imagination O:-)
--> Sanjay_M|commute (~smehrotr.redhat.com) has joined #kvm
--- Sanjay_M|commute is now known as Sanjay_M
<-- ulio has quit (Quit: Leaving)
<riel> quintela: sure I'll take the bug
<riel> quintela: do we have hosts set up somewhere that reproduce it?  (can you teach me how to reproduce it?)
<riel> I wonder what qemu does to make migrate work - lots of mprotect calls?
<quintela> riel: kernel assistance
<quintela> we ask the kernel to mark what pages have changed
<quintela> and we reload a bitmap with the changed pages each time that we sent everything dirty on the bitmap
<quintela> userspace does nothing more than reading that bitmap, and sending out the dirty ones
--> simong (~simong.redhat.com) has joined #kvm
<riel> quintela: where is the code that sets up and maintains that bitmap?  (and yeah, you can assign the bug to me)

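The dirty-page tracking that quintela describes at the end of the conversation can be sketched as a loop: send every page once, then repeatedly reload the bitmap of pages changed in the meantime and send only those. The sketch below is a simulation, not qemu-kvm code. In KVM the bitmap comes from the kernel (the `KVM_GET_DIRTY_LOG` ioctl); here that is replaced by a hypothetical callback, and `migrate_memory`, `fake_fetch` and the page counts are made up for illustration.

```python
from typing import Callable, List


def migrate_memory(num_pages: int,
                   fetch_dirty_bitmap: Callable[[int], List[bool]],
                   max_rounds: int = 30) -> List[List[int]]:
    """Simulate iterative dirty-page sending; return the pages sent per round.

    `fetch_dirty_bitmap(round_no)` is a hypothetical stand-in for asking the
    kernel which pages the guest wrote to while that round was being sent.
    """
    sent = []
    bitmap = [True] * num_pages  # first pass: every page counts as dirty
    for round_no in range(max_rounds):
        dirty_pages = [i for i, is_dirty in enumerate(bitmap) if is_dirty]
        if not dirty_pages:
            break  # nothing changed since the last round
        sent.append(dirty_pages)  # userspace only reads the bitmap and sends
        bitmap = fetch_dirty_bitmap(round_no)  # reload the bitmap
    return sent


def fake_fetch(round_no: int) -> List[bool]:
    """Hypothetical 4-page guest: writes pages 1 and 3, then page 3, then idles."""
    writes = [{1, 3}, {3}]
    dirty = writes[round_no] if round_no < len(writes) else set()
    return [page in dirty for page in range(4)]


if __name__ == "__main__":
    print(migrate_memory(4, fake_fetch))  # [[0, 1, 2, 3], [1, 3], [3]]
```

The sketch only covers the sending side. It does not model turning dirty logging off again on the source after the migration, which is the part suspected in the conversation above.
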
Comment 5 Rik van Riel 2011-10-05 14:11:43 UTC
Juan, I have not reproduced the bug on the lab setup.  I have managed to make a 15GB guest migrate back and forth in a tight loop between two hosts, and it has always died after a while, without my observing the bug.

Comment 7 Dor Laor 2011-12-12 08:11:15 UTC
Closing. If QE can propose a script that helps reproduce it, this bug will be reopened.