| Summary: | Qemu-kvm instance quitting very slow after ping-pong migration for long time | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Mike Cao <bcao> |
| Component: | qemu-kvm | Assignee: | Rik van Riel <riel> |
| Status: | CLOSED WORKSFORME | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 6.1 | CC: | bcao, gcosta, juzhang, michen, mkenneth, tburke, virt-maint |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 766534 (view as bug list) | Environment: | |
| Last Closed: | 2011-12-12 08:11:15 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | | | |
| Bug Blocks: | 766534 | | |
Description
Mike Cao
2011-04-08 09:23:17 UTC
Since RHEL 6.1 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as an exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.

Here is the conversation that got me the bug. Information wanted: how can I reproduce the bug? What commands do I need to type, and where? :)

<quintela> https://bugzilla.redhat.com/show_bug.cgi?id=694750
<supybot> Bug 694750: medium, medium, rc, quintela, NEW, Qemu-kvm instance quitting very slow after ping-pong migration for long time
<quintela> I want to give this to you/andrea
<riel> strange bug
<riel> do you suspect the kernel is involved at all? or just qemu-kvm weirdness?
<riel> How reproducible:
<riel> sometimes
<riel> and probably no customer impact :)
<quintela> riel: it is always reproducible for me
<quintela> if you migrate, source of migration gets unresponsive
<quintela> it is kernel issue
<riel> ohhhh
<quintela> whole host
<riel> so it takes just a single migration?
<quintela> my understanding is that we have not "undo" the migration log, and then we do it at a very bad way
<riel> for the entire source host to become unresponsive?
<quintela> but I don't know how to do that
<quintela> riel: yeap
<riel> interesting
<quintela> riel: you need a big guest (better 8/16GB guest)
<quintela> i.e. we have something exponential somewhere
<riel> and two big hosts in the lab
<quintela> for me, it happens that I do a migration, and then exit qemu on the source
<quintela> and it takes 10-15 seconds
<quintela> for that time, no remote shell answers
<riel> woah
<riel> not even in other ssh sessions to the host?
<quintela> riel: yeap, not even that
<quintela> and QE was able to reproduce, so it is not my imagination O:-)
<riel> quintela: sure I'll take the bug
<riel> quintela: do we have hosts set up somewhere that reproduce it? (can you teach me how to reproduce it?)
<riel> I wonder what qemu does to make migrate work - lots of mprotect calls?
<quintela> riel: kernel assistance
<quintela> we ask the kernel to mark what pages have changed
<quintela> and we reload a bitmap with the changed pages each time that we sent everything dirty on the bitmap
<quintela> userspace does nothing more than reading that bitmap, and sending out the dirty ones
<riel> quintela: where is the code that sets up and maintains that bitmap? (and yeah, you can assign the bug to me)

Juan, I have not reproduced the bug on the lab setup. I have managed to make a 15GB guest migrate back and forth in a tight loop between two hosts, and it has always died after a while, without me observing the bug.

Closing; if QE can provide a script that helps reproduce it, this bug can be re-opened.
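For context on the mechanism quintela describes above: the dirty-page tracking used by live migration is exposed to userspace through the KVM_GET_DIRTY_LOG ioctl on the VM file descriptor. The following is a minimal, illustrative C sketch of how a userspace program could fetch and walk that per-slot bitmap; it assumes a VM fd (`vm_fd`) and a single memory slot 0 that was registered with KVM_MEM_LOG_DIRTY_PAGES, and it is not QEMU's actual migration code.

```c
/*
 * Hedged sketch: reading KVM's per-slot dirty-page bitmap from userspace.
 * Assumes vm_fd is an already-created VM file descriptor and that memory
 * slot 0 covers slot_bytes of guest RAM and was registered with
 * KVM_MEM_LOG_DIRTY_PAGES.  Illustration only, not QEMU's migration code.
 */
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096UL

static void dump_dirty_pages(int vm_fd, uint64_t slot_bytes)
{
    uint64_t npages = slot_bytes / PAGE_SIZE;
    /* one bit per guest page, bitmap rounded up to a multiple of 64 bits */
    size_t bitmap_bytes = ((npages + 63) / 64) * 8;
    uint64_t *bitmap = calloc(1, bitmap_bytes);
    if (!bitmap)
        return;

    struct kvm_dirty_log log;
    memset(&log, 0, sizeof(log));
    log.slot = 0;               /* memory slot to query */
    log.dirty_bitmap = bitmap;  /* kernel copies its bitmap here, then clears it */

    if (ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log) < 0) {
        perror("KVM_GET_DIRTY_LOG");
        free(bitmap);
        return;
    }

    /* Every set bit is a guest page written since the previous
     * KVM_GET_DIRTY_LOG call, i.e. a page migration must resend. */
    for (uint64_t i = 0; i < npages; i++) {
        if (bitmap[i / 64] & (1ULL << (i % 64)))
            printf("page %llu is dirty\n", (unsigned long long)i);
    }
    free(bitmap);
}
```

This matches the behaviour quintela outlines: each KVM_GET_DIRTY_LOG call hands the accumulated bitmap to userspace and clears the kernel's copy, so the next call reports only pages dirtied since this one, and the migration code repeats the fetch-and-send cycle each round until the remaining dirty set is small enough to stop the guest.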