Bug 965169
Summary: Unable to move tasks from domain cgroup to emulator cgroup

| Field | Value | Field | Value |
|---|---|---|---|
| Product | [Fedora] Fedora | Reporter | IBM Bug Proxy <bugproxy> |
| Component | libvirt | Assignee | Eric Blake <eblake> |
| Status | CLOSED ERRATA | QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| Severity | high | Priority | unspecified |
| Version | 18 | Fixed In Version | libvirt-0.10.2.6-1.fc18 |
| Hardware | All | OS | All |
| Doc Type | Bug Fix | Last Closed | 2013-06-25 03:24:58 UTC |
| CC | berrange, clalancette, eblake, itamar, jforbes, jkachuck, jyang, laine, libvirt-maint, veillard, virt-maint, wgomerin | | |
Description (IBM Bug Proxy, 2013-05-20 15:20:39 UTC)
Created attachment 750629: qemu cgroup trace
See also the upstream thread about this problem: https://www.redhat.com/archives/libvir-list/2013-May/msg01360.html

Known issue, and I'm working on the fix. There's a race in libvirt (race one in this thread: https://www.redhat.com/archives/libvir-list/2013-May/msg01360.html). Sometimes qemu starts short-lived threads (perhaps glibc is spawning a thread to do aio work while reading a disk image). The race is that a temporary qemu thread can exit between the time libvirt reads two tids from the source cgroup and the time it writes them to the destination cgroup. Only one tid is still alive at write time, so the second write fails, and libvirt turns that failure into a catastrophic cascade that prevents the domain from starting. It looks like the fix will be teaching libvirt to ignore failure when moving an (exited) process into another cgroup.

Upstream patch posted: https://www.redhat.com/archives/libvir-list/2013-May/msg01478.html

------- Comment From aliguori.com 2013-05-21 14:31 EDT-------
Hi Eric, you should CC qemu-devel if you resubmit. These threads are our AIO pool. It has a fixed size and we have logic to tear down idle threads and respawn threads as needed. We could also add an option to not tear down idle threads if it made cgroup management more deterministic...

I don't think there's any need for QEMU to change what it's doing. We now iterate until the original cgroup tasks file is empty, so we are guaranteed to move all QEMU threads, even if it spawns more while we're working. QEMU isn't spawning these threads so fast that this approach is a problem.
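
The fix described above (re-read the source tasks file until it is empty, and treat a write that fails because the thread already exited as harmless) can be illustrated with a small simulation. This is a hypothetical Python model, not libvirt's C implementation: `FakeCgroup`, `move_task`, `move_all_tasks` and `make_scenario` are invented names, each cgroup's `tasks` file is modelled as an in-memory set of TIDs, and writing an exited TID is assumed to fail with `ESRCH`. The scenario relies on the standard cgroup behaviour that a newly created thread starts in its creator's cgroup.

```python
import errno


class FakeCgroup:
    """Hypothetical stand-in for one cgroup's 'tasks' file: a set of TIDs."""

    def __init__(self, tids=()):
        self.tids = set(tids)


def move_task(src, dst, tid, alive):
    """Model writing one TID into dst's tasks file (assumed ESRCH if exited)."""
    if tid not in alive:
        raise OSError(errno.ESRCH, "No such process")
    src.tids.discard(tid)
    dst.tids.add(tid)


def move_all_tasks(src, dst, alive, on_step=None, ignore_esrch=True):
    """Move every task from src to dst, re-reading src until it is empty.

    Returns the number of passes made over the source tasks list."""
    passes = 0
    while src.tids:
        passes += 1
        snapshot = sorted(src.tids)      # one "read" of the source tasks file
        for tid in snapshot:
            if on_step:
                on_step()                # QEMU threads may start or exit here
            try:
                move_task(src, dst, tid, alive)
            except OSError as err:
                if err.errno != errno.ESRCH or not ignore_esrch:
                    raise                # pre-fix behaviour: the error is fatal
                # The thread exited after we read its TID: nothing to move.
    return passes


def make_scenario():
    """Domain cgroup starts with TIDs 1-3; TID 3 is a short-lived AIO worker."""
    domain, emulator = FakeCgroup({1, 2, 3}), FakeCgroup()
    alive = {1, 2, 3}
    step = 0

    def qemu_activity():
        nonlocal step
        step += 1
        if step == 1:
            # Thread 1, still in the domain cgroup, spawns worker 4,
            # which therefore also starts in the domain cgroup.
            alive.add(4)
            domain.tids.add(4)
        elif step == 3:
            # Worker 3 exits after its TID was read but before it is moved.
            alive.discard(3)
            domain.tids.discard(3)

    return domain, emulator, alive, qemu_activity


if __name__ == "__main__":
    domain, emulator, alive, activity = make_scenario()
    passes = move_all_tasks(domain, emulator, alive, on_step=activity)
    print(passes, sorted(domain.tids), sorted(emulator.tids))  # 2 [] [1, 2, 4]
```

Passing `ignore_esrch=False` models the pre-fix behaviour: the `OSError` for the exited worker propagates, which in libvirt's case is what prevented the domain from starting. With the error ignored, the second pass picks up the worker spawned during the first pass and the source set ends up empty.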

Furthermore, if all your threads are spawned by a master thread (all helper threads share a common parent), rather than each new thread being spawned from the most recent one (so that later threads are separated from the original parent by intermediate threads), then once we have moved the parent thread, every further thread it spawns will already be in the right group. At that point libvirt's looping code will generally iterate at most twice before picking up all threads, no matter how fast your master thread spawns them. I agree that qemu doesn't need to change its policy on thread usage at this time.

Patch in comment 5 is now commit 83e4c775 upstream, and I have already backported it to v0.10.2-maint (F18) and v1.0.5-maint (F19).

libvirt-0.10.2.6-1.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/libvirt-0.10.2.6-1.fc18

Package libvirt-0.10.2.6-1.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.

Update it with:

    # su -c 'yum update --enablerepo=updates-testing libvirt-0.10.2.6-1.fc18'

as soon as you are able to. Please go to the following URL:
https://admin.fedoraproject.org/updates/FEDORA-2013-10805/libvirt-0.10.2.6-1.fc18
then log in and leave karma (feedback).

------- Comment From kamaleshb.com 2013-06-19 07:07 EDT-------
Hi, tested it successfully with libvirt-1.0.6-487.kvm.20130610.ga191a2b.s390x in fc18. thanks Agi

libvirt-0.10.2.6-1.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report.