Bug 692663
| Field | Value |
|---|---|
| Summary | [Libvirt] Libvirtd hangs when qemu processes are unresponsive |
| Product | Red Hat Enterprise Linux 6 |
| Component | libvirt |
| Version | 6.1 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Target Milestone | rc |
| Target Release | --- |
| Fixed In Version | libvirt-0.9.4-10.el6 |
| Doc Type | Bug Fix |
| Reporter | David Naori <dnaori> |
| Assignee | Michal Privoznik <mprivozn> |
| QA Contact | Virtualization Bugs <virt-bugs> |
| CC | bowe, dallan, dyuan, eblake, gsun, hateya, jgalipea, kxiong, mgoldboi, mprivozn, nzhang, rwu, syeghiay, vbian, veillard, weizhan, whuang, ydu, yoyzhang |
| Last Closed | 2011-12-06 11:03:50 UTC |
Description
David Naori
2011-03-31 19:27:01 UTC
Created attachment 489208 [details]
gdb
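The attachment appears to be a gdb dump of the hung libvirtd. For anyone reproducing this, a similar thread backtrace can be captured non-interactively; a minimal sketch, assuming gdb (and ideally the matching libvirt debuginfo package) is installed and libvirtd is still running:

# gdb --batch -p `pidof libvirtd` -ex 'thread apply all bt' > /tmp/libvirtd-threads.txt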
Since RHEL 6.1 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.

This BZ is queued behind others I'm currently working on, so I still don't have any update to put here.

The problem is a combination of several factors:

- all existing workers are waiting for a reply from qemu
- we only start new workers when accepting new client connections, not when a new request arrives through an existing connection
- by starting new workers for new requests instead of new connections, we would only increase the number from 5 to max_workers (20 by default)

So I think we should do something clever to make libvirt robust enough to survive any number of such misbehaving guests, so that we can at least ask libvirtd to kill them (once this functionality is in). I was thinking about tracking how long workers are occupied with their current request and automatically creating a new worker for an incoming request if all workers have been occupied for more than some limit.

*** Bug 665979 has been marked as a duplicate of this bug. ***

During implementation it turned out we need a slightly different approach: https://www.redhat.com/archives/libvir-list/2011-June/msg00788.html Waiting for somebody to review and ack.

*** Bug 634069 has been marked as a duplicate of this bug. ***

*** Bug 669777 has been marked as a duplicate of this bug. ***

Reproduce steps:
On libvirt-0.8.7-18.el6.x86_64:

1. Change /etc/libvirt/libvirtd.conf so that
   max_clients = max_workers + 1 (at least)

2. Start 20 guests:
# for i in {1..20}; do virsh start guest$i ; done

3. Send SIGSTOP to all the qemu processes:
# for i in `ps aux | grep qemu | grep -v grep | awk '{print $2}'`; do kill -19 $i; done

4. Run any virsh command:
# virsh list

It will hang.
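To recover the host after reproducing (not part of the original steps), the stopped qemu processes can be resumed with SIGCONT, mirroring the pattern used in step 3; a minimal cleanup sketch:

# for i in `ps aux | grep qemu | grep -v grep | awk '{print $2}'`; do kill -CONT $i; done

If libvirtd itself remains stuck in its worker threads, restarting it (service libvirtd restart) replaces the wedged daemon.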
Created attachment 517772 [details]
Overview of the way libvirt dispatches RPC & the issues involved with QEMU monitor blocking
Implementation of the ideas Daniel suggested has been sent upstream (awaiting review): https://www.redhat.com/archives/libvir-list/2011-August/msg00710.html

Moving to POST: http://post-office.corp.redhat.com/archives/rhvirt-patches/2011-September/msg00206.html

(In reply to comment #10)
> Reproduce steps:
> On libvirt-0.8.7-18.el6.x86_64:
>
> 1. Change /etc/libvirt/libvirtd.conf so that
>    max_clients = max_workers + 1 (at least)
>
> 2. Start 20 guests:
> # for i in {1..20}; do virsh start guest$i ; done
>
> 3. Send SIGSTOP to all the qemu processes:
> # for i in `ps aux | grep qemu | grep -v grep | awk '{print $2}'`; do kill -19 $i; done
>
> 4. Run any virsh command:
> # virsh list
>
> It will hang.

Sorry for omitting one step before step 4; it should be:

4. sh memstat.sh

# cat memstat.sh
#!/bin/bash
for i in {1..20}
do
    virsh dommemstat foo_$i &
done

5. Run virsh list -- it will hang.

This can be reproduced on
libvirt-0.8.7-18.el6.x86_64
qemu-kvm-0.12.1.2-2.185.el6.x86_64
kernel-2.6.32-193.el6.x86_64

When testing on
libvirt-0.9.4-9.el6.x86_64
qemu-kvm-0.12.1.2-2.185.el6.x86_64
kernel-2.6.32-193.el6.x86_64

step 4 ("sh memstat.sh") reports errors like

error: Failed to get memory statistics for domain foo_2
error: End of file while reading data: Input/output error

and then libvirtd crashes:

# service libvirtd status
libvirtd dead but pid file exists

Created attachment 521825 [details]
libvirtd crash log
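As an aside (not part of the original steps), the hang in step 5 can be checked without blocking the shell by wrapping virsh in GNU coreutils timeout; a minimal sketch:

# timeout 10 virsh list || echo "virsh list did not return within 10s or failed; libvirtd is likely hung"

Note that timeout exits with status 124 when the command is killed, so the message fires on either a hang or an ordinary failure.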
Thanks for catching that. Fixed and moving to POST: http://post-office.corp.redhat.com/archives/rhvirt-patches/2011-September/msg00261.html

Thanks for resolving this so quickly.

Verification passed on
qemu-kvm-0.12.1.2-2.185.el6.x86_64
kernel-2.6.32-193.el6.x86_64
libvirt-0.9.4-10.el6.x86_64

The steps are as comment 16 shows. After step 5, virsh list works fine without hanging.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1513.html