Bug 692663 - [Libvirt] Libvirtd hangs when qemu processes are unresponsive
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
6.1
x86_64 Linux
Priority: high  Severity: high
: rc
: ---
Assigned To: Michal Privoznik
Virtualization Bugs
:
Duplicates: 634069 665979 669777
Depends On:
Blocks:
 
Reported: 2011-03-31 15:27 EDT by David Naori
Modified: 2011-12-06 06:03 EST (History)
19 users

See Also:
Fixed In Version: libvirt-0.9.4-10.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-12-06 06:03:50 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
gdb (213.47 KB, text/plain)
2011-03-31 15:33 EDT, David Naori
Overview of the way libvirt dispatches RPC & issues involved with QEMU monitor blocking (9.13 KB, text/plain)
2011-08-11 06:49 EDT, Daniel Berrange
libvirtd crash log (64.12 KB, text/plain)
2011-09-07 05:22 EDT, weizhang

Description David Naori 2011-03-31 15:27:01 EDT
Description of problem:
When running ~180 VMs using vdsm and sending SIGSTOP to 5 qemu processes, libvirtd hangs forever.

Version-Release number of selected component (if applicable):
libvirt-0.8.7-15
vdsm-4.9-57

How reproducible:
100%

Steps to Reproduce:
1. Run 180 VMs
2. Send SIGSTOP (kill -19) to 5 of the qemu processes

Attached: gdb `t a a bt full` (thread apply all backtrace full) output from libvirtd.
Comment 1 David Naori 2011-03-31 15:33:40 EDT
Created attachment 489208 [details]
gdb
Comment 2 RHEL Product and Program Management 2011-04-03 22:05:28 EDT
Since RHEL 6.1 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.
Comment 3 Jiri Denemark 2011-05-09 09:09:23 EDT
This BZ is queued behind others I'm currently working on so I still don't have any update to put here.
Comment 4 Jiri Denemark 2011-05-16 04:17:11 EDT
The problem is a combination of several factors:

- all existing workers are waiting for a reply from qemu
- we only start new workers when accepting new client connections, not when
  a new request arrives through an existing connection
- by starting new workers for new requests instead of new connections we would
  only increase the number from 5 to max_workers (20 by default)

So I think we should do something clever to make libvirt robust enough to be
able to survive any number of such misbehaving guests so that we can at least
ask libvirtd to kill them (once this functionality is in).

I was thinking about tracking how long workers are occupied with processing
their current request and automatically creating a new worker for an incoming
request if all workers have been occupied for more than some limit.
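The idea above can be sketched as follows. This is a toy model in Python, not libvirt's actual (C) thread pool, and all names are made up:

```python
import queue
import threading
import time

class GrowingPool:
    """Toy model: remember when each worker started its current job and
    add a worker once every worker has been busy longer than a limit.
    Purely illustrative; libvirt's real worker pool is C code."""

    def __init__(self, workers=2, stuck_limit=0.5):
        self.jobs = queue.Queue()
        self.stuck_limit = stuck_limit
        self.lock = threading.Lock()
        self.busy_since = {}   # worker name -> monotonic start time of its job
        self.workers = 0
        for _ in range(workers):
            self._spawn()

    def _spawn(self):
        name = "worker-%d" % self.workers
        self.workers += 1
        threading.Thread(target=self._run, args=(name,), daemon=True).start()

    def _run(self, name):
        while True:
            job = self.jobs.get()
            with self.lock:
                self.busy_since[name] = time.monotonic()
            job()
            with self.lock:
                del self.busy_since[name]

    def submit(self, job):
        self.jobs.put(job)

    def watchdog_tick(self):
        """Called periodically: grow the pool by one worker if every
        worker has been stuck past the limit (e.g. on a stopped qemu)."""
        now = time.monotonic()
        with self.lock:
            all_stuck = (len(self.busy_since) == self.workers and
                         all(now - t > self.stuck_limit
                             for t in self.busy_since.values()))
        if all_stuck:
            self._spawn()
```

As comment 6 notes, the fix actually merged upstream took a somewhat different approach, but the sketch shows why growing the pool only on new connections is not enough.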
Comment 5 Dave Allan 2011-06-09 21:58:44 EDT
*** Bug 665979 has been marked as a duplicate of this bug. ***
Comment 6 Michal Privoznik 2011-06-16 10:37:02 EDT
During implementation it turned out we need a slightly different approach:

https://www.redhat.com/archives/libvir-list/2011-June/msg00788.html

Waiting for somebody to review and ack.
Comment 7 Dave Allan 2011-06-17 14:29:51 EDT
*** Bug 634069 has been marked as a duplicate of this bug. ***
Comment 8 Dave Allan 2011-06-20 23:35:29 EDT
*** Bug 669777 has been marked as a duplicate of this bug. ***
Comment 10 weizhang 2011-07-17 23:36:41 EDT
Reproduction steps on libvirt-0.8.7-18.el6.x86_64:

1. Change /etc/libvirt/libvirtd.conf:
max_clients = max_workers + 1 (at least)

2. Start 20 guests
 #for i in {1..20}; do virsh start guest$i ; done

3. Send SIGSTOP to all the qemu processes
 #for i in `ps aux | grep qemu | grep -v grep | awk '{print $2}'`; do kill -19 $i;done

4. Run a virsh command
# virsh list

It will hang.
Comment 11 Daniel Berrange 2011-08-11 06:49:33 EDT
Created attachment 517772 [details]
Overview of the way libvirt dispatches RPC & issues involved with QEMU monitor blocking
Comment 12 Michal Privoznik 2011-08-16 12:43:41 EDT
Implementation of the ideas Daniel suggested has been sent upstream (awaiting review):

https://www.redhat.com/archives/libvir-list/2011-August/msg00710.html
Comment 16 weizhang 2011-09-07 05:21:20 EDT
(In reply to comment #10)
> reproduce steps:
> on libvirt-0.8.7-18.el6.x86_64
> 
> 1. change on /etc/libvirt/libvirtd.conf
> max_clients = max_workers + 1 (at least)
> 
> 2.start 20 guest
>  #for i in {1..20}; do virsh start guest$i ; done
> 
> 3. Send SIGSTOP to all the qemu processes
>  #for i in `ps aux | grep qemu | grep -v grep | awk '{print $2}'`; do kill -19
> $i;done
> 
> 4. do virsh command
> # virsh list
> 
> it will hang

Sorry for omitting a step before step 4; it should be:
4. sh memstat.sh

cat memstat.sh
#!/bin/bash

for i in {1..20}
do 
  virsh dommemstat foo_$i &
done

5. Run virsh list
It will hang.

Can be reproduced on
libvirt-0.8.7-18.el6.x86_64
qemu-kvm-0.12.1.2-2.185.el6.x86_64
kernel-2.6.32-193.el6.x86_64

When testing on
libvirt-0.9.4-9.el6.x86_64
qemu-kvm-0.12.1.2-2.185.el6.x86_64
kernel-2.6.32-193.el6.x86_64

when doing step 4 (sh memstat.sh), it reports errors like:
error: Failed to get memory statistics for domain foo_2
error: End of file while reading data: Input/output error

Then libvirtd crashes:
# service libvirtd status
libvirtd dead but pid file exists
Comment 17 weizhang 2011-09-07 05:22:32 EDT
Created attachment 521825 [details]
libvirtd crash log
Comment 18 Michal Privoznik 2011-09-07 08:30:34 EDT
Thanks for catching that. Fixed and moving to POST:

http://post-office.corp.redhat.com/archives/rhvirt-patches/2011-September/msg00261.html
Comment 19 weizhang 2011-09-07 22:20:17 EDT
Thanks for resolving this so quickly.
Verification passed on
qemu-kvm-0.12.1.2-2.185.el6.x86_64
kernel-2.6.32-193.el6.x86_64
libvirt-0.9.4-10.el6.x86_64

The steps are as comment 16 shows. After step 5, virsh list works fine without hanging.
Comment 20 errata-xmlrpc 2011-12-06 06:03:50 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1513.html
