Bug 692663
| Field | Value |
|---|---|
| Summary | [Libvirt] Libvirtd hangs when qemu processes are unresponsive |
| Product | Red Hat Enterprise Linux 6 |
| Component | libvirt |
| Version | 6.1 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Target Milestone | rc |
| Target Release | --- |
| Fixed In Version | libvirt-0.9.4-10.el6 |
| Doc Type | Bug Fix |
| Reporter | David Naori <dnaori> |
| Assignee | Michal Privoznik <mprivozn> |
| QA Contact | Virtualization Bugs <virt-bugs> |
| CC | bowe, dallan, dyuan, eblake, gsun, hateya, jgalipea, kxiong, mgoldboi, mprivozn, nzhang, rwu, syeghiay, vbian, veillard, weizhan, whuang, ydu, yoyzhang |
| Last Closed | 2011-12-06 11:03:50 UTC |
Description
David Naori
2011-03-31 19:27:01 UTC
Created attachment 489208 [details]
gdb
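The attachment appears to be a gdb dump of the hung libvirtd. For anyone reproducing this, a similar thread backtrace can be captured non-interactively; a minimal sketch, assuming gdb (and ideally the matching libvirt debuginfo package) is installed and libvirtd is still running:

# gdb --batch -p `pidof libvirtd` -ex 'thread apply all bt' > /tmp/libvirtd-threads.txt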
Since RHEL 6.1 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.

This BZ is queued behind others I'm currently working on, so I still don't have any update to put here.

The problem is a combination of several factors:

- all existing workers are waiting for a reply from qemu
- we only start new workers when accepting new client connections, not when a new request arrives through an existing connection
- by starting new workers for new requests instead of new connections, we would only increase the number from 5 to max_workers (20 by default)

So I think we should do something clever to make libvirt robust enough to survive any number of such misbehaving guests, so that we can at least ask libvirtd to kill them (once this functionality is in). I was thinking about tracking how long workers are occupied with their current request and automatically creating a new worker for an incoming request if all workers have been occupied for more than some limit.

*** Bug 665979 has been marked as a duplicate of this bug. ***

During implementation it turned out we need a slightly different approach: https://www.redhat.com/archives/libvir-list/2011-June/msg00788.html Waiting for somebody to review and ack.

*** Bug 634069 has been marked as a duplicate of this bug. ***

*** Bug 669777 has been marked as a duplicate of this bug. ***

Reproduce steps:
On libvirt-0.8.7-18.el6.x86_64:

1. Change /etc/libvirt/libvirtd.conf so that
   max_clients = max_workers + 1 (at least)

2. Start 20 guests:
# for i in {1..20}; do virsh start guest$i ; done

3. Send SIGSTOP to all the qemu processes:
# for i in `ps aux | grep qemu | grep -v grep | awk '{print $2}'`; do kill -19 $i; done

4. Run any virsh command:
# virsh list

It will hang.
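To recover the host after reproducing (not part of the original steps), the stopped qemu processes can be resumed with SIGCONT, mirroring the pattern used in step 3; a minimal cleanup sketch:

# for i in `ps aux | grep qemu | grep -v grep | awk '{print $2}'`; do kill -CONT $i; done

If libvirtd itself remains stuck in its worker threads, restarting it (service libvirtd restart) replaces the wedged daemon.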
Created attachment 517772 [details]
Overview of the way libvirt dispatches RPC & the issues involved with QEMU monitor blocking
Implementation of the ideas Daniel suggested has been sent upstream (awaiting review): https://www.redhat.com/archives/libvir-list/2011-August/msg00710.html

Moving to POST: http://post-office.corp.redhat.com/archives/rhvirt-patches/2011-September/msg00206.html

(In reply to comment #10)
> Reproduce steps:
> On libvirt-0.8.7-18.el6.x86_64:
>
> 1. Change /etc/libvirt/libvirtd.conf so that
>    max_clients = max_workers + 1 (at least)
>
> 2. Start 20 guests:
> # for i in {1..20}; do virsh start guest$i ; done
>
> 3. Send SIGSTOP to all the qemu processes:
> # for i in `ps aux | grep qemu | grep -v grep | awk '{print $2}'`; do kill -19 $i; done
>
> 4. Run any virsh command:
> # virsh list
>
> It will hang.

Sorry for omitting one step before step 4; it should be:

4. sh memstat.sh

# cat memstat.sh
#!/bin/bash
for i in {1..20}
do
    virsh dommemstat foo_$i &
done

5. Run virsh list -- it will hang.

This can be reproduced on
libvirt-0.8.7-18.el6.x86_64
qemu-kvm-0.12.1.2-2.185.el6.x86_64
kernel-2.6.32-193.el6.x86_64

When testing on
libvirt-0.9.4-9.el6.x86_64
qemu-kvm-0.12.1.2-2.185.el6.x86_64
kernel-2.6.32-193.el6.x86_64

step 4 ("sh memstat.sh") reports errors like

error: Failed to get memory statistics for domain foo_2
error: End of file while reading data: Input/output error

and then libvirtd crashes:

# service libvirtd status
libvirtd dead but pid file exists

Created attachment 521825 [details]
libvirtd crash log
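As an aside (not part of the original steps), the hang in step 5 can be checked without blocking the shell by wrapping virsh in GNU coreutils timeout; a minimal sketch:

# timeout 10 virsh list || echo "virsh list did not return within 10s or failed; libvirtd is likely hung"

Note that timeout exits with status 124 when the command is killed, so the message fires on either a hang or an ordinary failure.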
Thanks for catching that. Fixed and moving to POST: http://post-office.corp.redhat.com/archives/rhvirt-patches/2011-September/msg00261.html

Thanks for resolving this so quickly.

Verification passed on
qemu-kvm-0.12.1.2-2.185.el6.x86_64
kernel-2.6.32-193.el6.x86_64
libvirt-0.9.4-10.el6.x86_64

The steps are as comment 16 shows. After step 5, virsh list works fine without hanging.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1513.html