Bug 671564 - libvirtd crash when exceeding 30 VMs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 6.1
Assignee: Eric Blake
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-01-21 21:12 UTC by Eric Blake
Modified: 2011-05-19 13:26 UTC
CC List: 7 users

Fixed In Version: libvirt-0.8.7-4.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-05-19 13:26:04 UTC
Target Upstream Version:


Links:
Red Hat Product Errata RHBA-2011:0596 (normal, SHIPPED_LIVE): libvirt bug fix and enhancement update, last updated 2011-05-18 17:56:36 UTC

Description Eric Blake 2011-01-21 21:12:56 UTC
Description of problem:
Upstream commit e6b68d7 introduced a regression where the libvirtd daemon could try to free active event handles (fds being polled or timeouts); if the event subsequently occurs, this can result in arbitrary behavior including crashes.
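
To illustrate the failure mode in isolation, here is a hypothetical, minimal poll()-based event loop (not libvirt's actual code; all names such as struct handle, remove_handle_buggy and dispatch_once are invented for the example). It frees a handle's state while the fd is still registered and an event is already pending, so the next dispatch runs the callback on freed memory:

/* hazard.c: hypothetical sketch of the failure mode (not libvirt code).
 * A handle's state is freed while its fd is still in the poll set and an
 * event is already pending; the next dispatch then runs the callback on
 * freed memory, which is undefined behavior and can crash the daemon. */
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

typedef void (*handle_cb)(int fd, void *opaque);

struct handle {
    int fd;
    handle_cb cb;
    void *opaque;            /* per-handle state owned by the event loop */
};

static struct handle handles[8];
static int nhandles;

static void on_event(int fd, void *opaque)
{
    char buf[64];
    if (read(fd, buf, sizeof(buf)) > 0)
        printf("event on fd %d, state: %s\n", fd, (char *)opaque); /* freed! */
}

static void add_handle(int fd, handle_cb cb, void *opaque)
{
    handles[nhandles].fd = fd;
    handles[nhandles].cb = cb;
    handles[nhandles].opaque = opaque;
    nhandles++;
}

/* BUGGY: frees the handle's state even though the fd stays registered and
 * may already have a pending event; this is the class of bug described above. */
static void remove_handle_buggy(int idx)
{
    free(handles[idx].opaque);
}

static void dispatch_once(void)
{
    struct pollfd fds[8];
    for (int i = 0; i < nhandles; i++) {
        fds[i].fd = handles[i].fd;
        fds[i].events = POLLIN;
    }
    if (poll(fds, nhandles, 1000) <= 0)
        return;
    for (int i = 0; i < nhandles; i++)
        if (fds[i].revents & POLLIN)
            handles[i].cb(handles[i].fd, handles[i].opaque); /* use-after-free */
}

int main(void)
{
    int p[2];
    if (pipe(p) < 0)
        return 1;
    add_handle(p[0], on_event, strdup("per-domain state"));
    if (write(p[1], "x", 1) < 0)      /* event becomes pending ...           */
        return 1;
    remove_handle_buggy(0);           /* ... but the state is freed anyway   */
    dispatch_once();                  /* callback now runs on freed memory   */
    return 0;
}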

Version-Release number of selected component (if applicable):
libvirt-0.8.7-3.el6

How reproducible:
100%

Steps to Reproduce:
1. start 60 VMs: for i in `seq 60`; do virsh start vm$i; done
2. stop 60 VMs: for i in `seq 60`; do virsh destroy vm$i; done
3. list all known VMs: virsh list --all
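
For reference only (not part of the original report), the same start/destroy cycle can be driven through the libvirt C API instead of virsh. This is a minimal sketch assuming 60 persistent guests named vm1..vm60 are already defined; the file name repro.c is arbitrary, and it builds with something like gcc repro.c $(pkg-config --cflags --libs libvirt). The final "virsh list --all" step is left to virsh.

/* repro.c (illustrative name): drive the same start/destroy cycle through
 * the libvirt C API; assumes 60 persistent guests named vm1..vm60 exist. */
#include <stdio.h>
#include <libvirt/libvirt.h>

static void for_each_vm(virConnectPtr conn, int (*op)(virDomainPtr), const char *verb)
{
    char name[16];
    for (int i = 1; i <= 60; i++) {
        snprintf(name, sizeof(name), "vm%d", i);
        virDomainPtr dom = virDomainLookupByName(conn, name);
        if (!dom) {
            fprintf(stderr, "no such domain: %s\n", name);
            continue;
        }
        if (op(dom) < 0)
            fprintf(stderr, "failed to %s %s\n", verb, name);
        virDomainFree(dom);
    }
}

int main(void)
{
    virConnectPtr conn = virConnectOpen("qemu:///system");
    if (!conn) {
        fprintf(stderr, "failed to connect to the hypervisor\n");
        return 1;
    }

    for_each_vm(conn, virDomainCreate, "start");      /* step 1: virsh start   */
    for_each_vm(conn, virDomainDestroy, "destroy");   /* step 2: virsh destroy */

    virConnectClose(conn);
    return 0;
}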
  
Actual results:
libvirtd crashed immediately after stopping the 60th VM

Expected results:
libvirt should never crash

Additional info:
This may be the root cause of bug 670848, although the stack trace from the above test does not match that bug report, so this is opened as a separate BZ until we are sure that there aren't any other problems.

Initial upstream patch to fix this and one other bug:
https://www.redhat.com/archives/libvir-list/2011-January/msg00921.html
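
The general shape of such a fix, sketched below, is only an illustration and not the upstream patch itself: removal merely marks the slot as deleted, and the actual free is deferred until after the dispatch pass, so a pending event can never fire on memory that has already been released. It builds on the hypothetical handles[] table from the sketch in the description, extended with a per-slot deleted flag.

/* Sketch only (not the actual upstream patch).  Builds on the hypothetical
 * handles[] table above, with one extra field per slot:
 *     int deleted;   // set on removal, honoured after dispatch
 */

/* Removal no longer frees anything; it just marks the slot. */
static void remove_handle_deferred(int idx)
{
    handles[idx].deleted = 1;
}

static void dispatch_once_safe(void)
{
    struct pollfd fds[8];
    for (int i = 0; i < nhandles; i++) {
        fds[i].fd = handles[i].fd;
        fds[i].events = handles[i].deleted ? 0 : POLLIN;
    }
    if (poll(fds, nhandles, 1000) <= 0)
        return;

    /* Dispatch first, skipping any handle removed in the meantime. */
    for (int i = 0; i < nhandles; i++)
        if (!handles[i].deleted && (fds[i].revents & POLLIN))
            handles[i].cb(handles[i].fd, handles[i].opaque);

    /* Only now free and compact the slots that were marked for removal,
     * so no pending event can ever see freed memory. */
    for (int i = 0; i < nhandles; ) {
        if (handles[i].deleted) {
            free(handles[i].opaque);
            handles[i] = handles[--nhandles];
        } else {
            i++;
        }
    }
}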

Comment 2 Daniel Veillard 2011-01-24 02:23:51 UTC
ACK, we really need to get this fixed!

Thanks for chasing this, hopefully it will also solve 670848!

Daniel

Comment 5 Min Zhan 2011-01-30 06:11:53 UTC
Verified as passed in the environment below:
# uname -a
Linux intel-5405-32-4.englab.nay.redhat.com 2.6.32-107.el6.x86_64 #1 SMP Thu Jan 27 23:11:23 EST 2011 x86_64 x86_64 x86_64 GNU/Linux

libvirt-0.8.7-4.el6.x86_64
kernel-2.6.32-107.el6.x86_64
qemu-kvm-0.12.1.2-2.132.el6.x86_64

Steps
1. start 60 VMs: 
# for i in `seq 60`; do virsh start a$i; done
Domain a1 started

Domain a2 started

Domain a3 started
...

2. stop 60 VMs: 
# for i in `seq 60`; do virsh destroy a$i; done
Domain a1 destroyed

Domain a2 destroyed

Domain a3 destroyed
....

3. list all VMs:
# virsh list --all
 Id Name                 State
----------------------------------
  - a1                   shut off
  - a10                  shut off
  - a11                  shut off
  - a12                  shut off
  - a13                  shut off
  - a14                  shut off
  - a15                  shut off
  - a16                  shut off
  - a17                  shut off
  - a18                  shut off
  - a19                  shut off
  - a2                   shut off
  - a20                  shut off
.....
--------------
I have reproduced this bug with libvirt-0.8.7-3.el6. It also applies with 60 guests.
# for i in `seq 40`; do virsh start a$i; done

# for i in `seq 40`; do virsh destroy a$i; done
...
Domain a38 destroyed

Domain a39 destroyed

error: cannot recv data: : Connection reset by peer
error: failed to connect to the hypervisor


# virsh list --all
error: unable to connect to '/var/run/libvirt/libvirt-sock', libvirtd may need to be started: Connection refused
error: failed to connect to the hypervisor

# service libvirtd status
libvirtd dead but pid file exists

Comment 8 errata-xmlrpc 2011-05-19 13:26:04 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0596.html

