Bug 789098 - [libvirt] [scalability] libvirt daemon stops responding when running 130+ vms
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libvirt
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Libvirt Maintainers
Reported: 2012-02-09 19:47 UTC by Haim
Modified: 2016-03-23 20:46 UTC (History)

Doc Type: Bug Fix
Last Closed: 2016-03-23 20:46:09 UTC


Description Haim 2012-02-09 19:47:26 UTC
Description of problem:

The following issue was reproduced on 2 different machines: 

- The host is running 130+ VMs.
- At a certain point, the libvirt daemon stops responding to calls, so vdsm and hypervisor operations start to fail.

[root@nott-vds3 ~]# pgrep qemu  | wc -l
134

[root@nott-vds3 ~]# ps -ww `pgrep libvirt`
  PID TTY      STAT   TIME COMMAND
 1551 ?        Sl     9:02 libvirtd --daemon --listen

- virsh commands do not respond at all:

[root@nott-vds3 ~]# virsh -r capabilities
setlocale: No such file or directory

- Nothing is written to the libvirt log (the file is empty):
[root@nott-vds3 ~]# cat /var/log/libvirtd.log
[root@nott-vds3 ~]# 
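
The unresponsiveness listed above can be checked without blocking the shell. The following is a minimal Python sketch, not something taken from this report: the helper name `responds_within` and the 10-second default timeout are assumptions chosen for illustration. It runs a command and reports whether it finished in time.

```python
import subprocess


def responds_within(cmd, timeout_s=10.0):
    """Return True if cmd exits (with any status) within timeout_s seconds.

    Hypothetical helper for illustration; not part of libvirt or vdsm.
    Raises FileNotFoundError if the command itself is not installed.
    """
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising the timeout.
        return False
```

With the read-only probe from the session above, `responds_within(["virsh", "-r", "capabilities"])` would be expected to return False on a host in this state, assuming virsh hangs rather than exiting with an error.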

[root@nott-vds3 ~]# free -m 
             total       used       free     shared    buffers     cached
Mem:         32166      19475      12690          0         22        139
-/+ buffers/cache:      19313      12853
Swap:        15999         48      15951

[root@nott-vds3 ~]# rpm -q libvirt
libvirt-0.9.9-1.fc16.x86_64

The vdsm log shows the connection to libvirt breaking:

Thread-3177::ERROR::2012-02-09 18:28:36,637::libvirtconnection::89::vds::(wrapper) connection to libvirt broken. taking vdsm down.
Thread-3163::ERROR::2012-02-09 18:28:36,638::libvirtconnection::89::vds::(wrapper) connection to libvirt broken. taking vdsm down.
Thread-3163::DEBUG::2012-02-09 18:28:36,640::clientIF::127::vds::(prepareForShutdown) cannot run prepareForShutdown concurrently
Thread-3177::DEBUG::2012-02-09 18:28:36,641::task::588::TaskManager.Task::(_updateState) Task=`fc4feaf5-d9ae-435b-9ff4-e159cae1cdec`::moving from state init -> state preparing
Thread-3186::ERROR::2012-02-09 18:28:36,642::libvirtconnection::89::vds::(wrapper) connection to libvirt broken. taking vdsm down.
Thread-3154::ERROR::2012-02-09 18:28:36,642::libvirtconnection::89::vds::(wrapper) connection to libvirt broken. taking vdsm down.
Thread-3163::WARNING::2012-02-09 18:28:36,643::libvirtvm::1197::vm.Vm::(_domDependentInit) vmId=`d02b8af8-8ef9-4789-998c-c956dc6c1a54`::failed to set Vm niceness
Traceback (most recent call last):
  File "/usr/share/vdsm/libvirtvm.py", line 1195, in _domDependentInit
    self._dom.setSchedulerParameters({'cpu_shares': (20 - nice) * 51})
  File "/usr/share/vdsm/libvirtvm.py", line 483, in f
    ret = attr(*args, **kwargs)
  File "/usr/share/vdsm/libvirtconnection.py", line 79, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1599, in setSchedulerParameters
    if ret == -1: raise libvirtError ('virDomainSetSchedulerParameters() failed', dom=self)
libvirtError: Cannot write data: Broken pipe

Comment 1 Haim 2012-02-12 07:47:26 UTC
Attached to the libvirtd process with gdb; see the backtrace:

#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x00000033b6009f97 in _L_lock_863 () from /lib64/libpthread.so.0
#2  0x00000033b6009deb in __pthread_mutex_lock (mutex=0x7f75d000aa78) at pthread_mutex_lock.c:65
#3  0x000000367a2982b0 in virDomainEventStateLock (state=0x7f75d000aa60) at conf/domain_event.c:591
#4  virDomainEventStateDeregisterConn (conn=0x7f75c8092a90, state=0x7f75d000aa60) at conf/domain_event.c:1510
#5  0x000000000045a0bc in qemudClose (conn=0x7f75c8092a90) at qemu/qemu_driver.c:908
#6  0x000000367a2c3d1b in virReleaseConnect (conn=0x7f75c8092a90) at datatypes.c:114
#7  0x000000367a2c4028 in virUnrefConnect (conn=0x7f75c8092a90) at datatypes.c:149
#8  0x000000367a296ef7 in virDomainEventCallbackListPurgeMarked (cbList=0x7f75d00143a0) at conf/domain_event.c:347
#9  virDomainEventStateFlush (state=0x7f75d000aa60) at conf/domain_event.c:1307
#10 virDomainEventTimer (timer=<optimized out>, opaque=0x7f75d000aa60) at conf/domain_event.c:630
#11 0x000000367a254da8 in virEventPollDispatchTimeouts () at util/event_poll.c:440
#12 virEventPollRunOnce () at util/event_poll.c:633
#13 0x000000367a253927 in virEventRunDefaultImpl () at util/event.c:247
#14 0x000000367a31a84d in virNetServerRun (srv=0x2700540) at rpc/virnetserver.c:736
#15 0x0000000000421e36 in main (argc=<optimized out>, argv=<optimized out>) at libvirtd.c:160

Comment 2 Daniel Berrangé 2012-02-13 09:44:43 UTC
That is only one of the libvirtd threads. Can you get a stack trace from all of them with 'thread apply all bt'?

Also, the libvirtd.log file would probably be useful.
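
One way to collect that output non-interactively is to run gdb in batch mode. The sketch below is only an illustration under stated assumptions: the function names `gdb_all_threads_cmd` and `dump_all_threads` are made up here, and it presumes gdb is installed and the caller may attach to the process (typically root).

```python
import subprocess


def gdb_all_threads_cmd(pid):
    """Build a gdb command line that prints a backtrace of every thread.

    Hypothetical helper; uses gdb's -p (attach), -batch and -ex options.
    """
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_all_threads(pid):
    """Attach gdb to pid, run 'thread apply all bt', and return its output.

    Requires gdb on PATH and permission to ptrace the target process.
    """
    result = subprocess.run(
        gdb_all_threads_cmd(pid), capture_output=True, text=True
    )
    return result.stdout
```

For the daemon shown in the description, `dump_all_threads(1551)` would attach to the libvirtd PID from the `ps` output; the returned text can then be attached to the bug.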

Comment 3 Haim 2012-02-19 07:50:03 UTC
(In reply to comment #2)
> That is only one of the libvirtd threads - can you get a stack trace from all
> of them 'thread apply all bt'.
> 
> Also the libvirtd.log file would be probably useful

Daniel, I tried to reproduce this last week with no success. I will try again in the coming week.

