| Summary: | [libvirt] [scalability] libvirt daemon stops responding when running 130+ vms | ||
|---|---|---|---|
| Product: | [Community] Virtualization Tools | Reporter: | Haim <hateya> |
| Component: | libvirt | Assignee: | Libvirt Maintainers <libvirt-maint> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | unspecified | CC: | berrange, bsettle, crobinso, danken, iheim, mgoldboi, xen-maint, yeylon |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-03-23 20:46:09 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |

connected to process with gdb - see bt:

#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1 0x00000033b6009f97 in _L_lock_863 () from /lib64/libpthread.so.0
#2 0x00000033b6009deb in __pthread_mutex_lock (mutex=0x7f75d000aa78) at pthread_mutex_lock.c:65
#3 0x000000367a2982b0 in virDomainEventStateLock (state=0x7f75d000aa60) at conf/domain_event.c:591
#4 virDomainEventStateDeregisterConn (conn=0x7f75c8092a90, state=0x7f75d000aa60) at conf/domain_event.c:1510
#5 0x000000000045a0bc in qemudClose (conn=0x7f75c8092a90) at qemu/qemu_driver.c:908
#6 0x000000367a2c3d1b in virReleaseConnect (conn=0x7f75c8092a90) at datatypes.c:114
#7 0x000000367a2c4028 in virUnrefConnect (conn=0x7f75c8092a90) at datatypes.c:149
#8 0x000000367a296ef7 in virDomainEventCallbackListPurgeMarked (cbList=0x7f75d00143a0) at conf/domain_event.c:347
#9 virDomainEventStateFlush (state=0x7f75d000aa60) at conf/domain_event.c:1307
#10 virDomainEventTimer (timer=<optimized out>, opaque=0x7f75d000aa60) at conf/domain_event.c:630
#11 0x000000367a254da8 in virEventPollDispatchTimeouts () at util/event_poll.c:440
#12 virEventPollRunOnce () at util/event_poll.c:633
#13 0x000000367a253927 in virEventRunDefaultImpl () at util/event.c:247
#14 0x000000367a31a84d in virNetServerRun (srv=0x2700540) at rpc/virnetserver.c:736
#15 0x0000000000421e36 in main (argc=<optimized out>, argv=<optimized out>) at libvirtd.c:160

That is only one of the libvirtd threads - can you get a stack trace from all of them 'thread apply all bt'.

Also the libvirtd.log file would be probably useful

(In reply to comment #2)
> That is only one of the libvirtd threads - can you get a stack trace from all
> of them 'thread apply all bt'.
>
> Also the libvirtd.log file would be probably useful

Daniel,

Tried to reproduce last week with no success, I will try to reproduce it again in the coming week.
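
The all-thread trace requested above can be collected non-interactively. The following is a minimal sketch, assuming gdb is installed and the caller has permission to attach to libvirtd; the helper names `gdb_all_threads_cmd` and `dump_all_threads` and the 60-second limit are hypothetical and not part of libvirt or vdsm.

```python
import subprocess


def gdb_all_threads_cmd(pid):
    """Build a gdb command line that prints the stack of every thread."""
    return [
        "gdb",
        "-p", str(pid),                # attach to the running process
        "-batch",                      # run the given commands, then exit
        "-ex", "thread apply all bt",  # the command requested in the comment
    ]


def dump_all_threads(pid, timeout=60):
    """Run gdb against `pid` and return its combined output as text.

    Hypothetical helper: the timeout is an assumed safety limit so the
    collection step itself cannot hang forever.
    """
    result = subprocess.run(
        gdb_all_threads_cmd(pid),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        timeout=timeout,
    )
    return result.stdout
```

Called as root with the PID of the hung daemon, for example `print(dump_all_threads(1551))`, this would produce output suitable for attaching to the bug.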
Description of problem:

The following issue was reproduced on 2 different machines:

- host running 130+ vms
- at a certain point, the libvirt daemon fails to respond to calls, hence vdsm and hypervisor operations start to fail.

[root@nott-vds3 ~]# pgrep qemu | wc -l
134
[root@nott-vds3 ~]# ps -ww `pgrep libvirt`
PID TTY STAT TIME COMMAND
1551 ? Sl 9:02 libvirtd --daemon --listen

- virsh commands are not responding in any way

[root@nott-vds3 ~]# virsh -r capabilities
setlocale: No such file or directory

- libvirt log is not written

[root@nott-vds3 ~]# cat /var/log/libvirtd.log
[root@nott-vds3 ~]#
[root@nott-vds3 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         32166      19475      12690          0         22        139
-/+ buffers/cache:      19313      12853
Swap:        15999         48      15951
[root@nott-vds3 ~]# rpm -q libvirt
libvirt-0.9.9-1.fc16.x86_64

Thread-3177::ERROR::2012-02-09 18:28:36,637::libvirtconnection::89::vds::(wrapper) connection to libvirt broken. taking vdsm down.
Thread-3163::ERROR::2012-02-09 18:28:36,638::libvirtconnection::89::vds::(wrapper) connection to libvirt broken. taking vdsm down.
Thread-3163::DEBUG::2012-02-09 18:28:36,640::clientIF::127::vds::(prepareForShutdown) cannot run prepareForShutdown concurrently
Thread-3177::DEBUG::2012-02-09 18:28:36,641::task::588::TaskManager.Task::(_updateState) Task=`fc4feaf5-d9ae-435b-9ff4-e159cae1cdec`::moving from state init -> state preparing
Thread-3186::ERROR::2012-02-09 18:28:36,642::libvirtconnection::89::vds::(wrapper) connection to libvirt broken. taking vdsm down.
Thread-3154::ERROR::2012-02-09 18:28:36,642::libvirtconnection::89::vds::(wrapper) connection to libvirt broken. taking vdsm down.
Thread-3163::WARNING::2012-02-09 18:28:36,643::libvirtvm::1197::vm.Vm::(_domDependentInit) vmId=`d02b8af8-8ef9-4789-998c-c956dc6c1a54`::failed to set Vm niceness
Traceback (most recent call last):
  File "/usr/share/vdsm/libvirtvm.py", line 1195, in _domDependentInit
    self._dom.setSchedulerParameters({'cpu_shares': (20 - nice) * 51})
  File "/usr/share/vdsm/libvirtvm.py", line 483, in f
    ret = attr(*args, **kwargs)
  File "/usr/share/vdsm/libvirtconnection.py", line 79, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1599, in setSchedulerParameters
    if ret == -1: raise libvirtError ('virDomainSetSchedulerParameters() failed', dom=self)
libvirtError: Cannot write data: Broken pipe
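
The main symptom in this report, a `virsh` call that never returns, can be checked for automatically by running a read-only command under a timeout. This is a minimal sketch and not anything vdsm or libvirt ships; `responds_within` is a hypothetical helper and any timeout value passed to it is an assumption to be tuned per host.

```python
import subprocess


def responds_within(cmd, timeout):
    """Return True if `cmd` exits within `timeout` seconds, False if it hangs.

    A non-zero exit status still counts as a response: the goal is only to
    tell "the daemon answered" apart from "the call never returned".
    Raises FileNotFoundError if the executable is not installed.
    """
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return False
    return True
```

With the probe used in the description, `responds_within(["virsh", "-r", "capabilities"], 10)` would be expected to return `False` on a host in the hung state, which could serve as a trigger for collecting the gdb trace before restarting the daemon.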