Bug 998365 - libvirt_lxc holds open connection to dbus daemon for too long
libvirt_lxc holds open connection to dbus daemon for too long
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt (Show other bugs)
7.0
Unspecified Unspecified
unspecified Severity unspecified
: rc
: ---
Assigned To: Daniel Berrange
Virtualization Bugs
: Regression
: 1001465 (view as bug list)
Depends On:
Blocks: 960861
  Show dependency treegraph
 
Reported: 2013-08-19 03:17 EDT by Monson Shao
Modified: 2014-09-21 18:51 EDT (History)
10 users (show)

See Also:
Fixed In Version: libvirt-1.1.1-9.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-06-13 06:32:56 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Monson Shao 2013-08-19 03:17:16 EDT
Description of problem:

When start the #252 container (begin fron #1), it always fails with the log:

2013-08-19 06:47:00.101+0000: 2287: debug : virDomainLookupByName:2242 : conn=0x7f596c0c2820, name=bash-0252
2013-08-19 06:47:00.107+0000: 2287: debug : virDomainFree:2406 : dom=0x7f598000ed10, (VM: name=bash-0252, uuid=6cd7a5e0-27a8-4da8-bbc3-ddd48ce93f91)
2013-08-19 06:47:00.117+0000: 24854: debug : virDomainCreate:9415 : dom=0x7f5940011250, (VM: name=bash-0252, uuid=6cd7a5e0-27a8-4da8-bbc3-ddd48ce93f91)
2013-08-19 06:47:00.119+0000: 24854: debug : virLXCProcessStart:1035 : Setting current domain def as transient
2013-08-19 06:47:00.120+0000: 24854: debug : virLXCProcessStart:1060 : Preparing host devices
2013-08-19 06:47:00.120+0000: 24854: debug : virLXCProcessStart:1076 : Generating domain security label (if required)
2013-08-19 06:47:00.121+0000: 24854: debug : virLXCProcessStart:1087 : Setting domain security labels
2013-08-19 06:47:01.067+0000: 24854: error : virNetSocketNewConnectUNIX:567 : Failed to connect socket to '/var/run/libvirt/lxc/bash-0252.sock': No such file or directory
2013-08-19 06:47:01.067+0000: 24854: debug : virLXCMonitorDispose:199 : mon=0x7f594000f090
2013-08-19 06:47:01.308+0000: 24854: error : virCommandWait:2348 : internal error: Child process (ip link del macvlan0) unexpected exit status 1: Cannot find device "macvlan0"
2013-08-19 06:47:01.309+0000: 24854: debug : virDomainFree:2406 : dom=0x7f5940011250, (VM: name=bash-0252, uuid=6cd7a5e0-27a8-4da8-bbc3-ddd48ce93f91)
2013-08-19 06:47:01.329+0000: 2282: debug : virConnectClose:1501 : conn=0x7f596c0c2820

Note that we have raise the LimitNOFILE in libvirtd:

[root@hp-dl585g7-03 ~]# systemctl show libvirtd | grep '^Limit'
LimitCPU=18446744073709551615
LimitFSIZE=18446744073709551615
LimitDATA=18446744073709551615
LimitSTACK=18446744073709551615
LimitCORE=18446744073709551615
LimitRSS=18446744073709551615
LimitNOFILE=204800
LimitAS=18446744073709551615
LimitNPROC=513372
LimitMEMLOCK=65536
LimitLOCKS=18446744073709551615
LimitSIGPENDING=513372
LimitMSGQUEUE=819200
LimitNICE=0
LimitRTPRIO=0
LimitRTTIME=18446744073709551615


Version-Release number of selected component (if applicable):
kernel-3.10.0-9.el7.x86_64
libvirt-1.1.1-2.el7.x86_64
libvirt-sandbox-0.5.0-2.el7.x86_64
libvirt-glib-0.1.7-1.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1. create more than 252 containers
2. start 251 contains
3. # virsh -c lxc:/// start bash-0252

Actual results:
fail to start #252 container

Expected results:
succeed

Additional info:
Comment 1 Monson Shao 2013-08-19 03:29:06 EDT
Note: libvirtd.conf setting:

[root@hp-dl585g7-03 ~]# cat /etc/libvirt/libvirtd.conf
max_clients = 10000
max_workers = 10000
# log_level=1
log_filters="1:libvirt 1:lxc"
log_outputs="1:file:/root/libvirtd.log"
Comment 2 Alex Jia 2013-08-19 04:20:57 EDT
# virsh -c lxc:/// -q list|wc -l
253

# virsh -c lxc:/// start apache254
error: Failed to start domain apache254
error: Failed to connect socket to '/var/run/libvirt/lxc/apache254.sock': No such file or directory

# systemctl status libvirtd.service|tail -5
Aug 19 15:47:03 ibm-x3650m4-05.qe.lab.eng.nay.redhat.com systemd[1]: Starting Virtualization daemon...
Aug 19 15:47:03 ibm-x3650m4-05.qe.lab.eng.nay.redhat.com systemd[1]: Started Virtualization daemon.
Aug 19 15:47:03 ibm-x3650m4-05.qe.lab.eng.nay.redhat.com dnsmasq[13043]: read /etc/hosts - 2 addresses
Aug 19 15:47:03 ibm-x3650m4-05.qe.lab.eng.nay.redhat.com dnsmasq[13043]: read /var/lib/libvirt/dnsmasq/default.addnhosts - 0 addresses
Aug 19 15:47:03 ibm-x3650m4-05.qe.lab.eng.nay.redhat.com dnsmasq-dhcp[13043]: read /var/lib/libvirt/dnsmasq/default.hostsfile


# virsh -c lxc:/// net-list
 Name                 State      Autostart     Persistent
----------------------------------------------------------
 default              active     yes           yes

# virsh -c lxc:/// net-dumpxml default
<network connections='296'>
  <name>default</name>
  <uuid>1f74ccdf-eb2c-4e8b-8bce-93793983c9c3</uuid>
  <forward mode='nat'>
    <nat>
      <port start='1024' end='65535'/>
    </nat>
  </forward>
  <bridge name='virbr0' stp='on' delay='0' />
  <mac address='52:54:00:56:55:d5'/>
  <ip address='192.168.122.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.122.2' end='192.168.122.254' />
    </dhcp>
  </ip>
</network>

The dhcp range is from '192.168.122.2' to '192.168.122.254', so should can
successfully start 253=254-2+1 guests, I think it should be a expected result for me, but I know Monson is using a MAC Vlan to assign IP to the guests then
it shouldn't touch this issue, however, from log point of view, it's the same question, Monson, could you check your network configuration? 


Libvirtd debug information:


2013-08-19 08:10:06.937+0000: 13064: debug : virDomainLookupByName:2242 : conn=0x7f10500e9ff0, name=apache254
2013-08-19 08:10:06.937+0000: 13064: debug : virDomainFree:2406 : dom=0x7f105c412ca0, (VM: name=apache254, uuid=d6086109-ff09-4f29-a075-e7ac1dcc26b1)
2013-08-19 08:10:06.941+0000: 13061: debug : virDomainCreate:9415 : dom=0x7f1040170710, (VM: name=apache254, uuid=d6086109-ff09-4f29-a075-e7ac1dcc26b1)
2013-08-19 08:10:06.944+0000: 13061: debug : virLXCProcessStart:1035 : Setting current domain def as transient
2013-08-19 08:10:06.945+0000: 13061: debug : virLXCProcessStart:1060 : Preparing host devices
2013-08-19 08:10:06.945+0000: 13061: debug : virLXCProcessStart:1076 : Generating domain security label (if required)
2013-08-19 08:10:06.946+0000: 13061: debug : virLXCProcessStart:1087 : Setting domain security labels
2013-08-19 08:10:06.947+0000: 13061: debug : virNetworkLookupByName:11509 : conn=0x7f10500e9ff0, name=default
2013-08-19 08:10:06.947+0000: 13061: debug : virNetworkIsActive:17712 : net=0x7f104011a580
2013-08-19 08:10:06.947+0000: 13061: debug : virNetworkGetBridgeName:12101 : network=0x7f104011a580
2013-08-19 08:10:06.947+0000: 13061: debug : virNetworkFree:11905 : network=0x7f104011a580
2013-08-19 08:10:06.947+0000: 13061: debug : virLXCProcessSetupInterfaceBridged:244 : calling vethCreate()
2013-08-19 08:10:06.969+0000: 13061: debug : virLXCProcessSetupInterfaceBridged:248 : parentVeth: veth253, containerVeth: veth254
2013-08-19 08:10:07.097+0000: 13061: error : virNetSocketNewConnectUNIX:567 : Failed to connect socket to '/var/run/libvirt/lxc/apache254.sock': No such file or directory
2013-08-19 08:10:07.097+0000: 13061: debug : virLXCMonitorDispose:199 : mon=0x7f1040179e80
2013-08-19 08:10:07.107+0000: 13061: error : virCommandWait:2348 : internal error: Child process (ip link del veth254) unexpected exit status 1: Cannot find device "veth254"

2013-08-19 08:10:07.109+0000: 13061: debug : virDomainFree:2406 : dom=0x7f1040170710, (VM: name=apache254, uuid=d6086109-ff09-4f29-a075-e7ac1dcc26b1)
2013-08-19 08:10:07.116+0000: 13058: debug : virConnectClose:1501 : conn=0x7f10500e9ff0

# rpm -q libvirt libvirt-sandbox kernel systemd
libvirt-1.1.1-2.el7.x86_64
libvirt-sandbox-0.5.0-2.el7.x86_64
kernel-3.10.0-9.el7.x86_64
systemd-204-9.el7.1.x86_64
Comment 3 Monson Shao 2013-08-19 04:37:16 EDT
[root@hp-dl585g7-03 ~]# virsh -c lxc:/// -q list | wc -l
251
[root@hp-dl585g7-03 ~]# virsh net-dumpxml lxc
<network connections='496'>
  <name>lxc</name>
  <uuid>d0018e3e-b22f-4dda-af26-02316b43ca55</uuid>
  <forward dev='em2' mode='bridge'>
    <interface dev='em2' connections='496'/>
  </forward>
</network>

I have no idea why there are more connections than active containers.

BTW, containers are created via script like:
# virt-sandbox-service create --network source=lxc,address=$ip/16,route=192.168.0.0/16%192.168.255.2 ${name} /bin/bash
Comment 4 Alex Jia 2013-08-19 06:26:03 EDT
I defined 4 virtual networks + 1 default virtual networks then create 253 guests on each virtual network, I also met this limit(only can successfully created 253 guests) and got a extra error as follows, but it may be not relevant with the bug.


# cat /var/log/libvirt/lxc/apache235.log
2013-08-19 08:07:26.858+0000: starting up
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin LIBVIRT_DEBUG=3 LIBVIRT_LOG_FILTERS=1:libvirt 1:lxc 1:conf 1:json 3:remote 4:event  LIBVIRT_LOG_OUTPUTS=3:stderr /usr/libexec/libvirt_lxc --name apache235 --console 22 --security=selinux --handshake 727 --background --veth veth235
2013-08-19 08:07:26.923+0000: 32722: info : libvirt version: 1.1.1, package: 2.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2013-08-02-14:20:06, x86-020.build.eng.bos.redhat.com)
2013-08-19 08:07:26.923+0000: 32722: error : virDBusCallMethod:1135 : The name org.freedesktop.machine1 was not provided by any .service files
PATH=/bin:/sbin TERM=linux container=lxc-libvirt container_uuid=99ba2780-ac90-4e72-94cb-32d2df1cb563 LIBVIRT_LXC_UUID=99ba2780-ac90-4e72-94cb-32d2df1cb563 LIBVIRT_LXC_NAME=apache235 /usr/libexec/libvirt-sandbox-init-lxc
Comment 10 Daniel Berrange 2013-10-11 09:23:56 EDT
The real error message was hidden in libvirt lxc logs and not reported. This error reporting will be improved in https://bugzilla.redhat.com/show_bug.cgi?id=927072

The error I believe we're hitting here is

2013-10-11 13:16:12.091+0000: 20130: error : virDBusGetSystemBus:87 : internal error: Unable to get DBus system bus connection: The maximum number of active connections for UID 0 has been reached

This happens for me when I run approximately 230 containers. The exact number will vary on every machine, depending on how many other apps are connected to dbus.

As a work around you can edit /etc/dbus-1/system.conf and set


  <limit name="max_connections_per_user">8192</limit>
  <limit name="max_completed_connections">8192</limit>

and then send SIGHUP to the dbus-daemon which is running as root, or reboot.

You should now be able to create many more containers.

The reason we've just hit this is that libvirt has switched to using systemd to create cgroups, which means each libvirt_lxc process needs to connect to dbus. Unfortunately it is never disconnecting from dbus so we hold open the connection.

We should try to close the connection, so that the dbus daemon config change is not required.
Comment 11 Daniel Berrange 2013-10-11 09:25:17 EDT
*** Bug 1001465 has been marked as a duplicate of this bug. ***
Comment 12 Daniel Berrange 2013-10-11 11:21:36 EDT
Fix posted upstream

https://www.redhat.com/archives/libvir-list/2013-October/msg00533.html
Comment 15 Alex Jia 2013-10-16 03:40:31 EDT
I successfully started 332 container without hitting error "Failed to connect socket to '/var/run/libvirt/lxc/apache$i.sock': No such file or directory".

Monson, I will feel more comfortable if you could also give a confirmation, thanks.


# virsh -c lxc:/// -q list |wc -l
332

# free
             total       used       free     shared    buffers     cached
Mem:      16147644   15023568    1124076      50684      78952    8573644
-/+ buffers/cache:    6370972    9776672
Swap:      1048572      21376    1027196


# rpm -q libvirt libvirt-sandbox kernel systemd selinux-policy
libvirt-1.1.1-9.el7.x86_64
libvirt-sandbox-0.5.0-5.el7.x86_64
kernel-3.10.0-33.el7.x86_64
systemd-207-2.el7.x86_64
selinux-policy-3.12.1-86.el7.noarch
Comment 16 Alex Jia 2013-10-31 02:04:30 EDT
Move the bug to VERIFIED status based on Comment15.
Comment 17 Ludek Smid 2014-06-13 06:32:56 EDT
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.

Note You need to log in before you can comment on or make changes to this bug.