Bug 980743 - Failed to start 1000+ containers
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt-sandbox
Version: 7.0
Hardware: x86_64 Linux
Priority: high  Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Daniel Berrange
QA Contact: Virtualization Bugs
Depends On:
Blocks:
Reported: 2013-07-03 03:31 EDT by Alex Jia
Modified: 2014-06-13 06:16 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-06-13 06:16:23 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description Alex Jia 2013-07-03 03:31:39 EDT
Description of problem:
I can successfully create 1000+ apache containers, but starting them all together fails with many errors, for example "Too many open files", "XML error: Invalid security label", and "authentication failed". I will file separate bugs for each one if necessary.

Version-Release number of selected component (if applicable):
# rpm -q libvirt-sandbox libvirt selinux-policy systemd kernel
libvirt-sandbox-0.2.0-1.el7.x86_64
libvirt-1.1.0-1.el7.x86_64
selinux-policy-3.12.1-56.el7.noarch
systemd-204-9.el7.1.x86_64
kernel-3.7.0-0.36.el7.x86_64
kernel-3.10.0-0.rc7.64.el7.x86_64


How reproducible:
always

Steps to Reproduce:
1. # echo -e "max_clients = 10000\nmax_workers = 10000" >> /etc/libvirt/libvirtd.conf
2. # systemctl restart libvirtd.service
3. # for i in {1..1001}; do virt-sandbox-service create -C -u httpd.service apache$i; done
4. # for i in {1..1001};do virt-sandbox-service start apache$i && sleep 3 & done

Actual results:

Unable to start container: Failed to create domain: Unable to create pipe: Too many open files
Unable to open connection: Unable to open lxc:///: Failed to open file '/etc/libvirt/libvirt.conf': Too many open files
.....
Unable to open connection: Unable to open lxc:///: Failed to open file '/proc/10335/stat': Too many open files
Unable to open connection: Unable to open lxc:///: Failed to find group record for gid '0': Too many open files
......
Unable to open connection: Unable to open lxc:///: authentication failed: authentication failed
......
Unable to start container: Failed to create domain: XML error: Invalid security label system_u:system_r:svirt_lxc_net_t:s0
......


Expected results:
Can successfully start 1000+ containers.

Additional info:

# getenforce
Enforcing

# sysctl fs.file-nr
fs.file-nr = 30176	0	7904760
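
For reference, the three fields of fs.file-nr are the number of allocated file handles, the number of allocated but unused handles, and the system-wide maximum. With 30176 allocated out of 7904760, the system-wide limit is far from exhausted, which suggests the "Too many open files" errors come from a per-process limit instead. A minimal sketch for labelling the fields follows; the function name file_nr_summary is made up for illustration and is not part of any package.

```shell
# file_nr_summary: read one fs.file-nr style line ("allocated unused max")
# from stdin and print the three fields with labels.
# Hypothetical helper name, used here for illustration only.
file_nr_summary() {
    awk '{ printf "allocated=%s unused=%s max=%s\n", $1, $2, $3 }'
}

# Typical use on a live system:
#   file_nr_summary < /proc/sys/fs/file-nr
```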

# lsof|wc -l
203916


# lsof|grep `pidof libvirtd`|wc -l
160950

# virt-sandbox-service list|wc -l
1001

# time virt-sandbox-service list -r|wc -l
/usr/bin/virt-sandbox-service: Too many open files
87

real	3m40.013s
user	0m6.809s
sys	0m0.891s


Note: listing running containers is very slow, and the result also does not match the number of virt-sandbox-service-util processes that are actually running. I will file a new bug to track it.

# ps -ef|grep virt-sandbox-service-util|grep -v grep|wc -l
140

Note: only 140 containers started successfully.


# grep avc /var/log/audit/audit.log|grep systemd
type=USER_AVC msg=audit(1372832337.780:9008): pid=1 uid=0 auid=4294967295 ses=4294967295  subj=system_u:system_r:init_t:s0 msg='avc:  received policyload notice (seqno=2)  exe="/usr/lib/systemd/systemd" sauid=0 hostname=? addr=? terminal=?'

# systemctl status libvirtd.service

libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled)
   Active: active (running) since Wed 2013-07-03 13:53:12 HKT; 1h 20min ago
 Main PID: 8892 (libvirtd)
   CGroup: name=systemd:/system/libvirtd.service
           ├─ 3014 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf
           ├─ 8892 /usr/sbin/libvirtd
           ├─10150 /usr/libexec/libvirt_lxc --name apache82 --console 22 --security=selinux --handshake 25 --background
           ├─10621 /usr/libexec/libvirt_lxc --name apache44 --console 35 --security=selinux --handshake 38 --background

......


               ├─18997 /usr/lib/systemd/systemd-journald
               └─19312 /usr/lib/systemd/systemd-journald

Jul 03 14:11:38 localhost.localdomain libvirtd[8892]: Falling back to pseudorandom UUID, failed to generate random bytes: Too many open files
Jul 03 14:11:38 localhost.localdomain libvirtd[8892]: Falling back to pseudorandom UUID, failed to generate random bytes: Too many open files
Jul 03 14:11:38 localhost.localdomain libvirtd[8892]: Falling back to pseudorandom UUID, failed to generate random bytes: Too many open files
Jul 03 14:20:04 localhost.localdomain libvirtd[8892]: unable to make pipe: Too many open files
Jul 03 14:20:23 localhost.localdomain libvirtd[8892]: cannot open /dev/null: Too many open files
Jul 03 14:25:58 localhost.localdomain libvirtd[8892]: cannot open /dev/null: Too many open files
Jul 03 14:40:38 localhost.localdomain libvirtd[8892]: cannot open /dev/null: Too many open files
Jul 03 14:42:11 localhost.localdomain libvirtd[8892]: cannot open /dev/null: Too many open files
Jul 03 15:02:29 localhost.localdomain libvirtd[8892]: End of file while reading data: Input/output error
Jul 03 15:02:29 localhost.localdomain libvirtd[8892]: End of file while reading data: Input/output error
Comment 4 Alex Jia 2014-01-27 04:02:41 EST
To run more than 1000 containers, users must explicitly raise the max open files limit of the running libvirt daemon, and it is hard for them to know what value to use. See the test steps below for details. It would also help to have a document or guide covering this. Daniel, any suggestions? Thanks.

BTW, setting LIBVIRTD_NOFILES_LIMIT=$NUM in /etc/sysconfig/libvirtd has no effect on the running libvirt daemon's limit on RHEL7; please see bug 1046189.


# sysctl fs.file-nr
fs.file-nr = 82944	0	3245263

# ulimit -n
65536

# tail -2 /etc/libvirt/libvirtd.conf
max_clients = 10000
max_workers = 10000

# grep 'open files' /proc/`pidof libvirtd`/limits
Max open files            1024                 4096                 files

Note: with a soft limit of 1024 it seems we should be able to start 1024 containers, but in fact only 332 containers started.
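
As a rough worked figure, assuming the stall after 332 containers is caused by the 1024 soft limit alone (which this report does not establish): 1024 / 332 is about 3.1, so libvirtd would hold roughly 3 to 4 descriptors per running container. The sketch below rounds that ratio up and scales it to a target container count. The function names and the headroom value of 256 are illustrative assumptions, not measured values.

```shell
# fds_per_container LIMIT STARTED: descriptors per container, rounded up.
# Hypothetical helper; assumes the limit was the only reason starts stopped.
fds_per_container() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# estimate_nofile CONTAINERS PER_CONTAINER HEADROOM: suggested soft limit.
# HEADROOM is an assumed allowance for the daemon's own descriptors.
estimate_nofile() {
    echo $(( $1 * $2 + $3 ))
}

per=$(fds_per_container 1024 332)   # 4 with the numbers from this comment
estimate_nofile 1001 "$per" 256     # prints 4260
```

With these inputs the estimate is 4260, which is only a lower-bound guess; the per-container cost may differ between libvirt versions and container configurations.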

# systemctl restart libvirtd.service
# for i in {1..1001}; do virt-sandbox-service create -C -u httpd.service apache$i; done
# for i in {1..1001}; do virsh -c lxc:/// start apache$i; done

Only 332 containers started successfully.

# virsh -c lxc:/// -q list|wc -l
332

# tail -4 /var/log/messages
Jan 27 15:06:51 ibm-x3850x5-09 systemd: Starting Container lxc-apache332.
Jan 27 15:06:51 ibm-x3850x5-09 systemd-machined: New machine lxc-apache332.
Jan 27 15:06:51 ibm-x3850x5-09 systemd: Started Container lxc-apache332.
Jan 27 15:06:51 ibm-x3850x5-09 systemd-machined: Machine lxc-apache332 terminated.

# LIBVIRT_DEBUG=1 LIBVIRT_LOG_FILTERS="1:libvirt 1:lxc" virsh -c lxc:/// start apache333

<slice>

2014-01-27 07:16:58.660+0000: 27531: debug : virDomainGetName:3532 : domain=0x7fc082525fe0
2014-01-27 07:16:58.660+0000: 27532: debug : virEventPollCleanupTimeouts:514 : Cleanup 0
error: 2014-01-27 07:16:58.660+0000: 27532: debug : virEventPollCleanupTimeouts:550 : Found 0 out of 0 timeout slots used, releasing 0
Failed to start domain apache333
2014-01-27 07:16:58.660+0000: 27532: debug : virEventPollCleanupHandles:562 : Cleanup 2
2014-01-27 07:16:58.660+0000: 27532: debug : virEventRunDefaultImpl:270 : running default event implementation
2014-01-27 07:16:58.660+0000: 27531: debug : virDomainFree:2428 : dom=0x7fc082525fe0, (VM: name=apache333, uuid=d78a073e-b521-4ad9-9b34-1413be83ef6d)
2014-01-27 07:16:58.660+0000: 27532: debug : virEventPollCleanupTimeouts:514 : Cleanup 0
2014-01-27 07:16:58.660+0000: 27531: debug : virObjectUnref:256 : OBJECT_UNREF: obj=0x7fc082525fe0
2014-01-27 07:16:58.660+0000: 27532: debug : virEventPollCleanupTimeouts:550 : Found 0 out of 0 timeout slots used, releasing 0
2014-01-27 07:16:58.660+0000: 27531: debug : virObjectUnref:258 : OBJECT_DISPOSE: obj=0x7fc082525fe0
2014-01-27 07:16:58.660+0000: 27532: debug : virEventPollCleanupHandles:562 : Cleanup 2
2014-01-27 07:16:58.660+0000: 27531: debug : virDomainDispose:262 : release domain 0x7fc082525fe0 apache333 d78a073e-b521-4ad9-9b34-1413be83ef6d
2014-01-27 07:16:58.660+0000: 27532: debug : virEventPollMakePollFDs:391 : Prepare n=0 w=1, f=4 e=1 d=0
2014-01-27 07:16:58.661+0000: 27531: debug : virObjectUnref:256 : OBJECT_UNREF: obj=0x7fc082526050
error: 2014-01-27 07:16:58.661+0000: 27532: debug : virEventPollMakePollFDs:391 : Prepare n=1 w=2, f=6 e=1 d=0
cannot open /dev/null: Too many open files
2014-01-27 07:16:58.661+0000: 27532: debug : virEventPollCalculateTimeout:332 : Calculate expiry of 0 timers

</slice>


# lsof|grep `pidof libvirtd`|wc -l
12294

Note: in fact, lsof reports 12294 open files.
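
A process normally cannot hold more open descriptors than its soft open-files limit, so with a limit of 1024 the figure of 12294 is unlikely to be a count of descriptors. Piping lsof through grep with the PID matches every line that contains that number anywhere, and lsof also lists entries that are not descriptors, such as the working directory, program text, and memory-mapped files. Counting the entries in /proc/PID/fd gives the descriptor count more directly. A minimal sketch, where count_fds is a hypothetical helper name:

```shell
# count_fds DIR: count the entries in a /proc/PID/fd style directory.
# Each entry in /proc/PID/fd corresponds to one open file descriptor.
count_fds() {
    ls -A "$1" | wc -l | tr -d ' '
}

# Typical use (as root, assuming a single libvirtd process):
#   count_fds "/proc/$(pidof libvirtd)/fd"
```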

Increase the max open files limit of the running libvirt daemon:
# prlimit --nofile=65535:65535 --pid `pidof libvirtd`

# grep 'open files' /proc/`pidof libvirtd`/limits
Max open files            65535                65535                files
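
A limit set with prlimit applies only to the running process and is lost when libvirtd restarts, so it can be worth re-checking after each restart. The check above can be scripted along these lines; nofile_limits is a hypothetical helper name, and the sketch assumes the /proc/PID/limits layout shown in this comment.

```shell
# nofile_limits FILE: print "soft hard" from a /proc/PID/limits style file.
# Fields on the matching line: Max open files <soft> <hard> files
nofile_limits() {
    awk '/^Max open files/ { print $4, $5 }' "$1"
}

# Typical use (assuming a single libvirtd process):
#   nofile_limits "/proc/$(pidof libvirtd)/limits"
```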

Try again.
# virsh -c lxc:/// start apache333
Domain apache333 started

Now start the remaining containers.
# for i in {334..1001}; do virsh -c lxc:/// start apache$i; done

Everything is okay now.
# virsh -c lxc:/// -q list|wc -l
1001

Current open files:
# lsof|grep `pidof libvirtd`|wc -l
34345
Comment 5 Daniel Berrange 2014-01-29 06:05:02 EST
(In reply to Alex Jia from comment #4)
> If users want to run more than 1000 containers, they must explicitly
> increase max open files for running libvirt daemon, it's hard to know
> concrete numbers for users, for details, please see the following test
> steps, in addition, it will be better if we have corresponding document or
> guide about this, Daniel, any suggestion? thanks.

Yes, I think that there will need to be some documentation in the containers guide about configuring the host to allow many containers. That'd need a separate bug filed for docs team to look at.

> BTW, the LIBVIRTD_NOFILES_LIMIT=$NUM in /etc/sysconfig/libvirtd is invalid
> for running libvirt daemon limitation on RHEL7, please see bug 1046189.

Yes, you must set it in the systemd unit file instead.
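
One way to do that without editing the packaged unit file is a systemd drop-in that sets LimitNOFILE. The following is a minimal sketch, not a tested recommendation: the helper name write_nofile_dropin and the file name limits.conf are arbitrary choices, the value 65535 is simply the one used in comment 4, and it assumes the installed systemd supports drop-in directories under /etc/systemd/system.

```shell
# write_nofile_dropin UNIT LIMIT [ROOT]: write a drop-in that raises the
# open-files limit for UNIT. ROOT defaults to /etc/systemd/system.
# Hypothetical helper; the drop-in file name limits.conf is arbitrary.
write_nofile_dropin() {
    unit=$1
    limit=$2
    root=${3:-/etc/systemd/system}
    dir="$root/$unit.d"
    mkdir -p "$dir" &&
        printf '[Service]\nLimitNOFILE=%s\n' "$limit" > "$dir/limits.conf"
}

# Typical use (as root), followed by a reload and restart:
#   write_nofile_dropin libvirtd.service 65535
#   systemctl daemon-reload
#   systemctl restart libvirtd.service
```

After the restart, the "Max open files" line in /proc/PID/limits for libvirtd should show the new value if the drop-in was picked up.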

> 
> 
> # sysctl fs.file-nr
> fs.file-nr = 82944	0	3245263
> 
> # ulimit -n
> 65536
> 
> # tail -2 /etc/libvirt/libvirtd.conf
> max_clients = 10000
> max_workers = 10000
> 
> # grep 'open files' /proc/`pidof libvirtd`/limits
> Max open files            1024                 4096                 files
> 
> Notes, it seems we should successfully start 1024 containers, but in fact,
> we only started 332 containers.
> 
> # systemctl restart libvirtd.service
> # for i in {1..1001}; do virt-sandbox-service create -C -u httpd.service
> apache$i; done
> # for i in {1..1001}; do virsh -c lxc:/// start apache$i; done
> 
> Only successfully start 332 containers.

I think you are probably hitting https://bugzilla.redhat.com/show_bug.cgi?id=1043776 which is a problem due to a high rate of container creation.
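
If the rate of creation is a factor, one thing to try is pacing the starts instead of issuing them back to back. Whether this avoids the linked bug is not established here. The sketch below is an illustration only: start_paced is a hypothetical helper, the apacheN naming follows the test steps in comment 4, and the pause length is a guess.

```shell
# start_paced FIRST LAST PAUSE CMD...: run "CMD... apacheN" for each N in
# FIRST..LAST, sleeping PAUSE seconds between consecutive starts.
# Failures are reported on stderr and do not stop the loop.
start_paced() {
    first=$1
    last=$2
    pause=$3
    shift 3
    i=$first
    while [ "$i" -le "$last" ]; do
        "$@" "apache$i" || echo "failed: apache$i" >&2
        i=$((i + 1))
        if [ "$i" -le "$last" ]; then
            sleep "$pause"
        fi
    done
}

# Typical use, one start per second:
#   start_paced 1 1001 1 virsh -c lxc:/// start
```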
Comment 6 Alex Jia 2014-01-29 23:33:34 EST
(In reply to Daniel Berrange from comment #5)
> (In reply to Alex Jia from comment #4)
> > If users want to run more than 1000 containers, they must explicitly
> > increase max open files for running libvirt daemon, it's hard to know
> > concrete numbers for users, for details, please see the following test
> > steps, in addition, it will be better if we have corresponding document or
> > guide about this, Daniel, any suggestion? thanks.
> 
> Yes, I think that there will need to be some documentation in the containers
> guide about configuring the host to allow many containers. That'd need a
> separate bug filed for docs team to look at.

Filed a new bug 1059518 for LXC guide.

> I think you are probably hitting
> https://bugzilla.redhat.com/show_bug.cgi?id=1043776 which is a problem due
> to high rate of container creation.

Yes, maybe, Daniel, thanks a lot!

We can successfully start 1000+ containers by explicitly increasing the max open files limit of the running libvirt daemon, so I am moving the bug to VERIFIED status.

# rpm -q libvirt libvirt-sandbox kernel
libvirt-1.1.1-21.el7.x86_64
libvirt-sandbox-0.5.0-8.el7.x86_64
kernel-3.10.0-67.el7.x86_64
Comment 7 Ludek Smid 2014-06-13 06:16:23 EDT
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.
