Bug 1372589

Summary:	Please bump up qemu.conf the max_files to 131072 and max_processes to 65536
Product:	Red Hat OpenStack	Reporter:	Robin Cernin <rcernin>
Component:	openstack-tripleo-heat-templates	Assignee:	Giulio Fidente <gfidente>
Status:	CLOSED ERRATA	QA Contact:	Yogev Rabl <yrabl>
Severity:	medium	Docs Contact:	Don Domingo <ddomingo>
Priority:	medium
Version:	8.0 (Liberty)	CC:	bengland, dbecker, ddomingo, gfidente, jcoufal, jefbrown, johfulto, jomurphy, kbader, mburns, mcornea, mnelson, morazi, rhel-osp-director-maint, scohen, tpetr, twilkins, vumrao
Target Milestone:	Upstream M3	Keywords:	Triaged
Target Release:	11.0 (Ocata)
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	openstack-tripleo-heat-templates-6.0.0-0.20170127041112.ce54697.el7ost.1.noarch	Doc Type:	Enhancement
Doc Text:	It is now possible to use Puppet hieradata to set max_files and max_processes for QEMU instances spawned by libvirtd. This can be done through an environment file containing the appropriate Puppet class parameters. For example, to set max_files and max_processes to 32768 and 131072 respectively, use: parameter_defaults: ExtraConfig: nova::compute::libvirt::qemu::max_files: 32768 nova::compute::libvirt::qemu::max_processes: 131072 This update also sets these values as the defaults, since QEMU instances launched by libvirtd might consume a large number of file descriptors or threads. The actual consumption depends on the number of Compute guests hosted on each compute node and on the number of Ceph RBD images each instance attaches to. It is necessary to be able to configure these limits in large clusters. With these new default values, the Compute service should be able to use more than 700 OSDs. This was previously identified as the limit imposed by the low number of max_files (originally 1024).	Story Points:	---
Clone Of:
Clones:	1430002		Environment:
Last Closed:	2017-05-17 19:32:55 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1386905, 1387431, 1414466, 1414467, 1430002

Description Robin Cernin 2016-09-02 07:22:45 UTC

Request for adding deployment option to director to bump up the max_files and max_processes in /etc/libvirt/qemu.conf

We are seeing that the current default limit deployed in RHEL7 is 1024, which is not enough for a deployment with a Ceph cluster.

When we have a Ceph cluster, each librbd needs 1 fd and 2 threads (read and write), and in the worst case each RBD image is talking to every OSD in the cluster.

Take a medium-sized cluster of 200 OSDs:

  Each OSD needs 1 fd and 2 threads:
  - This means we will need 200 fds and 400 threads for 1 RBD image.

  max_files = fds (open fds)
  max_processes = threads

 - Let's assume the user has no more than 500 RBD images on these 200 OSDs
 
  max_files = 500(RBD images) * 200(fds) = 100000
  max_processes = 400(threads) * 200(fds) =  80000

This is the worst-case scenario, though, so we think it would make sense for OSP director to bump the values to max_files = 131072 and max_processes = 65536:

/etc/libvirt/qemu.conf
  max_files = 131072
  max_processes = 65536

Thank you,
Robin Cernin

Comment 2 Vikhyat Umrao 2016-09-02 13:49:26 UTC

A small correction:

Each connection from an RBD image to an OSD needs 1 fd and 2 threads. For example, if you have 200 OSDs:
  - This means we will need 200 fds and 400 threads for 1 RBD image.

  max_files = fds (open fds)
  max_processes = threads

 - Let's assume the user has no more than 500 RBD images on these 200 OSDs
 
  max_files = 500(RBD images) * 200(fds) = 100000
  max_processes = 500(RBD images) * 400 (threads) =  200000

But the open-files limit applies per process, while the process limit applies per user across the whole system.

So we mostly hit the FD limit, not the max_processes limit.
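
The worst-case arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name and parameters are hypothetical, and it assumes, as stated in this comment, 1 fd and 2 threads per RBD-image-to-OSD connection.

```python
def worst_case_limits(num_osds, num_images, fds_per_conn=1, threads_per_conn=2):
    """Return (fds, threads) if every RBD image holds a connection to every OSD."""
    connections = num_osds * num_images
    return connections * fds_per_conn, connections * threads_per_conn


# 200 OSDs and 500 RBD images, the figures used in this comment.
fds, threads = worst_case_limits(num_osds=200, num_images=500)
print("max_files needed:", fds)
print("max_processes needed:", threads)
```

Since the open-files limit is per process, the figure that matters for a single QEMU process is probably closer to worst_case_limits(200, n), where n is the number of RBD images attached to that one guest.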


- The getrlimit man page says the same: the open-files limit is per process, and the process limit is per user (real user ID) system-wide.
   # man 2 getrlimit

   RLIMIT_NOFILE
          Specifies a value one greater than the maximum file descriptor number that can be opened by this process. Attempts (open(2), pipe(2), dup(2), etc.) to exceed this limit yield the error EMFILE. (Historically, this limit was named RLIMIT_OFILE on BSD.)

   RLIMIT_NPROC
          The maximum number of processes (or, more precisely on Linux, threads) that can be created for the real user ID of the calling process. Upon encountering this limit, fork(2) fails with the error EAGAIN. This limit is not enforced for processes that have either the CAP_SYS_ADMIN or the CAP_SYS_RESOURCE capability.
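
To see what these two limits currently are for a given process, a minimal sketch using Python's standard resource module might look like the following. The labels max_files and max_processes are used here on the assumption, implied above, that the qemu.conf settings map onto RLIMIT_NOFILE and RLIMIT_NPROC; the helper name is hypothetical.

```python
import resource


def current_limits():
    """Return {label: (soft, hard)} for the two rlimits discussed above."""
    names = {
        "max_files": resource.RLIMIT_NOFILE,     # per-process open fd limit
        "max_processes": resource.RLIMIT_NPROC,  # per-user process/thread limit
    }
    return {label: resource.getrlimit(rlimit) for label, rlimit in names.items()}


for label, (soft, hard) in current_limits().items():
    # resource.RLIM_INFINITY is reported when a limit is "unlimited".
    print(label, "soft:", soft, "hard:", hard)
```

This reports the limits of the Python process itself, so it would need to be run as the qemu user to reflect what a QEMU instance inherits.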

- For now we can start from here ---

/etc/libvirt/qemu.conf
  max_files = 131072
  max_processes = 65536

Comment 3 Vikhyat Umrao 2016-09-02 13:51:39 UTC

For more information, see the article below:

Ceph - VM hangs when transferring large amounts of data to RBD disk
https://access.redhat.com/solutions/1602683

Comment 4 Jaromir Coufal 2016-10-18 14:30:10 UTC

This seems to be a Ceph requirement and a related issue in QEMU. Please re-assign if the evaluation of the group assignment is wrong.

Comment 5 Giulio Fidente 2016-11-23 10:57:09 UTC

*** Bug 1389503 has been marked as a duplicate of this bug. ***

Comment 9 Ben England 2016-12-16 22:29:09 UTC

Thanks for adjusting this!  max_files seems fine.

I'm a little unclear about what max_processes means, exactly.  Is it the maximum number of threads per user?  Sorry to be pedantic, I may be overthinking this, just trying to get a picture of what is being tuned.  How does this max_processes relate to kernel.pid-max change here:

https://bugzilla.redhat.com/show_bug.cgi?id=1389502#c8

Comment 10 Giulio Fidente 2016-12-19 10:32:49 UTC

(In reply to Ben England from comment #9)
> Thanks for adjusting this!  max_files seems fine.
> 
> I'm a little unclear about what max_processes means, exactly.  Is it the
> maximum number of threads per user?  Sorry to be pedantic, I may be
> overthinking this, just trying to get a picture of what is being tuned.  How
> does this max_processes relate to kernel.pid-max change here:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1389502#c8

Hi Ben, yes, max_processes in qemu.conf seems to set the maximum number of processes (counting threads too) for the user that libvirtd uses to launch qemu instances.

The pid-max sysctl is global, not per-user. I've basically multiplied all three (max_files too) by a factor of 32, though I guess the real goal of these bugs is to make them customizable.
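
The factor of 32 can be checked against the new defaults given in the Doc Text (32768 and 131072). The baselines below are assumptions: 1024 is the original max_files named in this report, and 4096 is inferred by dividing the new max_processes default by 32 (it happens to match the soft nproc limit shown in Comment 21). The pid-max baseline is not stated in this report, so it is left out.

```python
FACTOR = 32

# Assumed baselines; see the note above on where each number comes from.
baseline = {"max_files": 1024, "max_processes": 4096}

scaled = {name: value * FACTOR for name, value in baseline.items()}
print(scaled)
```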

Comment 18 Yogev Rabl 2017-04-13 13:30:38 UTC

verified on openstack-tripleo-heat-templates-6.0.0-3.el7ost.noarch

1) the heat templates have been merged to tripleo-heat-templates
2) the configuration on /etc/libvirt/qemu.conf has been set properly
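
A check along the lines of step 2 could be scripted roughly as follows. This is a sketch, not the procedure QA actually used: the helper name is hypothetical, and it assumes the file uses plain `key = value` lines with `#` comments and integer values for these two keys.

```python
def read_qemu_limits(conf_text):
    """Extract max_files and max_processes from qemu.conf-style text."""
    wanted = {"max_files", "max_processes"}
    found = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key in wanted:
            found[key] = int(value)
    return found


# Sample text standing in for /etc/libvirt/qemu.conf.
sample = """
#max_files = 0
max_files = 32768
max_processes = 131072
"""
print(read_qemu_limits(sample))
```

On a compute node the same function could be fed the real file, for example read_qemu_limits(open("/etc/libvirt/qemu.conf").read()).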

Comment 19 errata-xmlrpc 2017-05-17 19:32:55 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1245

Comment 20 Ben England 2017-06-13 19:33:59 UTC

The problem reoccurred on RHOSP 11, see comment https://bugzilla.redhat.com/show_bug.cgi?id=1430002#c10.  Does /etc/security/limits.d/20-nproc.conf need to bump up the process limit from 4096 to a higher value?  If so I will re-open bz.

Comment 21 Ben England 2017-06-13 20:52:14 UTC

I verified that the qemu user does not have privs to create enough threads for librados with 1000 OSDs.   This problem is going away with RHCS 3 with async messenger, which requires way fewer threads.

The program here:

http://perf1.perf.lab.eng.bos.redhat.com/bengland/public/openstack/
thread-create.c

tests thread creation limits, and indeed it fails because of the limit placed on the qemu account by /etc/security/limits.d/20-nproc.conf. If you raise this limit, or change it with ulimit, the problem goes away, so I guess that's the workaround. To run the test I had to change the qemu account in /etc/passwd to allow a shell.

# su - qemu
Last login: Tue Jun 13 20:43:25 UTC 2017 on pts/0
-bash-4.2$ /tmp/thread-create 4096
thread count: 4096
cat /proc/358010/limits
Limit                     Soft Limit           Hard Limit           Units  
...   
Max processes             4096                 1030485              processes 
...
x: 0
fatal: Error creating thread
errno 11: Resource temporarily unavailable

-bash-4.2$ tail /etc/security/limits.d/20-nproc.conf 
# Default limit for number of user's processes to prevent
# accidental fork bombs.
# See rhbz #432903 for reasoning.

*          soft    nproc     4096
root       soft    nproc     unlimited
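
The thread-create.c program mentioned above is not reproduced in this report. As a rough stand-in (not the original program), the same kind of test can be sketched in Python: start a requested number of idle threads and report how many the OS allowed. The function name is hypothetical.

```python
import threading


def create_threads(target_count):
    """Try to start target_count idle threads; return how many actually started."""
    release = threading.Event()
    started = []
    try:
        for _ in range(target_count):
            worker = threading.Thread(target=release.wait)
            worker.start()  # raises RuntimeError if the OS refuses a new thread
            started.append(worker)
    except RuntimeError as exc:
        print("thread creation failed after", len(started), "threads:", exc)
    finally:
        release.set()  # let every idle thread exit
        for worker in started:
            worker.join()
    return len(started)


print("threads created:", create_threads(16))
```

Run as the qemu user with a count near the soft nproc limit (4096 in the transcript above), a test like this would be expected to stop short of the requested count, and to succeed once the limit is raised.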