Bug 1430002

Summary: Please bump up qemu.conf the max_files to 131072 and max_processes to 65536
Product: Red Hat OpenStack
Reporter: Giulio Fidente <gfidente>
Component: openstack-tripleo-heat-templates
Assignee: John Fulton <johfulto>
Status: CLOSED ERRATA
QA Contact: Yogev Rabl <yrabl>
Severity: medium
Priority: medium
Version: 10.0 (Newton)
CC: bengland, dbecker, ddomingo, gfidente, jcoufal, jefbrown, johfulto, jomurphy, jschluet, kbader, mburns, mcornea, mnelson, morazi, rcernin, rhel-osp-director-maint, scohen, sgordon, tpetr, twilkins, vumrao, yrabl
Target Milestone: z3
Keywords: Triaged, ZStream
Target Release: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-tripleo-heat-templates-5.2.0-9.el7ost
Clone Of: 1372589
Last Closed: 2017-06-28 14:46:07 UTC
Type: Bug
Bug Depends On: 1372589
Bug Blocks: 1386905, 1387431, 1414466

Comment 1 Giulio Fidente 2017-03-07 16:00:08 UTC
Note, the three submissions linked by the BZ are for three different components:

openstack-tripleo-heat-templates
puppet-tripleo
puppet-nova

They will all need to be backported into the newton branches.

Comment 2 John Fulton 2017-03-08 07:00:36 UTC
The following backported cleanly:

 https://review.openstack.org/#/c/442969/
 https://review.openstack.org/#/c/442970/

There was a merge conflict for:

 https://review.openstack.org/#/c/411987/

I will follow up. At first glance, I don't think it should be too hard to add the following three lines to tripleo-heat-templates/puppet/services/nova-libvirt.yaml in Newton:

            nova::compute::libvirt::qemu::configure_qemu: true
            nova::compute::libvirt::qemu::max_files: 32768
            nova::compute::libvirt::qemu::max_processes: 131072
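
As a rough illustration of what these three hiera keys are expected to produce, the sketch below maps them onto `key = value` lines of the kind found in /etc/libvirt/qemu.conf. The helper name and the mapping logic are assumptions made for illustration only; the real rendering is done by puppet-nova, not by this code.

```python
# Hypothetical helper: not part of puppet-nova or tripleo-heat-templates.
HIERA_PREFIX = "nova::compute::libvirt::qemu::"


def expected_qemu_conf_lines(hiera):
    """Return the qemu.conf lines we would expect for the given hiera data."""
    # Assumption: nothing is written unless configure_qemu is true.
    if not hiera.get(HIERA_PREFIX + "configure_qemu", False):
        return []
    lines = []
    for key in ("max_files", "max_processes"):
        value = hiera.get(HIERA_PREFIX + key)
        if value is not None:
            lines.append(f"{key} = {value}")
    return lines


hiera = {
    "nova::compute::libvirt::qemu::configure_qemu": True,
    "nova::compute::libvirt::qemu::max_files": 32768,
    "nova::compute::libvirt::qemu::max_processes": 131072,
}
print("\n".join(expected_qemu_conf_lines(hiera)))
```

Comparing this output with the tail of /etc/libvirt/qemu.conf on a compute node is one quick way to confirm that the template change was applied.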

Comment 4 John Fulton 2017-03-21 14:33:21 UTC
Newton backports are ready to be tested. After I test them, I will take them out of WIP status for CI and then review:

- https://review.openstack.org/#/c/448122 THT
- https://review.openstack.org/#/c/442970 puppet-tripleo
- https://review.openstack.org/#/c/442969 puppet-nova

Comment 5 John Fulton 2017-04-03 17:49:13 UTC
Upstream changes for this BZ merged: 

 https://review.openstack.org/#/q/topic:bug/1673995

Comment 9 Yogev Rabl 2017-05-24 20:07:36 UTC
Verified on openstack-tripleo-heat-templates-5.2.0-18.el7ost.noarch

Comment 10 Ben England 2017-06-13 18:28:25 UTC
Does the BZ status need to change, or is this a new BZ?

The changes above did alter the config, but the problem is still happening :-(   Ceph librados saw that pthread_create did not succeed.  This is RHOSP 11 now.

Thread::try_create(): pthread_create failed with error 11
common/Thread.cc: In function 'void Thread::create(const char*, size_t)' thread 7fb744feb700 time 2017-06-08 13:44:55.748364
common/Thread.cc: 160: FAILED assert(ret == 0)
 ceph version 10.2.5-37.el7cp (033f137cde8573cfc5a4662b4ed6a63b8a8d1464)
 1: (()+0x175375) [0x7fb7634d6375]
 2: (()+0x198d2a) [0x7fb7634f9d2a]
 3: (()+0x3362c5) [0x7fb7636972c5]
 4: (()+0x33697e) [0x7fb76369797e]
 5: (()+0xd1b6e) [0x7fb763432b6e]
 6: (()+0xd27d7) [0x7fb7634337d7]
 7: (()+0xd5992) [0x7fb763436992]
 8: (()+0xd5cad) [0x7fb763436cad]
 9: (()+0xa960b) [0x7fb76340a60b]
 10: (librados::IoCtx::aio_operate(std::string const&, librados::AioCompletion*, librados::ObjectWriteOperation*, unsigned long, std::vector<unsigned long, std::allocator<unsigned long> >&)+0xe1) [0x7fb7633d7341]
 11: (()+0x88159) [0x7fb76cb6a159]
 12: (()+0x8867b) [0x7fb76cb6a67b]
 13: (()+0x89f8e) [0x7fb76cb6bf8e]
 14: (()+0x8b0ad) [0x7fb76cb6d0ad]
 15: (()+0x77f69) [0x7fb76cb59f69]
 16: (()+0x9036a) [0x7fb76cb7236a]
 17: (()+0x9ec6d) [0x7fb7633ffc6d]
 18: (()+0x87019) [0x7fb7633e8019]
 19: (()+0x174526) [0x7fb7634d5526]
 20: (()+0x7dc5) [0x7fb75e65edc5]
 21: (clone()+0x6d) [0x7fb75e38d73d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2017-06-08 13:44:56.108+0000: shutting down
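
For readers decoding the "error 11" in the log above: on Linux, errno 11 is EAGAIN, which pthread_create returns when the system lacks the resources to create another thread or a limit on the number of threads is reached. The sketch below looks the number up with Python's standard `errno` and `os` modules. The helper name `errno_names` is made up for this example, and the result depends on the platform where it runs.

```python
import errno
import os


def errno_names(code):
    """Return every symbolic errno name that maps to `code` on this platform."""
    return sorted(
        name
        for name in dir(errno)
        if name.startswith("E") and getattr(errno, name) == code
    )


code = 11  # value reported by Thread::try_create() in the log above
print(code, errno_names(code), os.strerror(code))
```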


I noticed that for qemu the ulimit -u value for processes was 4096, much lower than the desired limit for qemu-kvm guests.  Which value takes precedence?  This one?

# more /etc/security/limits.d/20-nproc.conf 
# Default limit for number of user's processes to prevent
# accidental fork bombs.
# See rhbz #432903 for reasoning.
*          soft    nproc     4096
root       soft    nproc     unlimited

or this one:

root@overcloud-osdcompute-27:~
# tail -2 /etc/libvirt/qemu.conf 
max_files = 32768
max_processes = 131072
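
One way to approach this question empirically, rather than by comparing configuration files, is to look at the limits the kernel actually applies to a running qemu-kvm process, which Linux exposes in /proc/<pid>/limits. Files under /etc/security/limits.d are applied by pam_limits to login sessions, so they may not describe a process started by a daemon. The sketch below parses the limits file. The function names are made up for this example, the sample text is illustrative input rather than output captured from this system, and finding the qemu-kvm PID is left to the reader.

```python
import re


def parse_proc_limits(text):
    """Parse the contents of /proc/<pid>/limits into {name: (soft, hard)}."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        # Columns are separated by runs of two or more spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 3:
            limits[fields[0]] = (fields[1], fields[2])
    return limits


def read_limits(pid):
    """Read the effective limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_proc_limits(f.read())


# Illustrative sample input, NOT captured from the system in this report.
sample = """\
Limit                     Soft Limit           Hard Limit           Units
Max processes             131072               131072               processes
Max open files            32768                32768                files
"""
limits = parse_proc_limits(sample)
print(limits["Max processes"], limits["Max open files"])
```

If "Max processes" for the qemu-kvm PID matches max_processes from qemu.conf, the libvirt setting is the one in effect for that guest; if it shows 4096, the nproc default is.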

Comment 11 Ben England 2017-06-13 19:30:39 UTC
Sorry, wrong bz, Tim and I were running Ocata.

stack@b10-h25-r620:~
$ rpm -q openstack-tripleo-heat-templates
openstack-tripleo-heat-templates-6.0.0-10.el7ost.noarch

OSP10 --> https://bugzilla.redhat.com/show_bug.cgi?id=1430002
OSP11 --> https://bugzilla.redhat.com/show_bug.cgi?id=1372589

But they both had similar fixes, right?  Switching to 1372589.

Comment 13 errata-xmlrpc 2017-06-28 14:46:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1585

Comment 16 Martin Schuppert 2018-10-26 06:58:19 UTC
*** Bug 1263828 has been marked as a duplicate of this bug. ***