Note: the three submissions linked by the BZ are for three different components:
- openstack-tripleo-heat-templates
- puppet-tripleo
- puppet-nova
They will all need to be backported into the newton branches.
The following went cleanly:
- https://review.openstack.org/#/c/442969/
- https://review.openstack.org/#/c/442970/
There was a merge conflict for:
- https://review.openstack.org/#/c/411987/
I will follow up on it. At first glance, I don't think it should be too hard to add the following three lines to tripleo-heat-templates/puppet/services/nova-libvirt.yaml in Newton:
  nova::compute::libvirt::qemu::configure_qemu: true
  nova::compute::libvirt::qemu::max_files: 32768
  nova::compute::libvirt::qemu::max_processes: 131072
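When resolving the merge conflict by hand, it can help to confirm that all three hieradata keys ended up in the template with the intended values. The following is only a minimal sketch: the function name `missing_qemu_limits` is hypothetical, and it assumes each key sits on its own `key: value` line, so it scans the text line by line rather than parsing YAML properly.

```python
import re

# Keys and values quoted in the comment above; the checker itself is illustrative.
EXPECTED = {
    "nova::compute::libvirt::qemu::configure_qemu": "true",
    "nova::compute::libvirt::qemu::max_files": "32768",
    "nova::compute::libvirt::qemu::max_processes": "131072",
}

# Matches "<indent>key: value", where the key may contain "::" separators.
_LINE = re.compile(r"^\s*([\w:]+?):\s+(\S+)\s*$")


def missing_qemu_limits(template_text):
    """Return the expected keys that are absent or carry a different value."""
    found = {}
    for line in template_text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        m = _LINE.match(line)
        if m and m.group(1) in EXPECTED:
            found[m.group(1)] = m.group(2)
    return {k: v for k, v in EXPECTED.items() if found.get(k) != v}
```

Passing the contents of the backported nova-libvirt.yaml should return an empty dict once all three lines are in place.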
Newton backports are ready to be tested. After I test them, I will take them out of WIP status for CI and then review:
- https://review.openstack.org/#/c/448122 THT
- https://review.openstack.org/#/c/442970 puppet-tripleo
- https://review.openstack.org/#/c/442969 puppet-nova
Upstream changes for this BZ merged: https://review.openstack.org/#/q/topic:bug/1673995
verified on openstack-tripleo-heat-templates-5.2.0-18.el7ost.noarch
Does bz status need to change? Or is this a new bz? The changes above did alter the config, but the problem is still happening :-( Ceph librados saw that pthread_create did not succeed. This is RHOSP 11 now.

Thread::try_create(): pthread_create failed with error 11
common/Thread.cc: In function 'void Thread::create(const char*, size_t)' thread 7fb744feb700 time 2017-06-08 13:44:55.748364
common/Thread.cc: 160: FAILED assert(ret == 0)
 ceph version 10.2.5-37.el7cp (033f137cde8573cfc5a4662b4ed6a63b8a8d1464)
 1: (()+0x175375) [0x7fb7634d6375]
 2: (()+0x198d2a) [0x7fb7634f9d2a]
 3: (()+0x3362c5) [0x7fb7636972c5]
 4: (()+0x33697e) [0x7fb76369797e]
 5: (()+0xd1b6e) [0x7fb763432b6e]
 6: (()+0xd27d7) [0x7fb7634337d7]
 7: (()+0xd5992) [0x7fb763436992]
 8: (()+0xd5cad) [0x7fb763436cad]
 9: (()+0xa960b) [0x7fb76340a60b]
 10: (librados::IoCtx::aio_operate(std::string const&, librados::AioCompletion*, librados::ObjectWriteOperation*, unsigned long, std::vector<unsigned long, std::allocator<unsigned long> >&)+0xe1) [0x7fb7633d7341]
 11: (()+0x88159) [0x7fb76cb6a159]
 12: (()+0x8867b) [0x7fb76cb6a67b]
 13: (()+0x89f8e) [0x7fb76cb6bf8e]
 14: (()+0x8b0ad) [0x7fb76cb6d0ad]
 15: (()+0x77f69) [0x7fb76cb59f69]
 16: (()+0x9036a) [0x7fb76cb7236a]
 17: (()+0x9ec6d) [0x7fb7633ffc6d]
 18: (()+0x87019) [0x7fb7633e8019]
 19: (()+0x174526) [0x7fb7634d5526]
 20: (()+0x7dc5) [0x7fb75e65edc5]
 21: (clone()+0x6d) [0x7fb75e38d73d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2017-06-08 13:44:56.108+0000: shutting down

I noticed that for qemu the ulimit -u value for processes was 4096, much lower than the desired limit for qemu-kvm guests. Which value takes precedence? This one?

# more /etc/security/limits.d/20-nproc.conf
# Default limit for number of user's processes to prevent
# accidental fork bombs.
# See rhbz #432903 for reasoning.

*          soft    nproc     4096
root       soft    nproc     unlimited

or this one:

root@overcloud-osdcompute-27:~ # tail -2 /etc/libvirt/qemu.conf
max_files = 32768
max_processes = 131072
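The thread does not answer the precedence question directly. On Linux, error 11 is EAGAIN, which pthread_create returns when a thread limit such as RLIMIT_NPROC is reached, so one way to settle it empirically is to read the limits the kernel actually applied to a running qemu-kvm process from /proc/<pid>/limits and compare them with qemu.conf. The sketch below assumes the usual column layout of that file; the function names are hypothetical, and finding the qemu-kvm pid is left to the reader.

```python
import re


def parse_proc_limits(text):
    """Parse /proc/<pid>/limits text into {limit name: (soft, hard)}."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        # The name may contain single spaces; columns are separated by 2+ spaces.
        m = re.match(r"^(.+?)\s{2,}(\S+)\s+(\S+)", line)
        if m:
            limits[m.group(1)] = (m.group(2), m.group(3))
    return limits


def parse_qemu_conf(text):
    """Collect uncommented integer `key = value` settings from qemu.conf text."""
    settings = {}
    for line in text.splitlines():
        m = re.match(r"^\s*(\w+)\s*=\s*(\d+)\s*$", line)
        if m:
            settings[m.group(1)] = int(m.group(2))
    return settings


def compare_nproc(limits_text, conf_text):
    """Return (configured max_processes, effective soft limit) for comparison.

    Either element is None when the corresponding entry is not present.
    """
    configured = parse_qemu_conf(conf_text).get("max_processes")
    soft, _hard = parse_proc_limits(limits_text).get("Max processes", (None, None))
    return configured, soft
```

For a given pid, `compare_nproc(open(f"/proc/{pid}/limits").read(), open("/etc/libvirt/qemu.conf").read())` returns both values side by side. If the second element is still 4096 while the first is 131072, the guest process is not running with the qemu.conf value.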
Sorry, wrong bz, Tim and I were running Ocata.

stack@b10-h25-r620:~ $ rpm -q openstack-tripleo-heat-templates
openstack-tripleo-heat-templates-6.0.0-10.el7ost.noarch

OSP10 --> https://bugzilla.redhat.com/show_bug.cgi?id=1430002
OSP11 --> https://bugzilla.redhat.com/show_bug.cgi?id=1372589

But they both had similar fixes, right? Switching to 1372589.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1585
*** Bug 1263828 has been marked as a duplicate of this bug. ***