Description of problem:
OSP10 has a best-practice limit of scaling up to 10 Ceph storage nodes. For OSP11 we need to investigate what limits this level of scale and improve it to support up to 25 Ceph nodes. This starts as an investigation and will generate a series of BZs that will need to be resolved to allow for the increased number of Ceph nodes supported.
We do not yet know exactly which components will require modification, so the puppet-heat component currently set here is a placeholder.
Steps to Reproduce:
A usable OSP solution that scales to at least 25 Ceph nodes will be supported. The usability of the solution will be validated by the Storage consulting field support team.
Please add 'fixed in version' info.
To support scaling to 25 Ceph nodes with as many as 36 OSDs per node, you need to address BZ 1372589 to increase the file descriptor limit on compute nodes from its default of 1024. If you do this, you may be able to get much farther than 25 nodes - Tim Wilkinson and I got to 29 nodes and 1043 OSDs this way with an external Ceph cluster connected to OpenStack.
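As a rough illustration of why the 1024 default is too low at this scale (the per-OSD descriptor overhead and qemu baseline below are assumptions for illustration, not measurements from this bug), a short Python sketch can compare an estimated per-VM descriptor need against the current nofile limit:

    #!/usr/bin/env python3
    # Rough estimate of descriptors a single qemu/librbd client may need when
    # every OSD connection is open, compared with the current nofile limit.
    # FDS_PER_OSD and BASELINE_FDS are assumed values for illustration only.

    import resource

    OSDS = 25 * 36          # 25 Ceph nodes x 36 OSDs per node = 900 OSDs
    FDS_PER_OSD = 2         # assumed: roughly one socket plus bookkeeping per OSD
    BASELINE_FDS = 200      # assumed: descriptors qemu needs before any Ceph I/O

    needed = BASELINE_FDS + OSDS * FDS_PER_OSD
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    print(f"estimated descriptors needed per VM: {needed}")
    print(f"current soft/hard nofile limit:      {soft}/{hard}")
    if soft < needed:
        print("soft limit below estimate - raise the nofile limit as in BZ 1372589")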
Note that http://tracker.ceph.com/issues/17573 means you don't immediately find out that something is wrong if you don't fix this: the VM does not create all of its Ceph OSD sockets up front, but on demand, until it runs out of file descriptors for Ceph sockets, at which point you may see various behaviors including hangs. You can see that this happened in /var/log/libvirt/qemu/*.log, but it's better not to let it happen at all, and a simple config change is all it takes.
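Because the sockets are only opened on demand, one way to catch this before it turns into hangs is to watch each qemu process's open descriptor count against its per-process limit. The sketch below reads /proc for this; the qemu process-name match and the 90% warning threshold are assumptions for illustration:

    #!/usr/bin/env python3
    # Walk /proc, find qemu processes, and warn when their open descriptor
    # count approaches the per-process "Max open files" limit, since librbd
    # opens OSD sockets lazily and exhaustion only shows up later as hangs.
    # Run as root so /proc/<pid>/fd is readable for all qemu processes.

    import os

    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/comm") as f:
                comm = f.read().strip()
            if "qemu" not in comm:
                continue
            open_fds = len(os.listdir(f"/proc/{pid}/fd"))
            with open(f"/proc/{pid}/limits") as f:
                # "Max open files  <soft>  <hard>  files" -> soft limit is field 3
                soft = next(int(line.split()[3]) for line in f
                            if line.startswith("Max open files"))
            flag = "  <-- near limit" if open_fds > 0.9 * soft else ""
            print(f"pid {pid} ({comm}): {open_fds}/{soft} open fds{flag}")
        except (OSError, StopIteration):
            continue  # process exited or is unreadable; skip it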
Duplicate BZ 1389503 was originally filed as part of the Red Hat OpenStack scale lab project to get to > 1000 OSDs across 29 servers. BZ 1389502 (kernel.pid_max) appears to have been fixed, but was also necessary for this goal.
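For completeness, the kernel.pid_max setting from BZ 1389502 can be sanity-checked the same way. This small sketch just reads the current sysctl values from /proc; the comparison against the common 32768 default is illustrative and is not the value actually used in the scale lab:

    #!/usr/bin/env python3
    # Read kernel.pid_max and kernel.threads-max from /proc/sys and flag the
    # common default, which large per-node OSD/thread counts can exhaust.

    with open("/proc/sys/kernel/pid_max") as f:
        pid_max = int(f.read())
    with open("/proc/sys/kernel/threads-max") as f:
        threads_max = int(f.read())

    print(f"kernel.pid_max     = {pid_max}")
    print(f"kernel.threads-max = {threads_max}")
    if pid_max <= 32768:
        print("pid_max is at the common default; many OSD connections per node "
              "can exhaust it (see BZ 1389502)")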
cc'ing Rick and Andy - this impacts OpenStack Performance & Scale Release Criteria.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.