| Summary: | Research the issues/obstacles to allowing OSPd managed Ceph Node Scaling to at least 25 nodes | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Jeff Brown <jefbrown> |
| Component: | puppet-heat | Assignee: | Sébastien Han <shan> |
| Status: | CLOSED ERRATA | QA Contact: | Yogev Rabl <yrabl> |
| Severity: | high | Docs Contact: | Derek <dcadzow> |
| Priority: | medium | ||
| Version: | 11.0 (Ocata) | CC: | abond, bengland, derli, jjoyce, jliberma, jomurphy, jschluet, jtaleric, kholden, nlevinki, rsussman, scohen, shan, slinaber, sputhenp, tvignaud, twilkins |
| Target Milestone: | Upstream M3 | Keywords: | FutureFeature, Triaged |
| Target Release: | 11.0 (Ocata) | Flags: | slinaber: needinfo? (shan) |
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-05-17 19:36:00 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | 1372589, 1430002 | ||
| Bug Blocks: | 1387431, 1414466, 1422721 | ||
Description
Jeff Brown
2016-10-19 19:15:47 UTC
Please add 'fixed in version' info.

To support scaling to 25 Ceph nodes with as many as 36 OSDs per node, you need to address bz 1372589 to increase the file descriptor limit for compute nodes from its default of 1024. If you do this, you may be able to get much farther than 25 nodes: Tim Wilkinson and I got to 29 nodes and 1043 OSDs this way with an external Ceph cluster connected to OpenStack.

Note that http://tracker.ceph.com/issues/17573 means that you don't immediately find out that something is wrong if you don't fix this, because your VM doesn't create all the Ceph OSD sockets immediately. Instead it creates them on demand until it runs out of file descriptors for Ceph sockets, at which point you may get different behaviors, including hangs. You can see that this happened in /var/log/libvirt/qemu/*.log, but it's better to just not allow it to happen, and a simple config change is all it takes.

Originally, duplicate bz 1389503 was filed as part of the Red Hat OpenStack scale lab project to get to > 1000 OSDs across 29 servers. 1389502 (kernel.pid_max) appears to have been fixed, but was also necessary for this goal.

cc'ing Rick and Andy - this impacts OpenStack Performance & Scale Release Criteria. https://docs.google.com/document/d/1I4l1UzDoykh4o9jUQJdzYkSji3iamERk8OtPDxjKj9E/edit#heading=h.gu9zkkcebb85

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1245
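
For reference, the file descriptor arithmetic from the description can be sketched roughly as below. This is only an illustration, not the fix shipped in the advisory. It assumes a client process ends up holding about one socket per OSD once every OSD has been contacted, which is a simplification. The function names and the `other_fds` headroom parameter are hypothetical.

```python
import resource


def fds_needed(total_osds, other_fds=0):
    """Estimate descriptors a Ceph client process may hold at worst.

    Assumption (not from the bug report): about one socket per OSD.
    `other_fds` is hypothetical headroom for everything else the
    process keeps open (logs, monitor connections, disks, ...).
    """
    return total_osds + other_fds


def limit_is_sufficient(total_osds, soft_limit=None, other_fds=0):
    """Return True if the open-file soft limit covers the estimate.

    When `soft_limit` is None, the current process's RLIMIT_NOFILE
    soft limit is used.
    """
    if soft_limit is None:
        soft_limit, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= fds_needed(total_osds, other_fds)


if __name__ == "__main__":
    # Scale lab case from the description: 1043 OSDs vs. the default 1024.
    print(limit_is_sufficient(1043, soft_limit=1024))
    # Target case: 25 nodes x 36 OSDs, with a guessed 200 other descriptors.
    print(limit_is_sufficient(25 * 36, soft_limit=1024, other_fds=200))
```

Under these assumptions, 1043 OSDs cannot fit under a limit of 1024 at all, and 25 x 36 = 900 OSDs leaves little room for anything else the process has open. Because the sockets are created on demand, a check like this is more useful up front than waiting for a hang.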