Description of problem: The current hardware specification requirements in our documentation make no mention of a configuration that runs both OSP Controllers and Ceph Monitors on the same nodes. We have heard of customers running into performance issues when running both on the same node. The implications of running both sets of services on the same nodes should be explained.
(In reply to Alexandre Marangone from comment #0)
> Description of problem:
> In the current hardware specification requirement in our documentation,
> there's no mention of a config that runs both OSP Controllers and Ceph
> Monitors on the same nodes. We have heard of customers running into
> performance issues when running both on the same node.
> The implication of running both set of services on the same nodes should be
> explained.

Hi Alexandre,

Can you provide some further details? Specifically:

* What performance issues are customers running into?
* What specific implications should be documented?

- Dan
Hi Dan,

I'm adding Sheldon Mustard; he can comment better on the performance implications, since one of his customers ran into issues in a high-performance environment.

There's also https://access.redhat.com/documentation/en/red-hat-openstack-platform/8/director-installation-and-usage/24-overcloud-requirement, where the recommendations are made exclusively for the OSP Controller and do not take into account that OSP-d will colocate the Ceph Mons and OSP Controllers. For a Ceph Mon, we usually recommend SSDs for the mon store as well as more memory (at least 16 GB).
In the situations I have seen, the performance issues generally came down to local root disk performance and RAM utilization. Both of these can obviously be overcome, but as Alex said, I think either a disclaimer or a bump in the recommended specs would make sense.

The other issue is availability: with the mons running on the OpenStack controllers, they become a critical piece of the availability of the Ceph cluster. AFAIK the controllers could have issues and the cloud overall would be fine, but with the mons on them, the Ceph cluster would have issues when/if you lost >50% of them (i.e. the monitors would lose quorum). Not a case that would happen often, but again I think we should warn customers somewhere about this risk.
Dan -- the main issue here is that we must encourage people not to under-spec the controller nodes if they will also be used as Ceph monitors. So in the hardware requirements section, we should recommend that the controller nodes meet the minimum recommended requirements for a Ceph monitor node if they will be used as such. I don't know the exact requirements, but to minimize the risk of performance problems, we should recommend:

1. At least 16 GB of RAM
2. SSD drives for the monitor store

There might be some additional doc work required to specify the location of the monitor store in OSP-d to ensure it uses the SSD drives, or to mount the SSD drives in the appropriate place to ensure director uses them.

Thanks again, Jacob
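For what it's worth, a minimal check for those two recommendations could look like this. This is only a sketch: the 16 GB threshold is the number suggested above, the function names are made up, and the file paths are the standard Linux ones.

```shell
# Sketch: check a node against the two recommendations above.
# Function names are invented for illustration.

meets_ram_recommendation() {
    # $1: path to a meminfo-style file (normally /proc/meminfo);
    # returns 0 if MemTotal is at least 16 GB.
    kb=$(awk '/^MemTotal:/ {print $2}' "$1")
    [ "${kb:-0}" -ge $((16 * 1024 * 1024)) ]
}

is_ssd() {
    # $1: block device name, e.g. sda.
    # The kernel reports 0 in the rotational flag for SSDs.
    [ "$(cat "/sys/block/$1/queue/rotational" 2>/dev/null)" = "0" ]
}

# Typical use on a live node:
# meets_ram_recommendation /proc/meminfo || echo "controller below 16 GB RAM"
# is_ssd sdb || echo "sdb is not an SSD; mon store may be slow"
```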
(In reply to jliberma from comment #5)
> There might be some additional doc work required to specify the location of
> the monitor store in OSPd to ensure it uses the SSD drives, or to mount the
> SSD drives in the appropriate place to ensure director uses them.

I might need some help with this. It looks like we'll need a script that does the following:

1. Check /etc/fstab for an entry for /var/lib/ceph/mon. If there is one, stop; if not, continue.
2. Identify the disk to use for the mon data.
3. Format it and add a partition.
4. Add a mount in /etc/fstab at /var/lib/ceph/mon.

The tricky part for me is step 2... What would be the best way to identify the disk to use?

jliberman, smustard, amarango -- any suggestions?
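A rough sketch of what such a script could look like. Everything here is an assumption rather than an implemented solution; in particular, the heuristic for step 2 just picks the first unmounted non-rotational disk via /sys/block/*/queue/rotational, and the partitioning commands assume a /dev/sdX-style device name.

```shell
# Sketch of the four steps above; names and the SSD-detection
# heuristic are assumptions, not the agreed approach.

MON_DIR="/var/lib/ceph/mon"

# Step 1: does the given fstab file already mount the mon store?
mon_store_in_fstab() {
    grep -qE "[[:space:]]${MON_DIR}([[:space:]]|\$)" "$1"
}

# Step 2: pick the first non-rotational (SSD), currently unmounted disk.
# The kernel reports 0 in queue/rotational for SSDs.
find_unused_ssd() {
    for dev in /sys/block/*; do
        name=$(basename "$dev")
        [ -e "$dev/queue/rotational" ] || continue
        [ "$(cat "$dev/queue/rotational")" = "0" ] || continue
        grep -q "^/dev/${name}" /proc/mounts && continue
        echo "/dev/${name}"
        return 0
    done
    return 1
}

# Steps 3-4: partition, format, and persist the mount. Destructive,
# so this should only run after the checks above pass.
setup_mon_store() {
    fstab="$1"; disk="$2"
    parted -s "$disk" mklabel gpt mkpart primary xfs 0% 100%
    mkfs.xfs -f "${disk}1"
    echo "${disk}1 ${MON_DIR} xfs defaults 0 0" >> "$fstab"
}

# Putting it together (left commented out so the sketch is non-destructive):
# mon_store_in_fstab /etc/fstab && exit 0
# disk=$(find_unused_ssd) || { echo "no unused SSD found" >&2; exit 1; }
# setup_mon_store /etc/fstab "$disk"
```

One caveat with this heuristic: on a node with several SSDs (or NVMe devices, which name partitions differently), "first unused SSD" may well pick the wrong disk, which is exactly why step 2 needs input from the others.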
Does anyone have any further updates on comment #6?
I don't have a suggestion for identifying SSDs.
I think it's safe to close this bug, since composable services now allow you to split the Ceph Mon from the Controller if need be. This largely mitigates the issues with keeping the Ceph Mon on the Controller nodes. Beyond this, I don't think I can provide any further documentation than the commit implemented in comment #7 (which is now published [1]). If further documentation is required for this issue, please feel free to reopen this BZ.

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/11/html/red_hat_ceph_storage_for_the_overcloud/introduction#setting_requirements