Bug 1459891
Summary: | [Docs] Director should increase kernel.thread-max on ceph backed compute nodes | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Tomas Rusnak <trusnak> |
Component: | documentation | Assignee: | RHOS Documentation Team <rhos-docs> |
Status: | CLOSED EOL | QA Contact: | RHOS Documentation Team <rhos-docs> |
Severity: | high | Docs Contact: | |
Priority: | low | ||
Version: | 10.0 (Newton) | CC: | bengland, cminkema, dwilson, jomurphy, jtaleric, kbader, mburns, mnelson, nlevinki, srevivo, twilkins |
Target Milestone: | --- | Keywords: | Documentation |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-07-07 10:40:50 UTC | Type: | Bug |
Description
Tomas Rusnak
2017-06-08 13:10:43 UTC
Tomas,

Good points, it got me thinking harder about this. We should document the thread-related kernel parameter requirements for RHOSP 10 and 11, which depend on the older RHCS 2.

In RHCS 3.0, Ceph is switching to a new "async messenger", replacing the "simple messenger" component that caused the massive consumption of threads per OSD and per instance (in librados). So this should not be as much of an issue at that point. RHCS 3.0 is supposed to be the release used with RHOSP 12 (Pike) and will definitely be used for RHOSP 13 (Queens). However, for RHOSP 10 and 11, which integrate with RHCS 2, this will still be an issue.

I think kernel.threads-max does not have to be as big as kernel.pid_max, but with simple messenger you still need on the order of 2 threads/OSD x (guests + OSDs). I recently saw librados pthread_create fail to create a thread (RHOSP 11), even with the higher pid_max: https://bugzilla.redhat.com/show_bug.cgi?id=1461530#c8

I used this program to investigate what was going on. It just calls pthread_create N times to see how many threads can run at the same time.
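For readers who want to try the same kind of probe without the C program, here is a minimal Python sketch of the idea. It is not the thread-create.c source referenced in this comment; the function name `count_startable_threads` is hypothetical, and Python threads carry more overhead than bare pthreads, so the absolute numbers may differ from the C test.

```python
import threading


def count_startable_threads(target_count):
    """Try to keep target_count threads alive at once; return how many started."""
    release = threading.Event()  # every started thread parks on this event
    started = []
    try:
        for _ in range(target_count):
            worker = threading.Thread(target=release.wait)
            try:
                worker.start()
            except RuntimeError:
                # CPython raises RuntimeError when the OS refuses a new thread.
                break
            started.append(worker)
    finally:
        release.set()  # let all parked threads exit
        for worker in started:
            worker.join()
    return len(started)


print(count_startable_threads(100))
```

A return value lower than the requested count suggests that some limit (PIDs, memory mappings, or memory) was hit before all threads could be created.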
Source for thread-create.c: http://perf1.perf.lab.eng.bos.redhat.com/bengland/public/openstack/thread-create.c

I looked at just kernel.threads-max, kernel.pid_max, and vm.max_map_count. I started with an untuned RHEL 7.3 kernel on a 256-GB host:

    [root@c04-h01-6048r ~]# sysctl -a | grep threads-max
    kernel.threads-max = 2061221
    [root@c04-h01-6048r ~]# sysctl -a | grep pid_max
    kernel.pid_max = 57344
    [root@c04-h01-6048r ~]# sysctl -a | grep vm.max_map_count
    vm.max_map_count = 65530
    [root@c04-h01-6048r ~]# ./thread-create 200000
    thread count: 200000
    fatal: Error creating thread errno 12 with thrd=56824: Cannot allocate memory
    [root@c04-h01-6048r ~]# sysctl -w kernel.pid_max=1048576
    kernel.pid_max = 1048576
    [root@c04-h01-6048r ~]# sysctl -w vm.max_map_count=400000
    vm.max_map_count = 400000
    [root@c04-h01-6048r ~]# ./thread-create 200000
    thread count: 200000
    fatal: Error creating thread errno 12 with thrd=199989: Cannot allocate memory
    [root@c04-h01-6048r ~]# sysctl -w vm.max_map_count=500000
    vm.max_map_count = 500000
    [root@c04-h01-6048r ~]# ./thread-create 200000
    thread count: 200000
    SUCCESS

So vm.max_map_count limits how many threads you can create! This was not obvious to me at first. I found it in a discussion of JVM thread creation: https://stackoverflow.com/questions/5635362/max-thread-per-process-in-linux

Note that on a RHEL 7.3 kernel with 256 GB RAM, the kernel.threads-max default appears to be:

    [root@c04-h01-6048r ~]# sysctl -a | grep threads-max
    kernel.threads-max = 2061221

So kernel.threads-max was likely not the problem in this case.

Clearing target release pending docs triage.

So my conclusion above was that vm.max_map_count had to be increased to >> 2x the total number of threads used by Ceph OSDs or RADOS clients, and before RHCS 3.0, this is quite high for a large cluster. To calculate:

    number of processes using librados (RBD clients, OSDs, RGWs, CephFS clients) x number of OSDs x 2
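The calculation above can be written as a small Python sketch. The function and constant names are hypothetical, and the factor of 2 is simply the threads-per-connection figure quoted in this thread for the simple messenger, not a value taken from Ceph itself.

```python
# Assumption: 2 threads per OSD connection, the figure quoted in this thread
# for the pre-RHCS 3.0 "simple messenger".
THREADS_PER_CONNECTION = 2


def estimated_threads(librados_processes, cluster_osds,
                      threads_per_connection=THREADS_PER_CONNECTION):
    """Rule-of-thumb thread count for the librados processes on one host."""
    return librados_processes * cluster_osds * threads_per_connection


print(estimated_threads(50, 1000))  # 100000
print(estimated_threads(36, 1000))  # 72000
```

The first argument counts processes on one host (guests, OSDs, RGWs, and so on), while the second is the OSD count of the whole cluster, since each process may connect to every OSD.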
For example, in an RHHI cluster with 36 OSDs/host, 50 guests with Cinder volumes, and 1000 OSDs in the cluster:

    50 guests/host x 1000 OSD connections/guest x 2 threads/connection = 100000
    36 OSDs/host x 1000 OSD connections/OSD x 2 threads/connection = 72000

The default vm.max_map_count on RHEL 7.4 is 65530, so Ceph would not be able to connect up the cluster.

This problem does go away in RHCS 3.0, but there is still a huge problem with RHOSP 12 and RHOSP 11 support. At a minimum this needs to be documented.

Can we get this fix into RHOSP12.z?

This bug is not present in RHCS 3.0 or RHOSP 13, so mark it fixed in the next (long-term) release?
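As a rough aid for checking a host against such an estimate, the sketch below compares the three sysctl values discussed in this thread with a required thread count. All names are hypothetical. It assumes about two memory mappings per thread, which is only what the test session earlier in this thread suggests (400000 mappings allowed just under 200000 threads), and it assumes each thread consumes one PID.

```python
from pathlib import Path

# Assumption: about two memory mappings per thread, inferred from the
# thread-create results quoted in this thread, not from kernel documentation.
MAPS_PER_THREAD = 2

SYSCTL_FILES = {
    "kernel.threads-max": "kernel/threads-max",
    "kernel.pid_max": "kernel/pid_max",
    "vm.max_map_count": "vm/max_map_count",
}


def read_limits(proc_sys="/proc/sys"):
    """Read the current values of the three sysctls from /proc/sys (Linux only)."""
    root = Path(proc_sys)
    return {name: int((root / rel).read_text().strip())
            for name, rel in SYSCTL_FILES.items()}


def shortfalls(limits, needed_threads, maps_per_thread=MAPS_PER_THREAD):
    """Return {sysctl name: value it must exceed} for each limit that is too low."""
    minimums = {
        "kernel.threads-max": needed_threads,
        "kernel.pid_max": needed_threads,
        "vm.max_map_count": needed_threads * maps_per_thread,
    }
    return {name: minimum for name, minimum in minimums.items()
            if limits[name] <= minimum}


# The untuned values from the test session, checked against 200000 threads.
untuned = {"kernel.threads-max": 2061221,
           "kernel.pid_max": 57344,
           "vm.max_map_count": 65530}
print(shortfalls(untuned, 200000))
```

On a live host, `shortfalls(read_limits(), needed_threads)` would perform the same check against the running kernel's settings. An empty result means none of the three limits is below the estimate; it does not guarantee that thread creation will succeed.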