Description of problem:

The RGW container fails to start with:

Jul 15 14:20:43 servera kernel: cgroup: fork rejected by pids controller in /machine.slice/libpod-54832992fcbbbf92b0d10d0491f7ff987728bec87c1c55b79cb3921c6f503f49.scope
Jul 15 14:20:43 servera conmon[34853]: terminate called after throwing an instance of 'std::system_error'
Jul 15 14:20:43 servera conmon[34853]: what(): Resource temporarily unavailable
Jul 15 14:20:43 servera conmon[34853]: *** Caught signal (Aborted) **
Jul 15 14:20:43 servera conmon[34853]: in thread 7f5d605e1280

The podman default pids-limit is 2048:

$ grep . sys/fs/cgroup/pids/machine.slice/libpod-*/pids.max
sys/fs/cgroup/pids/machine.slice/libpod-9336707e04da464b9128b7c57a0ee9b70efc5acb5207ea03ab413583c2264283.scope/pids.max:2048
sys/fs/cgroup/pids/machine.slice/libpod-d0eb2257c2fb371e015af9b68c716ed280399d9d2e88925aad9323ac26f659f3.scope/pids.max:2048

This value of 2048 is more than sufficient while rgw thread pool size keeps its default of 512. But when rgw thread pool size is raised to a value close to the pids-limit, the thread pool leaves no room for the other threads and processes that must spawn and run within the container, so the fork is rejected and the container crashes.

Version-Release number of selected component (if applicable):
ceph 4.2z2

How reproducible:

Steps to Reproduce:
1. Change the value of rgw thread pool size from 512 to 2048 and redeploy the RGW using ceph-ansible.
2. Start the RGW with: systemctl start ceph-radosgw.rgwX

Actual results:

Jul 15 14:20:42 servera systemd[1]: Started Ceph RGW.
Jul 15 14:20:42 servera conmon[34853]: 2021-07-15 14:20:42.800 7f5d605e1280 0 deferred set uid:gid to 167:167 (ceph:ceph)
Jul 15 14:20:42 servera conmon[34853]: 2021-07-15 14:20:42.800 7f5d605e1280 0 ceph version 14.2.11-181.el8cp (68fea1005601531fe60d2979c56ea63bc073c84f) nautilus (stable), process radosgw, pid 109
Jul 15 14:20:42 servera conmon[35177]: 2021-07-15 14:20:42 /opt/ceph-container/bin/entrypoint.sh: static: does not generate config
Jul 15 14:20:43 servera conmon[35177]: HEALTH_WARN 1 pools have too few placement groups; 20 pools have too many placement groups
Jul 15 14:20:43 servera conmon[34853]: 2021-07-15 14:20:43.245 7f5d605e1280 0 starting handler: beast
Jul 15 14:20:43 servera conmon[34853]: 2021-07-15 14:20:43.246 7f5d605e1280 0 set uid:gid to 167:167 (ceph:ceph)
Jul 15 14:20:43 servera kernel: cgroup: fork rejected by pids controller in /machine.slice/libpod-54832992fcbbbf92b0d10d0491f7ff987728bec87c1c55b79cb3921c6f503f49.scope
Jul 15 14:20:43 servera conmon[34853]: terminate called after throwing an instance of 'std::system_error'
Jul 15 14:20:43 servera conmon[34853]: what(): Resource temporarily unavailable
Jul 15 14:20:43 servera conmon[34853]: *** Caught signal (Aborted) **
Jul 15 14:20:43 servera conmon[34853]: in thread 7f5d605e1280 thread_name:radosgw
Jul 15 14:20:43 servera conmon[34853]: ceph version 14.2.11-181.el8cp (68fea1005601531fe60d2979c56ea63bc073c84f) nautilus (stable)
Jul 15 14:20:43 servera conmon[34853]: 1: (()+0x12b20) [0x7f5d53285b20]
Jul 15 14:20:43 servera conmon[34853]: 2: (gsignal()+0x10f) [0x7f5d525b637f]
Jul 15 14:20:43 servera conmon[34853]: 3: (abort()+0x127) [0x7f5d525a0db5]
Jul 15 14:20:43 servera conmon[34853]: 4: (()+0x9009b) [0x7f5d52f6e09b]
Jul 15 14:20:43 servera conmon[34853]: 5: (()+0x9653c) [0x7f5d52f7453c]
Jul 15 14:20:43 servera conmon[34853]: 6: (()+0x96597) [0x7f5d52f74597]
Jul 15 14:20:43 servera conmon[34853]: 7: (()+0x967f8) [0x7f5d52f747f8]
Jul 15 14:20:43 servera conmon[34853]: 8: (()+0x9223b) [0x7f5d52f7023b]
Jul 15 14:20:43 servera conmon[34853]: 9: (()+0xc2e9d) [0x7f5d52fa0e9d]
Jul 15 14:20:43 servera conmon[34853]: 10: (RGWAsioFrontend::run()+0x1c5) [0x55a5bfa88b85]
Jul 15 14:20:43 servera conmon[34853]: 11: (main()+0x2851) [0x55a5bfa2d851]
Jul 15 14:20:43 servera conmon[34853]: 12: (__libc_start_main()+0xf3) [0x7f5d525a2493]
Jul 15 14:20:43 servera conmon[34853]: 13: (_start()+0x2e) [0x55a5bfa47cae]
Jul 15 14:20:43 servera conmon[34853]: 2021-07-15 14:20:43.303 7f5d605e1280 -1 *** Caught signal (Aborted) **
Jul 15 14:20:43 servera conmon[34853]: in thread 7f5d605e1280 thread_name:radosgw

Expected results:
The container should start.

Additional info:
ceph-ansible does not take into account that when rgw thread pool size is increased, the pids limit of the container must be raised accordingly. The proposed fix is to modify ceph-rgw/templates/ceph-radosgw.service.j2 so that the parameter --pids-limit={{ radosgw_thread_pool_size + 2048 }} is added to the podman command line.
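The fragment below sketches where the proposed flag would land in ceph-radosgw.service.j2. The surrounding ExecStart options are illustrative placeholders, not the template's actual content; only the --pids-limit flag and its Jinja expression are the proposed change:

```jinja
# ceph-radosgw.service.j2 -- sketch of the relevant ExecStart fragment.
# Everything except --pids-limit is an illustrative placeholder.
ExecStart=/usr/bin/podman run --rm --net=host \
  --pids-limit={{ radosgw_thread_pool_size + 2048 }} \
  --name ceph-rgw-{{ ansible_hostname }} \
  {{ ceph_docker_registry }}/{{ ceph_docker_image }}:{{ ceph_docker_image_tag }}
```

Adding the default pids-limit of 2048 on top of the configured thread pool size keeps the headroom constant: with radosgw_thread_pool_size = 2048 the expression renders as --pids-limit=4096, so the non-pool threads retain the same 2048-pid budget they had at the default pool size.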
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Ceph Storage 4.3 Security and Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:1716