Description of problem:
OSP 16.1 latest

Running into the following issue with launching instances:

2020-08-19 21:14:42.722+0000: 34000: error : virFork:274 : cannot fork child process: Resource temporarily unavailable
2020-08-19 21:14:42.724+0000: 34000: error : virFork:274 : cannot fork child process: Resource temporarily unavailable

This is with 184 instances:

# sudo podman exec -ti nova_libvirt virsh list --all |wc -l
184

Seems to be hitting PID limit with the nova_libvirt container. Perhaps:

# sudo podman inspect nova_libvirt |grep PidsLimit
"PidsLimit": 4096,

How is this config managed?

Version-Release number of selected component (if applicable):
16.1 current

How reproducible:
Unknown

Steps to Reproduce:
1. Launch a significant # of instances.

Additional info:
I'll provide additional libvirtd and nova logs to show the issue.
Note the podman change log:

https://github.com/containers/podman/blob/v1.6/changelog.txt

- Changelog for v1.6.2-rc1 (2019-10-16)
[...]
  * Setup a reasonable default for pids-limit 4096

From https://github.com/containers/podman/blob/v1.6/RELEASE_NOTES.md

1.6.2
Misc
The default PID limit for containers is now set to 4096. It can be adjusted back to the old default (unlimited) by passing --pids-limit 0 to podman create and podman run

It seems this change was not taken into account for the OSP 16.1 containers.
This is going to involve patching openstack/paunch and tripleo-ansible (the tripleo-container-manage role) to support these options, and then using these options in THT.
The Ussuri patch for paunch is ready. I will work on tripleo-ansible in parallel so that we can already add the needed option to nova_libvirt in master.
The master patch against tripleo-ansible (needed for master/osp-17) is ready for review. I will now work on the t-h-t content, adding a Depends-On for the tripleo-ansible patch on master. Backports will need to point to the paunch patch for the Depends-On.
To summarize IRC discussion with libvirt developers (thanks, DanPB):

- Removing the PID limit altogether can lead to a fork bomb, so we shouldn't do that.

- DanPB elaborates: libvirtd configures 'TasksMax=32768', which means 32768 should allow one to launch about 1200 guests (given that you were able to launch about 150 guests with the 4096 limit). However, this is more complicated: in Ceph configurations, if you have 100 storage hosts, then Ceph will create 1 thread per host, so each QEMU will consume 100 threads. So 1200 guests in a non-Ceph configuration may turn into 200 guests in a Ceph-based setup.

In the end, going with the tunable parameter (https://review.opendev.org/#/c/747826/) is indeed a better option.
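The capacity estimate in the comment above can be sketched as simple integer arithmetic. This is only a rough illustration: the baseline of roughly 150 guests at the 4096 limit is an assumption inferred from the 8x ratio between 4096 and 32768 and the 1200-guest figure, the function name `estimate_guests` is hypothetical, and tasks used by libvirtd itself are ignored.

```shell
#!/usr/bin/env bash
# Rough capacity estimate: how many guests fit under a given PID limit.
# All inputs are assumptions derived from the figures quoted in the thread.

estimate_guests() {
    local pids_limit=$1        # PID (task) limit applied to the container
    local tasks_per_guest=$2   # baseline tasks consumed by one QEMU guest
    local extra_per_guest=$3   # extra threads per guest (e.g. one per Ceph host)
    echo $(( pids_limit / (tasks_per_guest + extra_per_guest) ))
}

# Assumed baseline: about 150 guests exhausted the 4096 limit.
baseline=$(( 4096 / 150 ))     # integer division gives 27 tasks per guest

estimate_guests 32768 "$baseline" 0     # non-Ceph setup, limit 32768
estimate_guests 32768 "$baseline" 100   # Ceph with 100 storage hosts
estimate_guests 65536 "$baseline" 100   # same Ceph setup, limit 65536
```

With these assumptions the non-Ceph estimate lands close to the 1200 guests quoted above, and the Ceph estimate for the 32768 limit comes out in the same low-hundreds range as the 200 mentioned in the discussion.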
Updated my t-h-t patch and added it here.
(In reply to Kashyap Chamarthy from comment #8)
> To summarize IRC discussion with libvirt developers (thanks, DanPB):
>
> - Removing the PID limit altogether can lead to a fork bomb, so we shouldn't
> do that.
>
> - DanPB elaborates: libvirtd configures 'TasksMax=32768', which means
> 32768 should allow one to launch about 1200 guests (given that you were able
> to launch about 150 guests with the 4096 limit). However, this is more
> complicated: in Ceph configurations, if you have 100 storage hosts, then Ceph
> will create 1 thread per host, so each QEMU will consume 100 threads. So 1200
> guests in a non-Ceph configuration may turn into 200 guests in a
> Ceph-based setup.
>
> In the end, going with the tunable parameter
> (https://review.opendev.org/#/c/747826/) is indeed a better option.

We actually went with setting a higher PID limit, which is also acceptable: https://review.opendev.org/#/c/747835/

(In general, if we _can_ avoid yet-more tunables, it's good.)
The hotfix consists of only two packages to install on the Undercloud/Director node:
- new tripleo-heat-templates
- new python3-paunch
On a standard deploy, I saw the default PidsLimit:

sudo podman inspect nova_libvirt |grep PidsLimit
"PidsLimit": 65536,

I also changed /usr/share/openstack-tripleo-heat-templates/deployment/nova/nova-libvirt-container-puppet.yaml to contain a ContainerNovaLibvirtPidsLimit of 60000. After deploying, the non-default value was in effect:

sudo podman inspect nova_libvirt |grep PidsLimit
"PidsLimit": 60000,
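As a possible alternative to editing the packaged template in place, the value could be overridden from a Heat environment file and then checked with a small helper. This is a minimal sketch, not the method used in the verification above: it assumes ContainerNovaLibvirtPidsLimit can be set through parameter_defaults like other t-h-t parameters, and the file name and the helper `pids_limit_from_inspect` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: override the nova_libvirt PID limit and verify it after deployment.
# The file name is hypothetical; the parameter name comes from the thread.

ENV_FILE="${ENV_FILE:-nova-libvirt-pids-limit.yaml}"

# Write a Heat environment file that overrides the template default.
cat > "$ENV_FILE" <<'EOF'
parameter_defaults:
  ContainerNovaLibvirtPidsLimit: 60000
EOF

# Print the first PidsLimit value found in `podman inspect` output on stdin.
pids_limit_from_inspect() {
    sed -n 's/.*"PidsLimit": *\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage on a compute node, once the environment file has been passed to the
# usual deploy command with -e and the deployment has been updated:
#   sudo podman inspect nova_libvirt | pids_limit_from_inspect
```

Keeping the override in a separate environment file means it is not lost when the tripleo-heat-templates package is updated, unlike an edit to the file under /usr/share.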
Hello all, Seeing the amount of things being pulled in with the hotfix (creating issues), and seeing the approaching 16.1.2 release (due this week if everything goes according to the plan), it would be better to actually wait for 16.1.2 and do the tests on it, since we'll get everything pulled in, with the matching versions for both packages and containers. Would that be OK? Cheers, C.
(In reply to Cédric Jeanneret from comment #40)
> Hello all,
>
> Seeing the amount of things being pulled in with the hotfix (creating
> issues), and seeing the approaching 16.1.2 release (due this week if
> everything goes according to the plan), it would be better to actually wait
> for 16.1.2 and do the tests on it, since we'll get everything pulled in,
> with the matching versions for both packages and containers.
>
> Would that be OK?
>
> Cheers,
>
> C.

Hi,

I believe that is the best option. Thanks for the update.

Regards,
Matt
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:4284