Description of problem:
Since NSS_SDB_USE_CACHE=no is not set before calling curl in the container health checks, the dentry cache on controller nodes grows continually. On controllers with a larger amount of RAM, this can lead to soft lockups once memory pressure forces a reclamation of the extraneous cache entries. See RHBZ 1044666 for background.

Version-Release number of selected component (if applicable):
13.0

How reproducible:
Run the following systemtap script on a containerized OSP13 controller:

probe kernel.function("d_alloc").return {
  log(reverse_path_walk($return))
}

Observe many repeated calls referencing lib/docker/overlay2/<some_id>/diff/etc/pki/nssdb/.<some_number>_dOeSnotExist_.db

Steps to Reproduce:
1. Deploy an OSP13 environment with containerized control plane
2. Run the systemtap script above for 1 minute
3. Observe many calls for nonexistent NSS DB files
4. Run the following to add NSS_SDB_USE_CACHE=no to the healthcheck function:
   docker ps -q | xargs -I {} docker exec -u root {} sed -i '/^healthcheck_curl/a \ \ export NSS_SDB_USE_CACHE=no' /usr/share/openstack-tripleo-common/healthcheck/common.sh
5. Re-run the systemtap script
6. Observe a large reduction (90%+) in dentry cache calls over 1 minute

Actual results:
[root@ctl01 ~]# stap -o test1.out -T 60 dentry.stap
[root@ctl01 ~]# wc -l test1.out
186526 test1.out
[root@ctl01 ~]# grep dOeSnotExist test1.out | wc -l
158649

Expected results:
[root@ctl01 ~]# stap -o test2.out -T 60 dentry.stap
[root@ctl01 ~]# wc -l test2.out
17544 test2.out
[root@ctl01 ~]# grep dOeSnotExist test2.out | wc -l
0
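The sed expression from step 4 can be tried on a scratch file before it is run inside the containers. The sketch below assumes GNU sed. The function body in the scratch file is a hypothetical stand-in, not the real contents of common.sh, and patch_healthcheck is a helper name invented for this example.

```shell
#!/bin/bash
# Apply the step-4 sed expression to a single file (GNU sed assumed).
# patch_healthcheck is a hypothetical helper, not part of tripleo-common.
patch_healthcheck() {
    sed -i '/^healthcheck_curl/a \ \ export NSS_SDB_USE_CACHE=no' "$1"
}

# Hypothetical stand-in for common.sh; the real function body may differ.
scratch=$(mktemp)
cat > "$scratch" <<'EOF'
healthcheck_curl () {
  curl -q --fail "$@"
}
EOF

patch_healthcheck "$scratch"

# Show the patched file: the export line should directly follow the
# line that starts with "healthcheck_curl".
cat "$scratch"
```

The expression is not idempotent: each additional run appends another export line after the match, so it is probably worth checking with grep whether a container has already been patched before repeating step 4.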
The customer environment where this behavior was originally observed had approximately 1 GB of dentry cache growth every 10 minutes. After applying the in-place change to /usr/share/openstack-tripleo-common/healthcheck/common.sh, that dropped to less than 10 MB per 10 minutes.
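This comment does not say how the growth figures were collected. One possible way to watch the trend on a controller is to sample the first field (nr_dentry) of /proc/sys/fs/dentry-state before and after an interval, as in the rough sketch below. The function names and the 600-second default are assumptions made for this example.

```shell
#!/bin/bash
# Print nr_dentry, the first field of a dentry-state file.
# An alternate path may be passed for testing; the default is the live file.
nr_dentry() {
    awk '{ print $1 }' "${1:-/proc/sys/fs/dentry-state}"
}

# Print the change in allocated dentries over an interval in seconds.
# The 600-second default mirrors the 10-minute window mentioned above.
dentry_growth() {
    local interval=${1:-600} before after
    before=$(nr_dentry)
    sleep "$interval"
    after=$(nr_dentry)
    echo $(( after - before ))
}
```

Running dentry_growth 600 before and after the common.sh change gives a dentry count delta rather than a size in bytes. Converting it to megabytes would require the dentry object size for the running kernel, which this sketch does not attempt.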
openstack-tripleo-common-8.6.8-3.el7ost.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0939
*** Bug 1737305 has been marked as a duplicate of this bug. ***