Previously, the `NSS_SDB_USE_CACHE=no` environment variable was not set before curl was called in the container health checks, so the `dentry` cache on controller nodes grew continuously. Controller nodes with a large amount of RAM experienced soft lockups when memory pressure forced reclamation of the extraneous cache entries.
With this update, the `NSS_SDB_USE_CACHE=no` environment variable is set before curl runs in the container health check. As a result, the `dentry` cache on controller nodes no longer grows continuously and no longer causes soft lockups.
Description of problem:
Because NSS_SDB_USE_CACHE=no is not set before the container health checks call curl, the dentry cache on controller nodes grows continually. On controllers with a larger amount of RAM, this can lead to soft lockups once memory pressure forces reclamation of the extraneous cache entries.
See RHBZ 1044666 for background.
Version-Release number of selected component (if applicable):
13.0
How reproducible:
Run the following SystemTap script (saved as dentry.stap in the results below) on a containerized OSP13 controller:
probe kernel.function("d_alloc").return { log(reverse_path_walk($return)) }
Observe many repeated calls referencing lib/docker/overlay2/<some_id>/diff/etc/pki/nssdb/.<some_number>_dOeSnotExist_.db
Steps to Reproduce:
1. Deploy an OSP13 environment with containerized control plane
2. Run the SystemTap script above for 1 minute
3. Observe many calls for nonexistent NSS DB files
4. Run the following command to add NSS_SDB_USE_CACHE=no to the healthcheck_curl function in every running container:
docker ps -q | xargs -I {} docker exec -u root {} sed -i '/^healthcheck_curl/a \ \ export NSS_SDB_USE_CACHE=no' /usr/share/openstack-tripleo-common/healthcheck/common.sh
5. Re-run the systemtap script
6. Observe a large reduction (90%+) in dentry allocations logged over 1 minute
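The sed `a` command in step 4 appends a line after every line matching `^healthcheck_curl`. Assuming the function is declared as `healthcheck_curl () {` on a single line, the export therefore becomes the first statement in the function body and runs before curl is invoked. The Python sketch below reproduces that text transformation so the result can be inspected. The file excerpt and the helper name `append_after_match` are hypothetical illustrations, not the real contents of common.sh.

```python
import re

# Hypothetical excerpt of healthcheck/common.sh; the real file's contents may differ.
SAMPLE = """\
healthcheck_curl () {
  curl --fail --silent --output /dev/null "$@"
}
"""


def append_after_match(text: str, pattern: str, new_line: str) -> str:
    """Mimic `sed '/pattern/a new_line'`: emit new_line after every matching line."""
    regex = re.compile(pattern)
    out = []
    for line in text.splitlines():
        out.append(line)
        if regex.search(line):
            out.append(new_line)
    return "\n".join(out) + "\n"


# Same pattern and appended text (two leading spaces) as the sed command in step 4.
patched = append_after_match(SAMPLE, r"^healthcheck_curl", "  export NSS_SDB_USE_CACHE=no")

if __name__ == "__main__":
    print(patched)
```

Like the sed command, this transformation is not idempotent: applying it twice to the same file appends the export line twice.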
Actual results:
[root@ctl01 ~]# stap -o test1.out -T 60 dentry.stap
[root@ctl01 ~]# wc -l test1.out
186526 test1.out
[root@ctl01 ~]# grep dOeSnotExist test1.out | wc -l
158649
Expected results:
[root@ctl01 ~]# stap -o test2.out -T 60 dentry.stap
[root@ctl01 ~]# wc -l test2.out
17544 test2.out
[root@ctl01 ~]# grep dOeSnotExist test2.out | wc -l
0
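The `wc -l` and `grep dOeSnotExist | wc -l` pipelines above can be reproduced, and the reduction quantified, with the Python sketch below. The function names `summarize` and `reduction` are illustrative; the percentages are computed only from the line counts shown in the actual and expected results.

```python
from typing import Iterable, Tuple

# Substring that appears in the nonexistent NSS DB file names seen in the capture.
MARKER = "dOeSnotExist"


def summarize(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (total lines, lines containing MARKER), like `wc -l` and `grep MARKER | wc -l`."""
    total = 0
    misses = 0
    for line in lines:
        total += 1
        if MARKER in line:
            misses += 1
    return total, misses


def reduction(before: int, after: int) -> float:
    """Fractional reduction going from `before` to `after`."""
    return (before - after) / before


# Counts reported in the actual (test1.out) and expected (test2.out) results.
BEFORE_TOTAL, BEFORE_MISSES = 186526, 158649
AFTER_TOTAL, AFTER_MISSES = 17544, 0

if __name__ == "__main__":
    print(f"Share of logged allocations for missing NSS DB files: {BEFORE_MISSES / BEFORE_TOTAL:.1%}")
    print(f"Reduction in logged d_alloc returns: {reduction(BEFORE_TOTAL, AFTER_TOTAL):.1%}")
```

With the reported counts, about 85.1% of the logged allocations before the change referenced nonexistent NSS DB files, and the total dropped by about 90.6%, consistent with the 90%+ noted in step 6. To count a real capture, pass an open file, for example `summarize(open("test1.out"))`; note that `wc -l` counts newline characters, so a final line without a trailing newline is counted by the sketch but not by `wc -l`.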
In the customer environment where this behavior was originally noted, the dentry cache grew by approximately 1 GB every 10 minutes. After the in-place change to /usr/share/openstack-tripleo-common/healthcheck/common.sh was applied, growth dropped to less than 10 MB per 10 minutes.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2019:0939