Bug 1720479

Summary: Curl command in Elasticsearch readiness probe bloats dentry cache
Product: OpenShift Container Platform Reporter: ggore
Component: LoggingAssignee: Jeff Cantrill <jcantril>
Status: CLOSED ERRATA QA Contact: Anping Li <anli>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 3.11.0CC: aos-bugs, rmeggins
Target Milestone: ---   
Target Release: 3.11.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openshift/ose-logging-elasticsearch5:v3.11.124 Doc Type: Bug Fix
Doc Text:
Cause: The readiness probe for Elasticsearch uses curl with https. curl uses NSS for crypto. NSS uses too much file system cache. Consequence: The file system cache dentry grows too large too fast. Elasticsearch and the node may become unresponsive. Fix: Set the flag NSS_SDB_USE_CACHE=no in the readiness probe to workaround the NSS behavior. Result: No undue growth in the dentry cache for Elasticsearch is seen.
Story Points: ---
Clone Of:
: 1720656 (view as bug list) Environment:
Last Closed: 2019-07-23 19:56:23 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1720656    

Description ggore 2019-06-14 06:39:22 UTC
Description of problem:
Curl command in Elasticsearch readiness probe bloats dentry cache.

If the elasticsearch is running the slab Memory is growing up continuously. 
   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                             
265843 1000120+  20   0  148.5g   9.1g 341508 S   9.5  2.4 178:29.23 java   

Top memory consuming slab objects:
~~~
slabname         pagesperslab  num_slabs  pages_used  MiB
dentry           2             35433675   70867350    276825.59
kmalloc-32       1             4647140    4647140     18152.89
buffer_head      2             77110      154220      602.42
radix_tree_node  8             14021      112168      438.16
nfs_inode_cache  8             8726       69808       272.69
kmalloc-2048     8             6489       51912       202.78
kmalloc-512      8             4366       34928       136.44
kmalloc-1024     8             3621       28968       113.16
xfs_inode        8             2574       20592       80.44
kmalloc-256      4             5027       20108       78.55
~~~

Using systemTap we found that curl command in readiness probe causing dentry cache to grow:
~~~~~
$ awk '/PROCESS/,/STACK/' dentry_alloc_2.stp_output.log3 | egrep -v '(PROCESS|STACKS)' | tr -s "(" " " | awk '{m[$2] += $1} END {for (i in m) {printf "%5s %s \n", m[i], i} }' | sort -nrk 1 | head
 3250 curl 
 1889 hyperkube 
  357 docker-containe 
  233 docker-runc-cur 
  227 vmtoolsd 
  198 snmpd 
  192 runc:[2:INIT] 
  153 iptables 
  144 node_exporter 
  104 java 



$ grep curl dentry_alloc_2.stp_output.log3 
487	 curl(95499) readiness.sh(95494) docker-containe(95476) docker-containe(12517) dockerd-current(12509)
487	 curl(95499) readiness.sh(95494) docker-containe(95476) docker-containe(12517) dockerd-current(12509)
502	 curl(95630) readiness.sh(95625) docker-containe(95592) docker-containe(12517) dockerd-current(12509)
392	 curl(95782) readiness.sh(95776) docker-containe(95755) docker-containe(12517) dockerd-current(12509)
393	 curl(95705) readiness.sh(95699) docker-containe(95680) docker-containe(12517) dockerd-current(12509)
487	 curl(95499) readiness.sh(95494) docker-containe(95476) docker-containe(12517) dockerd-current(12509)
502	 curl(95630) readiness.sh(95625) docker-containe(95592) docker-containe(12517) dockerd-current(12509)

$ awk '$2 ~ "hyperkube"' dentry_alloc_2.stp_output.log3 
221	 hyperkube(26323)
472	 hyperkube(26323)
1196	 hyperkube(26323)


$ awk '/FILES/,/END/' dentry_alloc_2.stp_output.log3  | egrep -v '(FILES|END)' | wc -l
2219

$ awk '/FILES/,/END/' dentry_alloc_2.stp_output.log3  | egrep -v '(FILES|END)' | grep -c dOeSnotExist_.db
1446


$ awk '/FILES/,/END/' dentry_alloc_2.stp_output.log3  | egrep -v '(FILES|END)' | grep dOeSnotExist_.db | head -30
etc/pki/nssdb/.120403000_dOeSnotExist_.db
lib/docker/overlay2/b24c66bc3669affef8d584b657fd44b1443b7832d956d24fc2a5efd4a261b280/diff/opt/app-root/src/.pki/nssdb/.120403014_dOeSnotExist_.db
var/tmp/.120403087_dOeSnotExist_.db
opt/app-root/src/.pki/nssdb/.120402988_dOeSnotExist_.db
etc/pki/nssdb/.120403021_dOeSnotExist_.db
lib/docker/overlay2/7dd8209c664277c4419a530714d40afffc0e8923a7a84a9dd2c1a0a736ee4a3b/diff/var/tmp/.120403045_dOeSnotExist_.db
etc/pki/nssdb/.120403006_dOeSnotExist_.db
lib/docker/overlay2/7dd8209c664277c4419a530714d40afffc0e8923a7a84a9dd2c1a0a736ee4a3b/diff/var/tmp/.120403103_dOeSnotExist_.db
lib/docker/overlay2/7821d43608bb733d738a102d380145a38a3c1b097b61bf861e71810bab729bb3/diff/var/tmp/.120403061_dOeSnotExist_.db
lib/docker/overlay2/7821d43608bb733d738a102d380145a38a3c1b097b61bf861e71810bab729bb3/diff/var/tmp/.120403105_dOeSnotExist_.db
lib/docker/overlay2/7dd8209c664277c4419a530714d40afffc0e8923a7a84a9dd2c1a0a736ee4a3b/diff/var/tmp/.120403060_dOeSnotExist_.db
etc/pki/nssdb/.120403027_dOeSnotExist_.db
lib/docker/overlay2/7dd8209c664277c4419a530714d40afffc0e8923a7a84a9dd2c1a0a736ee4a3b/diff/var/tmp/.120403058_dOeSnotExist_.db
lib/docker/overlay2/b24c66bc3669affef8d584b657fd44b1443b7832d956d24fc2a5efd4a261b280/diff/opt/app-root/src/.pki/nssdb/.120403006_dOeSnotExist_.db
var/tmp/.120403036_dOeSnotExist_.db
opt/app-root/src/.pki/nssdb/.120402985_dOeSnotExist_.db
lib/docker/overlay2/b24c66bc3669affef8d584b657fd44b1443b7832d956d24fc2a5efd4a261b280/diff/opt/app-root/src/.pki/nssdb/.120403012_dOeSnotExist_.db
lib/docker/overlay2/b24c66bc3669affef8d584b657fd44b1443b7832d956d24fc2a5efd4a261b280/diff/opt/app-root/src/.pki/nssdb/.120403003_dOeSnotExist_.db
lib/docker/overlay2/b24c66bc3669affef8d584b657fd44b1443b7832d956d24fc2a5efd4a261b280/diff/opt/app-root/src/.pki/nssdb/.120402979_dOeSnotExist_.db
opt/app-root/src/.pki/nssdb/.120402968_dOeSnotExist_.db
lib/docker/overlay2/7dd8209c664277c4419a530714d40afffc0e8923a7a84a9dd2c1a0a736ee4a3b/diff/etc/pki/nssdb/.120403037_dOeSnotExist_.db
lib/docker/overlay2/b24c66bc3669affef8d584b657fd44b1443b7832d956d24fc2a5efd4a261b280/diff/opt/app-root/src/.pki/nssdb/.120402987_dOeSnotExist_.db
lib/docker/overlay2/b24c66bc3669affef8d584b657fd44b1443b7832d956d24fc2a5efd4a261b280/diff/opt/app-root/src/.pki/nssdb/.120403010_dOeSnotExist_.db
var/tmp/.120403084_dOeSnotExist_.db
lib/docker/overlay2/7dd8209c664277c4419a530714d40afffc0e8923a7a84a9dd2c1a0a736ee4a3b/diff/etc/pki/nssdb/.120403026_dOeSnotExist_.db
lib/docker/overlay2/7dd8209c664277c4419a530714d40afffc0e8923a7a84a9dd2c1a0a736ee4a3b/diff/etc/pki/nssdb/.120403002_dOeSnotExist_.db
lib/docker/overlay2/7dd8209c664277c4419a530714d40afffc0e8923a7a84a9dd2c1a0a736ee4a3b/diff/etc/pki/nssdb/.120403000_dOeSnotExist_.db
lib/docker/overlay2/b24c66bc3669affef8d584b657fd44b1443b7832d956d24fc2a5efd4a261b280/diff/opt/app-root/src/.pki/nssdb/.120402984_dOeSnotExist_.db
lib/docker/overlay2/7dd8209c664277c4419a530714d40afffc0e8923a7a84a9dd2c1a0a736ee4a3b/diff/var/tmp/.120403066_dOeSnotExist_.db
lib/docker/overlay2/7821d43608bb733d738a102d380145a38a3c1b097b61bf861e71810bab729bb3/diff/var/tmp/.120403054_dOeSnotExist_.db



$ awk '/STACKS/,/FILES/' dentry_alloc_2.stp_output.log3  | egrep -v '(FILES|STACKS)'  | grep --no-group-separator -B2 Count  | grep Count -A1 --no-group-separator | paste - - | awk '{m[$5] += $2} END {printf "COUNT SYM \n"; for (i in m) {printf "%d %s \n", m[i], i} }' | column -t

COUNT  SYM
3230   SyS_access+0x18/0x20
30     sys_ioctl+0xa1/0xc0
117    SyS_getdents64+0x9c/0x120
1314   sys_newfstatat+0xe/0x10
65     sys_mknodat+0xe2/0x230
11     sys_newstat+0xe/0x10
13     sys_renameat+0x11/0x20
655    sys_open+0x1e/0x20
239    SyS_newlstat+0xe/0x10
62     SyS_statfs+0xe/0x10
73     sys_readlinkat+0x5b/0x150
13     SyS_mkdirat+0x48/0x100
35     sys_connect+0xe/0x10
14     sys_rename+0x1e/0x20
12     sys_unlink+0x16/0x20
1243   sys_openat+0x14/0x20
52     sys_execve+0x29/0x30
~~~~~

Comment 8 Anping Li 2019-07-11 10:16:56 UTC
The fix is in ose-logging-elasticsearch5:v3.11.128

Comment 10 errata-xmlrpc 2019-07-23 19:56:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1753