Bug 1240172
Summary: nfs-ganesha: OOM kill of nfs-ganesha process on all nodes while executing iozone
Product: [Red Hat Storage] Red Hat Gluster Storage | Reporter: Saurabh <saujain>
Component: nfs-ganesha | Assignee: Soumya Koduri <skoduri>
Status: CLOSED ERRATA | QA Contact: Shashank Raj <sraj>
Severity: urgent | Docs Contact:
Priority: urgent
Version: rhgs-3.1 | CC: annair, asrivast, dblack, divya, hamiller, kkeithle, mpillai, mzywusko, ndevos, nlevinki, olim, rabhat, rcyriac, rhinduja, sankarshan, sashinde, skoduri, smohan
Target Milestone: --- | Keywords: ZStream
Target Release: RHGS 3.1.3
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: nfs-ganesha-2.3.1-5 | Doc Type: Bug Fix
Doc Text:
Previously, NFS-Ganesha tried to reap inodes whenever a share was unexported or the number of entries in the cache reached the limit (default: 100000), but gfapi did not flush those inodes immediately, so the nfs-ganesha process consumed a large amount of memory. As a consequence, the nfs-ganesha process could get OOM killed due to excessive memory usage. With this fix, gfapi immediately flushes the inodes which NFS-Ganesha releases, and a new configuration option in the '/etc/ganesha/ganesha.conf' file limits the number of entries nfs-ganesha can cache. This controls the memory usage of the nfs-ganesha process to some extent.
Story Points: ---
Clone Of: | Environment:
Last Closed: 2016-06-23 05:32:09 UTC | Type: Bug
Regression: --- | Mount Type: ---
Documentation: --- | CRM:
Verified Versions: | Category: ---
oVirt Team: --- | RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- | Target Upstream Version:
Embargoed:
Bug Depends On: 1295107, 1311441
Bug Blocks: 1282669, 1299183, 1337867
Attachments:
Description
Saurabh
2015-07-06 06:33:25 UTC
Created attachment 1048675 [details]
sosreport of nfs11
Created attachment 1048679 [details]
nfs11 ganesha.log
Created attachment 1048680 [details]
dmesg log from nfs11
Created attachment 1048681 [details]
nfs11 ganesha-gfapi.log
As per the discussion with Saurabh:
* iozone tests ran successfully on the earlier builds/setup.
* The difference with this setup is that the quota feature is ON.
* In addition, it seems this was not hit with just a plain mount. A failover was triggered by killing the nfs-ganesha service manually around 15 minutes after the test started. Some time after that, nfs-ganesha processes got auto-killed due to OOM issues on all the other nodes as well.

We would like to run the test in the following scenarios and compare the results:
* plain nfs-ganesha mount
* with the quota feature on
* with a failover test in parallel
* with the quota feature on and a failover test in parallel

I tried to reproduce the OOM, but in this attempt of a similar test I could not see the issue.

Please include details about the exact iozone command that was run, and how many clients were used.

I used "iozone -a" from one client only.

We have requested QE to re-test and reproduce the issue. There are also a few gfapi memory leak issues being worked on upstream as part of bug 1295107. If we can reproduce this issue, we can apply those patches and retest. The fixes merged upstream which can be taken for this bug are:
http://review.gluster.org/13125
http://review.gluster.org/13096
http://review.gluster.org/#/c/13232/
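For reference, a minimal sketch of the reproduction steps described above, assuming a Gluster volume named 'testvol' exported via nfs-ganesha; the server name, mount point, quota limit, and the exact failover step are placeholders rather than details taken from this bug:

    # Scenario "with quota feature on": enable quota on the volume (placeholder limit)
    gluster volume quota testvol enable
    gluster volume quota testvol limit-usage / 10GB

    # On the client: mount the ganesha export and run the automatic iozone test
    mount -t nfs nfs-server.example.com:/testvol /mnt/testvol
    cd /mnt/testvol && iozone -a

    # Failover scenario: on one ganesha node, kill the nfs-ganesha service roughly
    # 15 minutes into the run so the VIP fails over while I/O is in flight
    pkill ganesha.nfsd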
Here are a few of my findings/experiments done to determine a cache_inode limit.

From gdb and the code, the total size occupied by each inode (including context) in a distributed-replicated volume looks like:

(gdb) p sizeof(struct cache_entry_t) + sizeof(struct glusterfs_handle) + sizeof(struct glfs_object) + sizeof(afr_inode_ctx_t) + sizeof(dht_inode_ctx_t) + sizeof(ioc_inode_t) + sizeof(struct md_cache) + sizeof(struct ios_stat) + sizeof(uint64_t) + sizeof(wb_inode_t) + sizeof(qr_inode_t)
$18 = 2296

i.e., around ~2K bytes. But when I tried to monitor the process memory size while data was being written to a newly created file, I saw varied results:

[root@dhcp43-58 ~]# showmount -e
Export list for dhcp43-58.lab.eng.blr.redhat.com:
[root@dhcp43-58 ~]# ps aux | grep ganesha
root 19044 0.2 0.4 1360752 8240 ? Ssl 15:03 0:00 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
root 19172 0.0 0.1 117000 2144 pts/0 S+ 15:03 0:00 grep --color=auto ganesha
[root@dhcp43-58 ~]# cat /proc/19044/oom_score
2

Exporting a volume 'testvol':

[root@dhcp43-58 ~]# showmount -e
Export list for dhcp43-58.lab.eng.blr.redhat.com:
/testvol (everyone)
[root@dhcp43-58 ~]# cat /proc/19044/oom_score
14
[root@dhcp43-58 ~]# ps aux | grep ganesha
root 19044 0.3 3.0 1636920 61904 ? Ssl 15:03 0:00 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
root 19501 0.0 0.1 117000 2196 pts/0 S+ 15:04 0:00 grep --color=auto ganesha
[root@dhcp43-58 ~]#

i.e., the RSS value increased by ~54M and the oom_score by 12.

Exporting another volume:

[root@dhcp43-58 ~]# showmount -e
Export list for dhcp43-58.lab.eng.blr.redhat.com:
/testvol (everyone)
/test (everyone)
[root@dhcp43-58 ~]# cat /proc/19044/oom_score
26
[root@dhcp43-58 ~]# ps aux | grep ganesha
root 19044 0.2 5.3 1750168 110440 ? Ssl 15:03 0:00 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
root 20347 0.0 0.1 117000 2244 pts/0 S+ 15:06 0:00 grep --color=auto ganesha
[root@dhcp43-58 ~]#

>>> the RSS value increased by ~49M and the oom_score by 12

[root@dhcp43-58 ~]# showmount -e
Export list for dhcp43-58.lab.eng.blr.redhat.com:
/testvol (everyone)
/test (everyone)
/new1 (everyone)
[root@dhcp43-58 ~]# cat /proc/19044/oom_score
37
[root@dhcp43-58 ~]# ps aux | grep ganesha
root 19044 0.3 7.7 1863416 158888 ? Ssl 15:03 0:01 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
root 21308 0.0 0.1 117004 2228 pts/0 S+ 15:08 0:00 grep --color=auto ganesha
[root@dhcp43-58 ~]#

>>> the RSS value increased by ~49M and the oom_score by 11

So, leaving out the first volume, each volume export resulted on average in a memory increase of ~49M.

[root@skoduri ~]# mount -t nfs 10.70.43.58:/testvol /mnt
[root@dhcp43-58 ~]# ps aux | grep ganesha
root 19044 0.3 7.7 1863416 158904 ? Ssl 15:03 0:01 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
root 22175 0.0 0.1 117004 2204 pts/0 S+ 15:10 0:00 grep --color=auto ganesha
[root@dhcp43-58 ~]#

After writing into 100 files and reading the data:

[root@skoduri mnt]# for i in `seq 1 100`; do echo 3 > /proc/sys/vm/drop_caches; dd if=/dev/zero of=/mnt/file$i bs=100K count=1 conv=sync & done ;
[root@skoduri mnt]# cat *
[root@dhcp43-58 ~]# ps aux | grep ganesha
root 19044 0.5 7.9 1863416 161756 ? Ssl 15:03 0:03 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
root 23486 0.0 0.1 117000 2096 pts/0 S+ 15:13 0:00 grep --color=auto ganesha
[root@dhcp43-58 ~]#
[root@dhcp43-58 ~]# cat /proc/19044/oom_score
38
[root@dhcp43-58 ~]#

So on average, each file resulted in a memory increase of (161756-158904)/100 = ~28K. But further creation of 100 new files showed different values:

[root@dhcp43-58 ~]# ps aux | grep ganesha
root 19044 0.5 7.9 1863416 161768 ? Ssl 15:03 0:04 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
root 24589 0.0 0.1 117000 2132 pts/0 S+ 15:16 0:00 grep --color=auto ganesha
[root@dhcp43-58 ~]#
[root@skoduri mnt]# for i in `seq 101 200`; do echo 3 > /proc/sys/vm/drop_caches; dd if=/dev/zero of=/mnt/file$i bs=100K count=1 conv=sync & done ;
[root@skoduri mnt]# cat *
[root@dhcp43-58 ~]# ps aux | grep ganesha
root 19044 0.6 8.2 1867512 169272 ? Ssl 15:03 0:06 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
root 25505 0.0 0.1 117000 2192 pts/0 R+ 15:18 0:00 grep --color=auto ganesha
[root@dhcp43-58 ~]#

That is an increase of ~75K per file on average. I have requested Manoj (mpillai) from the perf team to run longevity tests and monitor nfs-ganesha process memory usage. We would like to determine the following:
a) an optimal value to set the cache_inode limit
b) whether there are any other major leaks we have missed.
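As a rough illustration of the kind of longevity monitoring requested above, the loop below samples the ganesha.nfsd RSS and oom_score once a minute; the interval and output file are arbitrary choices, not part of the original test plan:

    #!/bin/bash
    # Sample ganesha.nfsd memory usage and OOM score once a minute (interval is arbitrary).
    PID=$(pidof ganesha.nfsd)
    while kill -0 "$PID" 2>/dev/null; do
        RSS_KB=$(awk '/VmRSS/ {print $2}' /proc/"$PID"/status)
        SCORE=$(cat /proc/"$PID"/oom_score)
        echo "$(date +%F_%T) rss_kb=$RSS_KB oom_score=$SCORE" >> /var/tmp/ganesha-mem.log
        sleep 60
    done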
From https://bugzilla.redhat.com/show_bug.cgi?id=1282669#c4:

Nov 13 14:53:26 sv-2000lvp88 kernel: Out of memory: Kill process 28790 (ganesha.nfsd) score 855 or sacrifice child
Nov 13 14:53:26 sv-2000lvp88 kernel: Killed process 28790 (ganesha.nfsd) total-vm:19276892kB, anon-rss:15252240kB, file-rss:0kB
Nov 15 02:35:27 sv-2000lvp87 kernel: Out of memory: Kill process 50329 (ganesha.nfsd) score 853 or sacrifice child
Nov 15 02:35:27 sv-2000lvp87 kernel: Killed process 50329 (ganesha.nfsd) total-vm:19244616kB, anon-rss:15307828kB, file-rss:0kB

Looks like on the customer setup, nfs-ganesha had consumed around ~14GB and had an oom_score >800. We may need to tweak the oom_score calculated for this process (if possible).
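As an aside on the note above about tweaking the OOM score: one option would be to lower the kernel's OOM-kill priority for ganesha.nfsd via oom_score_adj. The value and the systemd drop-in path shown below are illustrative assumptions, not something taken from this bug or its fix:

    # One-off adjustment for a running process (oom_score_adj range: -1000 to 1000)
    echo -500 > /proc/$(pidof ganesha.nfsd)/oom_score_adj

    # Persistent variant, assuming the service is managed by systemd as nfs-ganesha.service:
    # create /etc/systemd/system/nfs-ganesha.service.d/oom.conf containing
    #   [Service]
    #   OOMScoreAdjust=-500
    # then run: systemctl daemon-reload && systemctl restart nfs-ganesha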
Kindly let me know when we will have a final build with all the required patches.

Based on comment 43 and 44, we are still waiting for the new build with the relevant fixes. Once we have that, we will start the verification process for the hotfix.

Verified this bug with the latest glusterfs and ganesha builds, and the original issue "OOM kill of nfs-ganesha process while executing iozone" is not reproducible. Tried the below scenarios:
* plain nfs-ganesha mount
* on a volume with quota enabled
* on a volume with quota and failover/failback
The customer bug (https://bugzilla.redhat.com/show_bug.cgi?id=1282669) and the ganesha memory consumption related issues will now be tracked as part of bug https://bugzilla.redhat.com/show_bug.cgi?id=1337867. For any hotfix related requests, please update bug 1337867 with the details. Based on the above explanation, marking this bug as Verified.

Soumya, could you review and sign off the edited doc text?

Divya,
doc_text looks good to me. A few minor corrections are needed for the statement below:
>>> With this fix, gfapi immediately flushes the inodes in which NFS-ganesha releases and also to limit the number of entries nfs-ganesha can cache by adding configuration option to '/etc/ganesha/ganesha.conf' file.
Update: With this fix, gfapi immediately flushes the inodes which NFS-ganesha releases. Also, to limit the number of entries nfs-ganesha can cache, a new configuration option is added to the '/etc/ganesha/ganesha.conf' file.
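For illustration, the cache limit referred to above is typically set via the cache inode parameters in '/etc/ganesha/ganesha.conf'. The block and parameter names below (CACHEINODE / Entries_HWMark) and the chosen value are assumptions based on nfs-ganesha 2.3.x defaults and should be confirmed against the shipped documentation:

    # /etc/ganesha/ganesha.conf (assumed syntax for nfs-ganesha 2.3.x)
    CACHEINODE {
        # Upper limit on cached inode entries; the default discussed in this bug is 100000.
        # Lowering it caps ganesha.nfsd memory growth at the cost of more cache misses.
        Entries_HWMark = 50000;
    }
    # Restart the nfs-ganesha service for the change to take effect.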
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2016:1247