Bug 1381452
Summary: OOM kill of nfs-ganesha on one node while the fs-sanity test suite is executed.

| Field | Value |
|---|---|
| Product | [Red Hat Storage] Red Hat Gluster Storage |
| Component | distribute |
| Version | rhgs-3.2 |
| Target Release | RHGS 3.2.0 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | urgent |
| Priority | unspecified |
| Keywords | Triaged |
| Reporter | Shashank Raj <sraj> |
| Assignee | Jiffin <jthottan> |
| QA Contact | Arthy Loganathan <aloganat> |
| CC | aloganat, amukherj, jthottan, kkeithle, mzywusko, ndevos, rcyriac, rhs-bugs, sbhaloth, skoduri, storage-qa-internal, tdesala |
| Fixed In Version | glusterfs-3.8.4-7 |
| Cloned To | 1397052 (view as bug list) |
| Bug Blocks | 1351528, 1397052, 1401021, 1401023, 1401029, 1401032 |
| Type | Bug |
| Last Closed | 2017-03-23 06:07:42 UTC |
Description (Shashank Raj, 2016-10-04 07:01:04 UTC)
sosreports and logs can be accessed at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1381452

Shashank, please turn off features.cache-invalidation for that volume and re-run the tests. If the oom_score of the ganesha process is still high after the tests complete, tune the number of nfs-ganesha worker threads down to 16 using the config option below and re-try the tests.

    NFS_Core_Param {
        Nb_Worker = 16;
    }

Tried running posix_compliance again with features.cache-invalidation both on and off, and I am not able to reproduce this issue again.

So it seems some other test in fs-sanity is the culprit for this issue.

Will keep trying it and update the bug accordingly. For now, changing the bug title as appropriate.

(In reply to Shashank Raj from comment #4)
> Tried running posix_compliance again with features.cache-invalidation
> both on and off, and I am not able to reproduce this issue again.
>
> So it seems some other test in fs-sanity is the culprit for this issue.
>
> Will keep trying it and update the bug accordingly. For now, changing the
> bug title as appropriate.

Surabhi, could you please check the same and update the bug with the details of the test which may be causing this issue.

For a 6x2 volume, while the posix_compliance tests are executing, ganesha always gets OOM-killed on the node doing the mount once its oom_score reaches ~870. sosreports are at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1381452/

I have tried volumes with fewer bricks, such as a plain distribute volume with 2 bricks and a 1x2 volume, and the issue is not seen there. As Jiffin suggested, I executed the following test against the 6x2 volume:

    prove -vf /opt/qa/tools/posix-testsuite/tests/rename/00.t

The oom_score increases drastically while this test is running.
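For reference, a quick way to watch that growth from another shell while the test runs; this is a minimal sketch using standard procfs files, not part of the test suite (the one-second interval is arbitrary):

```sh
#!/bin/sh
# Poll nfs-ganesha's resident set size and kernel OOM score until the
# process dies; the kill above fired once oom_score reached ~870.
pid=$(pidof ganesha.nfsd)
while kill -0 "$pid" 2>/dev/null; do
    printf '%s rss_kb=%s oom_score=%s\n' "$(date +%T)" \
        "$(awk '/^VmRSS/ {print $2}' /proc/$pid/status)" \
        "$(cat /proc/$pid/oom_score)"
    sleep 1
done
echo "ganesha.nfsd ($pid) exited; check dmesg for the OOM kill record"
```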
dmesg output at the time of the kill:

[248560.640500] Call Trace:
[248560.640511] [<ffffffff81685eac>] dump_stack+0x19/0x1b
[248560.640516] [<ffffffff81680e57>] dump_header+0x8e/0x225
[248560.640523] [<ffffffff812ae71b>] ? cred_has_capability+0x6b/0x120
[248560.640530] [<ffffffff8113cb03>] ? delayacct_end+0x33/0xb0
[248560.640537] [<ffffffff8118460e>] oom_kill_process+0x24e/0x3c0
[248560.640542] [<ffffffff810936ce>] ? has_capability_noaudit+0x1e/0x30
[248560.640545] [<ffffffff81184e46>] out_of_memory+0x4b6/0x4f0
[248560.640548] [<ffffffff81681960>] __alloc_pages_slowpath+0x5d7/0x725
[248560.640552] [<ffffffff8118af55>] __alloc_pages_nodemask+0x405/0x420
[248560.640556] [<ffffffff811cf10a>] alloc_pages_current+0xaa/0x170
[248560.640563] [<ffffffff8106a587>] pte_alloc_one+0x17/0x40
[248560.640568] [<ffffffff811adb23>] __pte_alloc+0x23/0x170
[248560.640571] [<ffffffff811b1535>] handle_mm_fault+0xe25/0xfe0
[248560.640574] [<ffffffff811b76d5>] ? do_mmap_pgoff+0x305/0x3c0
[248560.640579] [<ffffffff81691994>] __do_page_fault+0x154/0x450
[248560.640581] [<ffffffff81691cc5>] do_page_fault+0x35/0x90
[248560.640584] [<ffffffff8168df88>] page_fault+0x28/0x30
[248560.640586] Mem-Info:
[248560.640591] active_anon:1620957 inactive_anon:292997 isolated_anon:0 active_file:0 inactive_file:974 isolated_file:0 unevictable:6562 dirty:0 writeback:0 unstable:0 slab_reclaimable:7116 slab_unreclaimable:13556 mapped:5683 shmem:8641 pagetables:7532 bounce:0 free:25150 free_pcp:474 free_cma:0
[248560.640595] Node 0 DMA free:15852kB min:132kB low:164kB high:196kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15936kB managed:15852kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[248560.640602] lowmem_reserve[]: 0 3327 7805 7805
[248560.640605] Node 0 DMA32 free:46200kB min:28752kB low:35940kB high:43128kB active_anon:2727832kB inactive_anon:545796kB active_file:0kB inactive_file:2536kB unevictable:16040kB isolated(anon):0kB isolated(file):0kB present:3653620kB managed:3408880kB mlocked:16040kB dirty:0kB writeback:0kB mapped:16596kB shmem:12260kB slab_reclaimable:9708kB slab_unreclaimable:24740kB kernel_stack:4944kB pagetables:11448kB unstable:0kB bounce:0kB free_pcp:792kB local_pcp:120kB free_cma:0kB writeback_tmp:0kB pages_scanned:285 all_unreclaimable? yes
[248560.640611] lowmem_reserve[]: 0 0 4478 4478
[248560.640613] Node 0 Normal free:38548kB min:38696kB low:48368kB high:58044kB active_anon:3755996kB inactive_anon:626192kB active_file:0kB inactive_file:1360kB unevictable:10208kB isolated(anon):0kB isolated(file):0kB present:4718592kB managed:4585756kB mlocked:10208kB dirty:0kB writeback:0kB mapped:6136kB shmem:22304kB slab_reclaimable:18756kB slab_unreclaimable:29484kB kernel_stack:7872kB pagetables:18680kB unstable:0kB bounce:0kB free_pcp:1104kB local_pcp:160kB free_cma:0kB writeback_tmp:0kB pages_scanned:1049 all_unreclaimable? yes
[248560.640618] lowmem_reserve[]: 0 0 0 0
[248560.640620] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15852kB
[248560.640630] Node 0 DMA32: 1564*4kB (UE) 956*8kB (UE) 723*16kB (UEM) 388*32kB (UEM) 120*64kB (UEM) 5*128kB (EM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 46208kB
[248560.640639] Node 0 Normal: 1174*4kB (UEM) 1024*8kB (UEM) 691*16kB (UEM) 305*32kB (UEM) 53*64kB (UEM) 9*128kB (M) 1*256kB (M) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 38504kB
[248560.640649] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[248560.640651] 18106 total pagecache pages
[248560.640653] 6146 pages in swap cache
[248560.640654] Swap cache stats: add 1107386, delete 1101240, find 294696/305552
[248560.640655] Free swap = 0kB
[248560.640656] Total swap = 2097148kB
[248560.640657] 2097037 pages RAM
[248560.640658] 0 pages HighMem/MovableOnly
[248560.640659] 94415 pages reserved
[248560.640660] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[248560.640666] [  685]   0   685    17664    2179   39     49     0 systemd-journal
[248560.640668] [  716]   0   716   220817     676   46   1476     0 lvmetad
[248560.640671] [  722]   0   722    11679     635   22    546 -1000 systemd-udevd
[248560.640675] [  881]   0   881   179084    6113   49      0 -1000 dmeventd
[248560.640685] [ 1273]   0  1273    13854     234   26     89 -1000 auditd
[248560.640688] [ 1292]   0  1292     4826     217   14     37     0 irqbalance
[248560.640690] [ 1293]  81  1293     8197     262   17     71  -900 dbus-daemon
[248560.640692] [ 1296]   0  1296     6156     261   15    138     0 systemd-logind
[248560.640695] [ 1299] 998  1299   132067     351   54   1894     0 polkitd
[248560.640697] [ 1310] 997  1310    28962     310   26     42     0 chronyd
[248560.640699] [ 1311]  32  1311    16237     175   34    104     0 rpcbind
[248560.640701] [ 1322]   0  1322    50303     142   40    114     0 gssproxy
[248560.640704] [ 1334]   0  1334    82865     469   84   5904     0 firewalld
[248560.640706] [ 1691]   0  1691    28206     115   52   3081     0 dhclient
[248560.640708] [ 1785]   0  1785    28335      98   12     37     0 rhsmcertd
[248560.640710] [ 1787]   0  1787   138291     385   87   2567     0 tuned
[248560.640712] [ 1798]   0  1798    20617      91   42    190 -1000 sshd
[248560.640715] [ 1916]   0  1916    22244     222   42    238     0 master
[248560.640717] [ 1918]  89  1918    22287     245   44    236     0 qmgr
[248560.640719] [ 2334]   0  2334    31556     209   17    133     0 crond
[248560.640721] [ 2385]   0  2385    26978     101    8     37     0 rhnsd
[248560.640723] [ 2388]   0  2388    27509     164   10     33     0 agetty
[248560.640726] [17375]  29 17375    10605     230   24    177     0 rpc.statd
[248560.640728] [16763]   0 16763    72838    1270   59    105     0 rsyslogd
[248560.640730] [16951]   0 16951   151619     470   86  12040     0 glusterd
[248560.640733] [27747]   0 27747   428530    2595  125  10071     0 glusterfsd
[248560.640735] [27962]   0 27962   226969    5025   89   6433     0 glusterfs
[248560.640737] [29536]   0 29536    49589    2611   63   2017     0 corosync
[248560.640739] [29552]   0 29552    33157     377   64   1026     0 pacemakerd
[248560.640741] [29554] 189 29554    35595    2224   72   1416     0 cib
[248560.640744] [29555]   0 29555    34361     885   69    479     0 stonithd
[248560.640746] [29556]   0 29556    26273     371   52    228     0 lrmd
[248560.640748] [29557] 189 29557    31731     940   64    345     0 attrd
[248560.640750] [29558] 189 29558    38963    2038   71    241     0 pengine
[248560.640752] [29559] 189 29559    47014    2147   79    880     0 crmd
[248560.640754] [29577]   0 29577   244360    8064   98   2064     0 pcsd
[248560.640757] [ 6278]   0  6278  3262386 1857506 4856 406101     0 ganesha.nfsd
[248560.640759] [22343]   0 22343    35726     306   72    290     0 sshd
[248560.640761] [22358]   0 22358    28879     278   14     48     0 bash
[248560.640764] [27763]   0 27763   330732    1777  113   8853     0 glusterfsd
[248560.640767] [27785]   0 27785   330733    2281  115   9062     0 glusterfsd
[248560.640769] [27804]   0 27804   314090    3314  110   8708     0 glusterfsd
[248560.640771] [27836]   0 27836   255249    6445  106  14561     0 glusterfs
[248560.640773] [22453]  89 22453    22270     479   42      0     0 pickup
[248560.640776] [ 5088]   0  5088    35726     635   71      0     0 sshd
[248560.640778] [ 5111]   0  5111    28879     319   15      0     0 bash
[248560.640780] [11672]   0 11672    26984     136   10      0     0 tail
[248560.640782] [14710]   0 14710    35726     581   72      0     0 sshd
[248560.640784] [14745]   0 14745    28879     311   14      0     0 bash
[248560.640787] [15813]   0 15813    28910     333   14      0     0 ganesha_grace
[248560.640789] [15819]   0 15819    28910     180   10      0     0 ganesha_grace
[248560.640791] [15820]   0 15820    30197     552   62      0     0 crm_attribute
[248560.640793] [15821]   0 15821    28877     274   14      0     0 portblock
[248560.640795] [15824]   0 15824    28811     185   14      0     0 ganesha_mon
[248560.640797] [15825]   0 15825    28877     125   10      0     0 portblock
[248560.640800] [15826]   0 15826    28811      98   11      0     0 ganesha_mon
[248560.640801] [15827]   0 15827    26974     127   10      0     0 basename
[248560.640803] Out of memory: Kill process 6278 (ganesha.nfsd) score 870 or sacrifice child
[248560.640886] Killed process 6278 (ganesha.nfsd) total-vm:13049544kB, anon-rss:7430024kB, file-rss:0kB, shmem-rss:0kB

Basically, the following part of the test hangs (a scripted sketch of these steps follows below):

1.) create a file with 0644 permissions
2.) rename the file as a non-root user; it fails
3.) delete the file
4.) create a directory with the same name
5.) rename the directory as a non-root user; it hangs

RCA: When a rename fails as a non-root user, the linkto file created by DHT is not removed properly and remains as a stale entry. On the next rename call, a lookup is performed; the lookup tries to remove the stale entry but fails, and the client keeps retrying the removal, which always returns EPERM. The linkto file (an mknod call) is always created as root, even on behalf of a non-root user, but the cleanup of this file runs as the original user, which is why it fails.

Patch posted upstream for review: http://review.gluster.org/#/c/15894/1
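A minimal shell sketch of those five steps, assuming an nfs-ganesha mount at /mnt/nfs and a non-root user named testuser (both hypothetical; the authoritative reproducer is the rename/00.t script itself):

```sh
#!/bin/sh
# Run as root against an NFS-Ganesha mount of the 6x2 volume.
cd /mnt/nfs || exit 1

touch testfile
chmod 0644 testfile                    # 1) create a file with 0644 permissions
sudo -u testuser mv testfile renamed   # 2) rename as non-root: fails (no write
                                       #    permission on the root-owned directory),
                                       #    leaving a stale dht linkto file behind
rm -f testfile                         # 3) delete the file
mkdir testfile                         # 4) create a directory with the same name
sudo -u testuser mv testfile renamed   # 5) rename as non-root: this call hangs while
                                       #    ganesha.nfsd memory climbs toward the OOM kill
```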
As per the triaging, we all agree that this BZ has to be fixed in rhgs-3.2.0. Providing qa_ack.

The above issue happens when the rename/00.t test is executed on nfs-ganesha clients. Steps executed in that script:

* create a file as root
* rename the file as a non-root user; it fails with EACCES
* delete the file
* create a directory with the same name as root
* rename the directory as a non-root user; the test hangs and slowly leads to the OOM kill of ganesha

RCA put forward by Du for the OOM kill of ganesha:

Note that when we hit this bug, we have a scenario of a dentry being present as:
* a linkto file on one subvol
* a directory on the rest of the subvols

When a lookup happens on the dentry in such a scenario, control flow goes into an infinite loop of:

dht_lookup_everywhere
dht_lookup_everywhere_cbk
dht_lookup_unlink_cbk
dht_lookup_everywhere_done
dht_lookup_directory (as local->dir_count > 0)
dht_lookup_dir_cbk (sets local->need_selfheal = 1, as the entry is a linkto file on one of the subvols)
dht_lookup_everywhere (as need_selfheal = 1)

This infinite loop causes increased memory consumption because:

1) dht_lookup_directory assigns a new layout to local->layout unconditionally
2) Most of the functions in this loop do a stack_wind of various fops, which grows the call stack (note that the call stack is destroyed only after the lookup response is received by fuse, which never happens in this case)

The OOM kill of nfs-ganesha is no longer seen in the latest build when the posix compliance tests are run:

nfs-ganesha-2.4.1-2.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.1-2.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-8.el7rhgs.x86_64

However, the following tests in the posix compliance test suite are failing, for which a different bug (1404367) has been raised.

Test Summary Report
-------------------
/opt/qa/tools/posix-testsuite/tests/chown/00.t (Wstat: 0 Tests: 171 Failed: 1)
  Failed test: 77
/opt/qa/tools/posix-testsuite/tests/link/00.t (Wstat: 0 Tests: 82 Failed: 1)
  Failed test: 77
/opt/qa/tools/posix-testsuite/tests/open/07.t (Wstat: 0 Tests: 23 Failed: 3)
  Failed tests: 5, 7, 9
Files=185, Tests=1962, 132 wallclock secs ( 1.57 usr 0.59 sys + 16.39 cusr 33.22 csys = 51.77 CPU)
Result: FAIL
end: 15:20:31

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html
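Given the Fixed In Version noted above (glusterfs-3.8.4-7), a quick way to confirm that a node carries the fix (standard rpm usage; package names as they appear in this report):

```sh
rpm -q glusterfs glusterfs-ganesha nfs-ganesha nfs-ganesha-gluster
```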