Bug 1221511

Summary: nfs-ganesha: OOM killed for nfsd process while executing the posix test suite
Product: [Community] GlusterFS
Reporter: Saurabh <saujain>
Component: ganesha-nfs
Assignee: Soumya Koduri <skoduri>
Status: CLOSED NOTABUG
Severity: high
Priority: urgent
Version: 3.7.0
CC: kkeithle, mzywusko, ndevos, rkavunga, skoduri, smohan, vagarwal
Keywords: Triaged
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2016-02-05 11:26:30 UTC
Type: Bug
Attachments:
  dmesg log from node1
  sosreport of node1
  posix-compliance-tests-v3-with-upcall
  posix-compliance-tests-v3-without-upcall
  posix-compliance-tests-v4-with-upcall
  posix-compliance-tests-v4-without-upcall

Description Saurabh 2015-05-14 08:45:53 UTC
Created attachment 1025317 [details]
dmesg log from node1

Description of problem:
The nfs-ganesha process was OOM-killed while running the POSIX test suite. dmesg shows the kernel oom-killer being invoked:

glusterfsd invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
glusterfsd cpuset=/ mems_allowed=0
Pid: 31429, comm: glusterfsd Not tainted 2.6.32-504.12.2.el6.x86_64 #1
Call Trace:
 [<ffffffff810d40c1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
 [<ffffffff81127300>] ? dump_header+0x90/0x1b0
 [<ffffffff8122eb5c>] ? security_real_capable_noaudit+0x3c/0x70
 [<ffffffff81127782>] ? oom_kill_process+0x82/0x2a0
 [<ffffffff811276c1>] ? select_bad_process+0xe1/0x120
 [<ffffffff81127bc0>] ? out_of_memory+0x220/0x3c0
 [<ffffffff811344df>] ? __alloc_pages_nodemask+0x89f/0x8d0
 [<ffffffff8116c69a>] ? alloc_pages_current+0xaa/0x110
 [<ffffffff811246f7>] ? __page_cache_alloc+0x87/0x90
 [<ffffffff811240de>] ? find_get_page+0x1e/0xa0
 [<ffffffff81125697>] ? filemap_fault+0x1a7/0x500
 [<ffffffff8114eae4>] ? __do_fault+0x54/0x530
 [<ffffffff8114f0b7>] ? handle_pte_fault+0xf7/0xb00
 [<ffffffff8114fcea>] ? handle_mm_fault+0x22a/0x300
 [<ffffffff8104d0d8>] ? __do_page_fault+0x138/0x480
 [<ffffffff81041e98>] ? pvclock_clocksource_read+0x58/0xd0
 [<ffffffff8153003e>] ? do_page_fault+0x3e/0xa0
 [<ffffffff8152d3f5>] ? page_fault+0x25/0x30


[root@nfs2 ~]# gluster volume status
Status of volume: gluster_shared_storage
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.37.148:/rhs/brick1/d1r1-share   49156     0          Y       3549 
Brick 10.70.37.77:/rhs/brick1/d1r2-share    49155     0          Y       3329 
Brick 10.70.37.76:/rhs/brick1/d2r1-share    49155     0          Y       3081 
Brick 10.70.37.69:/rhs/brick1/d2r2-share    49155     0          Y       3346 
Brick 10.70.37.148:/rhs/brick1/d3r1-share   49157     0          Y       3566 
Brick 10.70.37.77:/rhs/brick1/d3r2-share    49156     0          Y       3346 
Brick 10.70.37.76:/rhs/brick1/d4r1-share    49156     0          Y       3098 
Brick 10.70.37.69:/rhs/brick1/d4r2-share    49156     0          Y       3363 
Brick 10.70.37.148:/rhs/brick1/d5r1-share   49158     0          Y       3583 
Brick 10.70.37.77:/rhs/brick1/d5r2-share    49157     0          Y       3363 
Brick 10.70.37.76:/rhs/brick1/d6r1-share    49157     0          Y       3115 
Brick 10.70.37.69:/rhs/brick1/d6r2-share    49157     0          Y       3380 
Self-heal Daemon on localhost               N/A       N/A        Y       25893
Self-heal Daemon on 10.70.37.69             N/A       N/A        Y       28389
Self-heal Daemon on 10.70.37.77             N/A       N/A        Y       4784 
Self-heal Daemon on 10.70.37.148            N/A       N/A        Y       22717
 
Task Status of Volume gluster_shared_storage
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: vol2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.37.148:/rhs/brick1/d1r1         49153     0          Y       22219
Brick 10.70.37.77:/rhs/brick1/d1r2          49152     0          Y       4321 
Brick 10.70.37.76:/rhs/brick1/d2r1          49152     0          Y       25654
Brick 10.70.37.69:/rhs/brick1/d2r2          49152     0          Y       27914
Brick 10.70.37.148:/rhs/brick1/d3r1         49154     0          Y       18842
Brick 10.70.37.77:/rhs/brick1/d3r2          49153     0          Y       4343 
Brick 10.70.37.76:/rhs/brick1/d4r1          49153     0          Y       25856
Brick 10.70.37.69:/rhs/brick1/d4r2          49153     0          Y       27934
Brick 10.70.37.148:/rhs/brick1/d5r1         49155     0          Y       22237
Brick 10.70.37.77:/rhs/brick1/d5r2          49154     0          Y       4361 
Brick 10.70.37.76:/rhs/brick1/d6r1          49154     0          Y       25874
Brick 10.70.37.69:/rhs/brick1/d6r2          49154     0          Y       27952
Self-heal Daemon on localhost               N/A       N/A        Y       25893
Self-heal Daemon on 10.70.37.77             N/A       N/A        Y       4784 
Self-heal Daemon on 10.70.37.69             N/A       N/A        Y       28389
Self-heal Daemon on 10.70.37.148            N/A       N/A        Y       22717
 
Task Status of Volume vol2
------------------------------------------------------------------------------
There are no active volume tasks


Version-Release number of selected component (if applicable):
glusterfs-3.7.0beta2-0.0.el6.x86_64
nfs-ganesha-2.2.0-0.el6.x86_64

How reproducible:
OOM kill hit on the first run of the test suite.

Steps to Reproduce:
1. Create a volume of type 6x2 (distributed-replicate) and start it.
2. Start nfs-ganesha after completing the prerequisites.
3. Mount the volume with vers=3.
4. Run the POSIX test suite (see the sketch below).
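
A minimal sketch of these steps, assuming the node addresses and brick paths from the volume status output above; the ganesha host and mount point are placeholders, and the exact prove invocation of the test suite may differ:

# Create and start the 6x2 distributed-replicate volume.
gluster volume create vol2 replica 2 \
    10.70.37.148:/rhs/brick1/d1r1 10.70.37.77:/rhs/brick1/d1r2 \
    10.70.37.76:/rhs/brick1/d2r1 10.70.37.69:/rhs/brick1/d2r2 \
    10.70.37.148:/rhs/brick1/d3r1 10.70.37.77:/rhs/brick1/d3r2 \
    10.70.37.76:/rhs/brick1/d4r1 10.70.37.69:/rhs/brick1/d4r2 \
    10.70.37.148:/rhs/brick1/d5r1 10.70.37.77:/rhs/brick1/d5r2 \
    10.70.37.76:/rhs/brick1/d6r1 10.70.37.69:/rhs/brick1/d6r2
gluster volume start vol2

# Mount over NFSv3 from the ganesha node and run the suite.
mount -t nfs -o vers=3 <ganesha-host>:/vol2 /mnt/vol2
cd /mnt/vol2
prove -r /opt/qa/tools/posix-testsuite/tests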

Actual results:
OOM kill seen for the nfs-ganesha process.

Expected results:
No OOM kill; the POSIX test suite is not a workload that should exhaust memory.

Additional info:

A related bug, BZ 1221489, was also filed while running the posix testsuite.

Comment 1 Saurabh 2015-05-14 08:49:02 UTC
Created attachment 1025318 [details]
sosreport of node1

Comment 2 Meghana 2015-05-14 09:04:25 UTC
Hi Saurabh,

I see the following messages on this node,
May 14 13:34:52 nfs1 setroubleshoot: SELinux is preventing /usr/sbin/glusterfsd from create access on the fifo_file fstest_a1b17951bbc68e452b02abd3cf4cf15a. For complete SELinux messages. run sealert -l cc5df174-cc02-4d36-a558-0fad09188e85
May 14 13:34:52 nfs1 setroubleshoot: SELinux is preventing /usr/sbin/glusterfsd from setattr access on the fifo_file fstest_a1b17951bbc68e452b02abd3cf4cf15a. For complete SELinux messages. run sealert -l 47bc3821-9053-4a1f-a3ea-003f1143cc54
May 14 13:34:52 nfs1 setroubleshoot: SELinux is preventing /usr/sbin/glusterfsd from getattr access on the fifo_file /rhs/brick1/d5r1/dir1/fstest_2f02ef6b423108211157842c608fbe71/fstest_a1b17951bbc68e452b02abd3cf4cf15a. For complete SELinux messages. run sealert -l 733b075a-5f77-40c9-9ce7-064b380001b2
May 14 13:34:52 nfs1 setroubleshoot: SELinux is preventing /usr/sbin/glusterfsd from link access on the fifo_file fstest_a1b17951bbc68e452b02abd3cf4cf15a. For complete SELinux messages. run sealert -l b8a0356e-54f1-43b7-833d-89c3033a65bb
May 14 13:34:52 nfs1 setroubleshoot: SELinux is preventing /usr/sbin/glusterfsd from getattr access on the fifo_file /rhs/brick1/d5r1/dir1/fstest_2f02ef6b423108211157842c608fbe71/fstest_a1b17951bbc68e452b02abd3cf4cf15a. For complete SELinux messages. run sealert -l 733b075a-5f77-40c9-9ce7-064b380001b2
May 14 13:34:53 nfs1 setroubleshoot: SELinux is preventing glusterfsd from read access on the fifo_file ec1a1d0d-0f95-4c14-a291-fe780016614f. For complete SELinux messages. run sealert -l 26cfead6-e5fb-4381-8bb3-03edae2b21d9

Can you please try this in permissive mode?

Comment 3 Saurabh 2015-05-14 10:11:32 UTC
SELinux is already in permissive mode.

[root@nfs1 ~]# sestatus
SELinux status:                 enabled
SELinuxfs mount:                /selinux
Current mode:                   permissive
Mode from config file:          permissive
Policy version:                 24
Policy from config file:        targeted

Comment 4 Meghana 2015-05-14 10:17:12 UTC
Even in permissive mode, we see that read access is prevented. Please disable SELinux and try once. I am currently running the test suite on my four-node setup.
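
For reference, a minimal sketch of fully disabling SELinux on RHEL 6 (standard procedure, not specific to this bug; note that setenforce only toggles between enforcing and permissive, so fully disabling requires a config change and a reboot):

setenforce 0                                           # runtime: permissive only
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
reboot                                                 # SELINUX=disabled takes effect on boot
getenforce                                             # should then report "Disabled"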

Comment 5 Anand Subramanian 2015-05-14 10:27:02 UTC
Saurabh,

If it is glusterfsd that is being prevented from reading the attrs, what makes you file the bug against nfs-ganesha? Won't this impact glusterfsd in general? If so, can you check with other folks who may have a solution for glusterfs attr access with SELinux enabled? If not, I agree we can try to work on a policy; my only point is that this doesn't seem to be a ganesha issue.

Meghana - Please don't mark the comments as private if they do not contain any confidential info.

Thanks,
Anand

Comment 6 Soumya Koduri 2015-05-15 08:21:01 UTC
I can think of one possible RCA for this issue, though I have not yet checked the logs/core. Since upcall is enabled, gfapi queues all the upcall notifications received from the server. But the NFS-Ganesha patch that polls and cleans up these entries has not yet been merged, so those queued entries may have consumed a lot of memory and triggered the OOM kill. Please disable upcall ('gluster vol set features.cache-invalidation off') and re-run the test suite to confirm whether this is the case. Thanks!
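
A sketch of that check, using the volume name from this report (the pgrep pattern for the ganesha daemon is an assumption):

# Disable upcall notifications and confirm the setting took effect.
gluster volume set vol2 features.cache-invalidation off
gluster volume info vol2    # "features.cache-invalidation: off" under Options Reconfigured

# Re-run the test suite while watching the daemon's memory usage.
top -p "$(pgrep -f ganesha.nfsd)"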

Comment 7 Soumya Koduri 2015-06-01 06:53:58 UTC
Created attachment 1033153 [details]
posix-compliance-tests-v3-with-upcall

Comment 8 Soumya Koduri 2015-06-01 06:54:27 UTC
Created attachment 1033154 [details]
posix-compliance-tests-v3-without-upcall

Comment 9 Soumya Koduri 2015-06-01 06:54:51 UTC
Created attachment 1033155 [details]
posix-compliance-tests-v4-with-upcall

Comment 10 Soumya Koduri 2015-06-01 06:55:19 UTC
Created attachment 1033156 [details]
posix-compliance-tests-v4-without-upcall

Comment 11 Soumya Koduri 2015-06-01 07:00:20 UTC
With the latest upstream gluster and nfs-ganesha (v2.3 dev-5) sources, I do not see any cores or OOM issues while running the posix-compliance tests. However, there are some test failures, and the results are similar with features.cache-invalidation on or off.

v3 Mount:

Test Summary Report
-------------------
/opt/qa/tools/posix-testsuite/tests/chown/00.t   (Wstat: 0 Tests: 171 Failed: 1)
  Failed test:  77
Files=185, Tests=1962, 104 wallclock secs ( 0.64 usr  0.20 sys +  3.04 cusr  0.56 csys =  4.44 CPU)
Result: FAIL



v4 Mount:

Test Summary Report
-------------------
/opt/qa/tools/posix-testsuite/tests/chown/00.t   (Wstat: 0 Tests: 171 Failed: 1)
  Failed test:  77
/opt/qa/tools/posix-testsuite/tests/open/07.t    (Wstat: 0 Tests: 23 Failed: 3)
  Failed tests:  5, 7, 9
Files=185, Tests=1962, 106 wallclock secs ( 0.67 usr  0.17 sys +  3.06 cusr  0.57 csys =  4.47 CPU)
Result: FAIL


Looking into those specific test failures.

Comment 12 Soumya Koduri 2015-06-08 12:41:10 UTC
With respect to the chown tests, I see the same failure with Gluster-NFS as well.
/opt/qa/tools/posix-testsuite/tests/chown/00.t
Failed test:  77

<<<<<<<
# when non-super-user calls chown(2) successfully, set-uid and set-gid bits are
# removed, except when both uid and gid are equal to -1.
# 64
expect 0 create ${n0} 0644
expect 0 chown ${n0} 65534 65533
expect 0 chmod ${n0} 06555
expect 06555 lstat ${n0} mode
expect 0 -u 65534 -g 65533,65532 chown ${n0} 65534 65532
expect 0555,65534,65532 lstat ${n0} mode,uid,gid
expect 0 chmod ${n0} 06555
expect 06555 lstat ${n0} mode
expect 0 -u 65534 -g 65533,65532 -- chown ${n0} -1 65533
expect 0555,65534,65533 lstat ${n0} mode,uid,gid
expect 0 chmod ${n0} 06555
expect 06555 lstat ${n0} mode
expect 0 -u 65534 -g 65533,65532 -- chown ${n0} -1 -1
case "${os}" in
Linux)
echo "os = ${os}"
	expect 0555,65534,65533 lstat ${n0} mode,uid,gid
        ;;
*)
	expect 06555,65534,65533 lstat ${n0} mode,uid,gid
        ;;
esac
>>>>>>>>

The last 'expect' in this block is test #77. From the packet trace I observed that the NFS client never sends the final "chown ${n0} -1 -1" to the server, so the mode bits are left unchanged (06555 instead of the expected 0555). This does not appear to be a server-side issue.

Attached the pkt-trace (nfsv3_chown.pcap)
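
For reference, a capture like the attached one can be taken on the client while the failing test runs (the interface and output file here are placeholders; 2049 is the standard NFS port):

tcpdump -i any -s 0 -w nfsv3_chown.pcap port 2049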

Comment 13 Soumya Koduri 2015-06-09 12:24:27 UTC
> /opt/qa/tools/posix-testsuite/tests/open/07.t    (Wstat: 0 Tests: 23 Failed: 3)
>   Failed tests:  5, 7, 9


These failures are expected with NFSv4; I got the same results using kernel NFS as well.


expect 0 -u 65534 -g 65534 create ${n1} 0644

expect 0 -u 65534 -g 65534 chmod ${n1} 0477
expect EACCES -u 65534 -g 65534 open ${n1} O_RDONLY,O_TRUNC   (FAIL)
expect 0 -u 65534 -g 65534 chmod ${n1} 0747
expect EACCES -u 65533 -g 65534 open ${n1} O_RDONLY,O_TRUNC   (FAIL)
expect 0 -u 65534 -g 65534 chmod ${n1} 0774
expect EACCES -u 65533 -g 65533 open ${n1} O_RDONLY,O_TRUNC   (FAIL)

The issue here is that the test opens the file with O_RDONLY|O_TRUNC, which should ideally return EACCES, since truncation requires write permission. But the NFSv4 client maps this to an OPEN with 'OPEN4_SHARE_ACCESS_READ', and since the caller has read permission, the server rightly grants it.
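
A local way to see this, mirroring the failing expect lines above with the suite's fstest helper (the helper's path is an assumption; the suite's expect wrapper invokes it the same way):

fstest=/opt/qa/tools/posix-testsuite/fstest            # assumed build location
$fstest -u 65534 -g 65534 create /mnt/vol2/n1 0644     # file owned by 65534
$fstest -u 65534 -g 65534 chmod /mnt/vol2/n1 0477      # owner: read-only
$fstest -u 65534 -g 65534 open /mnt/vol2/n1 O_RDONLY,O_TRUNC
# Local fs / NFSv3: fails with EACCES, as the test expects.
# NFSv4 mount: succeeds, because the client sends OPEN with OPEN4_SHARE_ACCESS_READ.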

That means both the chown (00.t, #77) and open (07.t, #5, #7, #9) tests should be marked as Known_Issues.