Bug 1221511
Summary: | nfs-ganesha: OOM killed for nfsd process while executing the posix test suite |
---|---|
Product: | [Community] GlusterFS |
Component: | ganesha-nfs |
Version: | 3.7.0 |
Status: | CLOSED NOTABUG |
Severity: | high |
Priority: | urgent |
Hardware: | x86_64 |
OS: | Linux |
Reporter: | Saurabh <saujain> |
Assignee: | Soumya Koduri <skoduri> |
CC: | kkeithle, mzywusko, ndevos, rkavunga, skoduri, smohan, vagarwal |
Keywords: | Triaged |
Doc Type: | Bug Fix |
Type: | Bug |
Last Closed: | 2016-02-05 11:26:30 UTC |
Description
Saurabh
2015-05-14 08:45:53 UTC
Created attachment 1025318 [details]
sosreport of node1
Hi Saurabh, I see the following messages on this node:
May 14 13:34:52 nfs1 setroubleshoot: SELinux is preventing /usr/sbin/glusterfsd from create access on the fifo_file fstest_a1b17951bbc68e452b02abd3cf4cf15a. For complete SELinux messages. run sealert -l cc5df174-cc02-4d36-a558-0fad09188e85
May 14 13:34:52 nfs1 setroubleshoot: SELinux is preventing /usr/sbin/glusterfsd from setattr access on the fifo_file fstest_a1b17951bbc68e452b02abd3cf4cf15a. For complete SELinux messages. run sealert -l 47bc3821-9053-4a1f-a3ea-003f1143cc54
May 14 13:34:52 nfs1 setroubleshoot: SELinux is preventing /usr/sbin/glusterfsd from getattr access on the fifo_file /rhs/brick1/d5r1/dir1/fstest_2f02ef6b423108211157842c608fbe71/fstest_a1b17951bbc68e452b02abd3cf4cf15a. For complete SELinux messages. run sealert -l 733b075a-5f77-40c9-9ce7-064b380001b2
May 14 13:34:52 nfs1 setroubleshoot: SELinux is preventing /usr/sbin/glusterfsd from link access on the fifo_file fstest_a1b17951bbc68e452b02abd3cf4cf15a. For complete SELinux messages. run sealert -l b8a0356e-54f1-43b7-833d-89c3033a65bb
May 14 13:34:52 nfs1 setroubleshoot: SELinux is preventing /usr/sbin/glusterfsd from getattr access on the fifo_file /rhs/brick1/d5r1/dir1/fstest_2f02ef6b423108211157842c608fbe71/fstest_a1b17951bbc68e452b02abd3cf4cf15a. For complete SELinux messages. run sealert -l 733b075a-5f77-40c9-9ce7-064b380001b2
May 14 13:34:53 nfs1 setroubleshoot: SELinux is preventing glusterfsd from read access on the fifo_file ec1a1d0d-0f95-4c14-a291-fe780016614f. For complete SELinux messages. run sealert -l 26cfead6-e5fb-4381-8bb3-03edae2b21d9
Can you please try this in permissive mode?

SELinux is already in permissive mode.
[root@nfs1 ~]# sestatus
SELinux status: enabled
SELinuxfs mount: /selinux
Current mode: permissive
Mode from config file: permissive
Policy version: 24
Policy from config file: targeted

Even in permissive mode, we see that read access is prevented. Please disable SELinux and try once.
I am running the test suite on my 4 node set up currently.

Saurabh, if it is glusterfsd that is being prevented from reading the attrs, why file the bug against nfs-ganesha? Won't this impact glusterfsd in general? If so, can you check with other folks who may have a solution for glusterfs attr access with SELinux enabled? If not, I agree we can try to work on a policy. My only point is that this doesn't seem to be a ganesha issue.
Meghana - Please don't mark the comments as private if they do not contain any confidential info.
Thanks, Anand

I can think of one possible RCA for this issue at the moment, though I have not checked the logs/core. Since upcall is enabled, gfapi queues all the upcall notifications received from the server. But since the patch in NFS-Ganesha (which polls and cleans up these entries) hasn't been merged yet, I think those entries may have consumed a lot of memory, which resulted in the OOM kill. Please disable upcall with 'gluster vol set <VOLNAME> features.cache-invalidation off' and run the test suite to confirm whether this is the case.
Thanks!

Created attachment 1033153 [details]
posix-compliance-tests-v3-with-upcall
Created attachment 1033154 [details]
posix-compliance-tests-v3-without-upcall
Created attachment 1033155 [details]
posix-compliance-tests-v4-with-upcall
Created attachment 1033156 [details]
posix-compliance-tests-v4-without-upcall
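To make the upcall RCA proposed in this thread concrete, here is a minimal Python sketch of a notification queue that is filled by the server side and only shrinks when a consumer polls it. It is a toy model, not gfapi or NFS-Ganesha code: `UpcallQueue`, `simulate`, and the event counts are hypothetical names and numbers chosen for illustration.

```python
from collections import deque


class UpcallQueue:
    """Hypothetical stand-in for a client-side upcall notification queue."""

    def __init__(self):
        self._entries = deque()

    def enqueue(self, notification):
        # Called whenever the server sends a notification.
        self._entries.append(notification)

    def poll(self):
        """Drain and return everything queued so far."""
        drained = list(self._entries)
        self._entries.clear()
        return drained

    def __len__(self):
        return len(self._entries)


def simulate(total_events, poll_every=None):
    """Return the peak queue length while total_events notifications arrive.

    poll_every=None models a consumer that never polls the queue.
    """
    queue = UpcallQueue()
    peak = 0
    for i in range(1, total_events + 1):
        queue.enqueue(("cache_invalidation", i))
        peak = max(peak, len(queue))
        if poll_every is not None and i % poll_every == 0:
            queue.poll()
    return peak


peak_without_poll = simulate(10_000)
peak_with_poll = simulate(10_000, poll_every=100)
print(peak_without_poll, peak_with_poll)
```

Without a poller the peak grows linearly with the number of notifications, while a periodic drain keeps it bounded by the polling interval. That is the memory-growth pattern the RCA suspects.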
With the latest upstream gluster and nfs-ganesha (v2.3 dev-5) sources, I do not see any cores/OOM issues reported while running posix-compliance tests. However, there are some test failures, and the results are the same with features.cache-invalidation on or off.

v3 Mount - Test Summary Report
-------------------
/opt/qa/tools/posix-testsuite/tests/chown/00.t (Wstat: 0 Tests: 171 Failed: 1)
Failed test: 77
Files=185, Tests=1962, 104 wallclock secs ( 0.64 usr 0.20 sys + 3.04 cusr 0.56 csys = 4.44 CPU)
Result: FAIL

v4 Mount - Test Summary Report
-------------------
/opt/qa/tools/posix-testsuite/tests/chown/00.t (Wstat: 0 Tests: 171 Failed: 1)
Failed test: 77
/opt/qa/tools/posix-testsuite/tests/open/07.t (Wstat: 0 Tests: 23 Failed: 3)
Failed tests: 5, 7, 9
Files=185, Tests=1962, 106 wallclock secs ( 0.67 usr 0.17 sys + 3.06 cusr 0.57 csys = 4.47 CPU)
Result: FAIL

Looking into those specific test failures.

With respect to the chown tests, I see the same failure with Gluster-NFS as well.
/opt/qa/tools/posix-testsuite/tests/chown/00.t
Failed test: 77
<<<<<<<
# when non-super-user calls chown(2) successfully, set-uid and set-gid bits are
# removed, except when both uid and gid are equal to -1.
# 64
expect 0 create ${n0} 0644
expect 0 chown ${n0} 65534 65533
expect 0 chmod ${n0} 06555
expect 06555 lstat ${n0} mode
expect 0 -u 65534 -g 65533,65532 chown ${n0} 65534 65532
expect 0555,65534,65532 lstat ${n0} mode,uid,gid
expect 0 chmod ${n0} 06555
expect 06555 lstat ${n0} mode
expect 0 -u 65534 -g 65533,65532 -- chown ${n0} -1 65533
expect 0555,65534,65533 lstat ${n0} mode,uid,gid
expect 0 chmod ${n0} 06555
expect 06555 lstat ${n0} mode
expect 0 -u 65534 -g 65533,65532 -- chown ${n0} -1 -1
case "${os}" in
Linux)
echo "os = ${os}"
expect 0555,65534,65533 lstat ${n0} mode,uid,gid
;;
*)
expect 06555,65534,65533 lstat ${n0} mode,uid,gid
;;
esac
>>>>>>>>
The last 'expect' in this block is test #77. From the packet trace I observed that the NFS client never sends a request to the server for the final 'chown ${n0} -1 -1' call (issued by 'expect 0 -u 65534 -g 65533,65532 -- chown ${n0} -1 -1'), so the mode bits are left unchanged. This doesn't seem to be a server issue.
Attached the pkt-trace (nfsv3_chown.pcap)
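As a rough illustration of why test #77 fails, the sketch below models only what the quoted test block expects after a successful non-super-user chown, and compares it with the unchanged mode that results when no request reaches the server. It is a model of the test's expectations, not kernel or NFS client code; the function name `expected_mode_after_chown` is hypothetical.

```python
S_ISUID = 0o4000
S_ISGID = 0o2000


def expected_mode_after_chown(mode, uid, gid, os_name):
    """Mode the quoted test block expects after a non-super-user chown.

    Per the test: set-uid/set-gid bits are removed, except when both uid
    and gid are -1. On Linux the test expects them removed even then.
    """
    both_unchanged = uid == -1 and gid == -1
    if both_unchanged and os_name != "Linux":
        return mode
    return mode & ~(S_ISUID | S_ISGID)


# Test #77: chown ${n0} -1 -1 on a file whose mode is 06555.
expected_on_linux = expected_mode_after_chown(0o6555, -1, -1, "Linux")
expected_elsewhere = expected_mode_after_chown(0o6555, -1, -1, "FreeBSD")

# Observed over NFS: the request never reaches the server, so the mode
# bits stay as they were.
observed = 0o6555

print(oct(expected_on_linux), oct(expected_elsewhere), oct(observed))
print("test 77 passes:", observed == expected_on_linux)
```

The mismatch between the observed 06555 and the 0555 expected in the Linux branch is what the harness reports as the single chown/00.t failure.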
>>>>
/opt/qa/tools/posix-testsuite/tests/open/07.t (Wstat: 0 Tests: 23 Failed: 3)
Failed tests: 5, 7, 9
These are the expected failures with respect to NFSv4. Got same results using kernel-NFS as well.
expect 0 -u 65534 -g 65534 create ${n1} 0644
expect 0 -u 65534 -g 65534 chmod ${n1} 0477
expect EACCES -u 65534 -g 65534 open ${n1} O_RDONLY,O_TRUNC (FAIL)
expect 0 -u 65534 -g 65534 chmod ${n1} 0747
expect EACCES -u 65533 -g 65534 open ${n1} O_RDONLY,O_TRUNC (FAIL)
expect 0 -u 65534 -g 65534 chmod ${n1} 0774
expect EACCES -u 65533 -g 65533 open ${n1} O_RDONLY,O_TRUNC (FAIL)
The issue here is that the test tries to open the file with O_RDONLY,O_TRUNC, which should ideally return EACCES. But the NFSv4 client maps it to an OPEN with 'OPEN4_SHARE_ACCESS_READ', and since the caller has READ permission in each of these cases, the server rightly grants it.
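The difference can be sketched in Python as two permission checks over the three failing cases. This is a simplified model under stated assumptions: the local check treats O_TRUNC as requiring write permission (which is what the test expects), and the server-side check looks only at the share access carried by OPEN. The helper names are hypothetical; the OPEN4_SHARE_ACCESS_* values are the ones defined by the NFSv4 protocol.

```python
import os

# NFSv4 share_access values (RFC 7530).
OPEN4_SHARE_ACCESS_READ = 0x1
OPEN4_SHARE_ACCESS_WRITE = 0x2
OPEN4_SHARE_ACCESS_BOTH = 0x3

R, W = 4, 2  # rwx-style permission bits
ACCMODE = os.O_RDONLY | os.O_WRONLY | os.O_RDWR


def perm_bits(mode, file_uid, file_gid, uid, gid):
    """Pick the owner, group, or other permission triplet for the caller."""
    if uid == file_uid:
        return (mode >> 6) & 7
    if gid == file_gid:
        return (mode >> 3) & 7
    return mode & 7


def local_needed(flags):
    """Permissions a local open() check needs; O_TRUNC implies write."""
    acc = flags & ACCMODE
    needed = 0
    if acc in (os.O_RDONLY, os.O_RDWR):
        needed |= R
    if acc in (os.O_WRONLY, os.O_RDWR) or flags & os.O_TRUNC:
        needed |= W
    return needed


def share_access(flags):
    """Share access derived from the access mode only; O_TRUNC is ignored."""
    return {
        os.O_RDONLY: OPEN4_SHARE_ACCESS_READ,
        os.O_WRONLY: OPEN4_SHARE_ACCESS_WRITE,
        os.O_RDWR: OPEN4_SHARE_ACCESS_BOTH,
    }[flags & ACCMODE]


def server_needed(flags):
    """Permissions the modelled server checks for the OPEN."""
    access = share_access(flags)
    needed = 0
    if access & OPEN4_SHARE_ACCESS_READ:
        needed |= R
    if access & OPEN4_SHARE_ACCESS_WRITE:
        needed |= W
    return needed


flags = os.O_RDONLY | os.O_TRUNC
# (test number, file mode, caller uid, caller gid); file is owned by 65534:65534.
cases = [(5, 0o477, 65534, 65534), (7, 0o747, 65533, 65534), (9, 0o774, 65533, 65533)]

results = []
for test_no, mode, uid, gid in cases:
    have = perm_bits(mode, 65534, 65534, uid, gid)
    local_ok = (have & local_needed(flags)) == local_needed(flags)
    server_ok = (have & server_needed(flags)) == server_needed(flags)
    results.append((test_no, local_ok, server_ok))
    print(test_no, "local:", "ok" if local_ok else "EACCES",
          "nfsv4 open:", "ok" if server_ok else "EACCES")
```

In all three cases the caller has read but not write permission, so the local-style check denies the open while the read-only share access is granted.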
That means both chown(00.t - #77) and open(07.t - #5, #7, #9) tests should be marked as Known_Issues.