Description of problem:
We run RHEL 6.1 NFS servers and clients and mount home directories via NFSv3. This setup generally works, but we sometimes get a "Permission denied" error. For instance, it happens when running 'ls' in SSH sessions that were left idle for hours:

$ cd ~/a/b/c
$ ls
1 2 3 ...
(wait a few hours)
$ ls
ls: cannot open directory .: Permission denied

The export itself is not the issue. The error goes away immediately when the directory is accessed by its absolute path:

$ ls
ls: cannot open directory .: Permission denied
$ ls ~/a/b/c
1 2 3
$ ls
1 2 3

Turning off iptables and SELinux on the server and the clients does not help. Mounting with various options such as lookupcache=none or sync has no effect either.

The error can more or less be forced to appear immediately by changing the server's file caching behaviour:

# echo 20000 > /proc/sys/vm/vfs_cache_pressure
# echo 3 > /proc/sys/vm/drop_caches
# sync

When 'ls' fails, Wireshark shows the following on the network:

1 0.000000 x y NFS V3 ACCESS Call, FH:0xd01522c9
2 0.000115 x y NFS V3 ACCESS Reply (Call In 1) Error:NFS3ERR_ACCES

I did some initial bug hunting with a kernel built from kernel-2.6.32-131.12.1.el6.src.rpm. It has additional dprintk()s spread over the nfsd code, starting with nfsd_access() in fs/nfsd/nfs3proc.c.
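Here is a minimal C sketch of a client-side check that mirrors the two 'ls' commands above. It opens "." and then opens the same directory by its absolute path, taken from getcwd(). It reports the symptom only when the relative open fails with EACCES and the absolute one succeeds. The helper names try_opendir() and check_cwd_symptom() are hypothetical and not part of any existing tool. The sketch assumes a Linux/glibc client and says nothing about the server-side cause.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: open a directory and close it again.
 * Returns 0 on success or the errno value on failure. */
static int try_opendir(const char *path)
{
    DIR *d = opendir(path);
    if (d == NULL)
        return errno;
    closedir(d);
    return 0;
}

/* Hypothetical helper mirroring "ls" vs "ls ~/a/b/c":
 * returns 1 if "." fails with EACCES while the absolute path opens fine,
 * 0 otherwise, and -1 if the current directory cannot be determined. */
static int check_cwd_symptom(void)
{
    char cwd[4096];
    if (getcwd(cwd, sizeof cwd) == NULL)
        return -1;

    int rel_err = try_opendir(".");
    int abs_err = try_opendir(cwd);

    printf("opendir(\".\"): %s; opendir(\"%s\"): %s\n",
           rel_err ? strerror(rel_err) : "ok",
           cwd,
           abs_err ? strerror(abs_err) : "ok");

    return rel_err == EACCES && abs_err == 0;
}
```

One possible use is to chdir() into the NFS subdirectory and leave the process idle, or apply the cache settings above on the server. Then call check_cwd_symptom(). Because the relative open comes first, the check does not itself "repair" the directory before "." is tested.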
Here are my results so far. The NFS3ERR_ACCES comes from an EACCES detected by the following code in nfsd_set_fh_dentry() in fs/nfsd/nfsfh.c:

242         if (fileid_type == FILEID_ROOT)
243                 dentry = dget(exp->ex_path.dentry);
244         else {
245                 dentry = exportfs_decode_fh(exp->ex_path.mnt, fid,
246                                 data_left, fileid_type,
247                                 nfsd_acceptable, exp);
248         }
249         if (dentry == NULL)
250                 goto out;
251         if (IS_ERR(dentry)) {
252                 if (PTR_ERR(dentry) != -EINVAL)
253                         error = nfserrno(PTR_ERR(dentry));
254                 goto out;
255         }

When the error occurs, fileid_type is FILEID_ROOT. The branch on line 243 is therefore taken and exp->ex_path.dentry is used. That field appears to hold the -EACCES error code instead of a pointer to a valid dentry. I have not yet found which code sets exp->ex_path.dentry to -EACCES. I suspect this should never happen, because otherwise the dget() on line 243 is unsafe: it assumes it operates on a valid pointer.

Any help from the nfs-utils or kernel NFS experts in debugging this further and fixing the issue would be greatly appreciated.

Version-Release number of selected component (if applicable):
kernel-2.6.32-131.12.1.el6.x86_64
nfs-utils-1.2.3-7.el6.x86_64

How reproducible:
As described above. So far we have not been able to reproduce it on a simple test server. On our production server, with tens of exports and many clients, it always shows up.

Steps to Reproduce:
1. Configure a fairly large NFSv3 server.
2. Mount a home directory via NFSv3.
3. Change the current directory to some subdirectory.
4. Run 'ls'.

Actual results:
As described above.

Expected results:
There should be no errors.
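The quoted nfsd_set_fh_dentry() code relies on the kernel's error-pointer convention. A function like exportfs_decode_fh() may return a valid pointer, NULL, or a negative errno encoded as a pointer in the last 4095 addresses. IS_ERR() detects the encoded case and PTR_ERR() decodes it.

Below is a userspace re-creation of those helpers, modelled on include/linux/err.h. It is a sketch, not the kernel code. It adds a simplified, hypothetical errno_to_nfs3stat() standing in for nfserrno(), which covers only a few RFC 1813 status codes. The hypothetical classify_dentry_result() mirrors lines 249-255 of the excerpt. Its fallback parameter stands for whatever value error held before that block, which the excerpt does not show.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-creation of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
    return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    /* Error pointers occupy the last MAX_ERRNO addresses. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* A subset of the NFSv3 status codes defined in RFC 1813. */
enum nfsstat3 {
    NFS3_OK       = 0,
    NFS3ERR_PERM  = 1,
    NFS3ERR_NOENT = 2,
    NFS3ERR_IO    = 5,
    NFS3ERR_ACCES = 13,
    NFS3ERR_STALE = 70
};

/* Hypothetical, simplified stand-in for nfsd's nfserrno():
 * maps a negative errno to an NFSv3 status; this sketch
 * falls back to NFS3ERR_IO for anything it does not list. */
static enum nfsstat3 errno_to_nfs3stat(long err)
{
    switch (err) {
    case 0:       return NFS3_OK;
    case -EPERM:  return NFS3ERR_PERM;
    case -ENOENT: return NFS3ERR_NOENT;
    case -EIO:    return NFS3ERR_IO;
    case -EACCES: return NFS3ERR_ACCES;
    case -ESTALE: return NFS3ERR_STALE;
    default:      return NFS3ERR_IO;
    }
}

/* Hypothetical helper mirroring the checks on lines 249-255:
 * NULL and -EINVAL keep the previous status (fallback),
 * any other error pointer is translated, a real pointer is OK. */
static enum nfsstat3 classify_dentry_result(const void *dentry,
                                            enum nfsstat3 fallback)
{
    if (dentry == NULL)
        return fallback;
    if (IS_ERR(dentry)) {
        if (PTR_ERR(dentry) != -EINVAL)
            return errno_to_nfs3stat(PTR_ERR(dentry));
        return fallback;
    }
    return NFS3_OK;
}
```

These checks run only after line 243. In the FILEID_ROOT branch, dget() would presumably dereference an error pointer stored in exp->ex_path.dentry before IS_ERR() ever sees it, which matches the concern about line 243 raised above.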
It seems that our testing with SELinux was not thorough enough (we are, after all, experimenting on a server that hosts a production service). After a few reboots, the server started to report SELinux messages like the following at random:

type=1400 audit(1319712116.696:712): avc: denied { 0x400000 } for pid=3164 comm="nfsd" name="" dev=dm-18 ino=16124690 scontext=system_u:system_r:kernel_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=file

This AVC message had not shown up in our logs before today. A quick Google search for that message leads me to believe that we hit the bug reported in BZ576207, which is not fixed in RHEL6. Disabling SELinux on the server now makes the problem go away. I am going to run our server for a few days with a kernel that incorporates the patch. If it runs stably, I will request a backport of the fix from BZ576207 to RHEL6.
Since the RHEL 6.2 External Beta has begun and this bug remains unresolved, it has been rejected because it is not proposed as an exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, for the next release of Red Hat Enterprise Linux.
Supposedly, this bug has been fixed in RHEL6, though I don't know the BZ# offhand. Are you still able to reproduce this on more recent kernels?
Ok, for the record this is probably the same as bug 656458, and should be fixed in 6.2. I'll go ahead and close this as a duplicate. Please reopen if it's not fixed in 6.2. *** This bug has been marked as a duplicate of bug 656458 ***