Description of problem:
The client (an HP-UX system) reports "RPC: Authentication error", as shown below:

# bdf -t nfs
NFS getattr failed for server XXX: RPC: Authentication error
NFS fsstat failed for server XXX: RPC: Authentication error
bdf: /nfs/linux: I/O error

When a Linux client is used, the error returned is "Permission Denied".
A "df" on the mounted partition returns:

XXX:/nfs - - - - /nfs/linux

A "dmesg | tail -1" command returns:

nfs_statfs: statfs error = 13

Version-Release number of selected component (if applicable):
nfs-utils-1.0.6-70.EL4-x86_64
nfs-toolkit-A.01.04-0-i386
serviceguard-A.11.16.07-0-x86_64

How reproducible:
Create a two-node SGLX cluster, then install and configure the NFS toolkit on both nodes.

Steps to Reproduce:
1. chkconfig nfs off
2. Clear /var/lib/nfs/rmtab, xtab, and etab to have a clean start (optional).
3. Reboot the nodes.
4. Start the cluster (cmruncl).
5. Run the NFS package.
6. Mount the exported directory on the client.
7. Stop the cluster and the package, and reboot the node.
8. After bootup, start the cluster and the package.
9. At this point, accessing the mount fails with the above error.

However, unmounting and remounting the filesystem on the client restores access to the directory.

Actual results:
NFS getattr failed for server XXX: RPC: Authentication error
NFS fsstat failed for server XXX: RPC: Authentication error
bdf: /nfs/linux: I/O error

Expected results:
Disk usage statistics for the mounted filesystem.

Additional info:
The problem is not seen (until the next reboot) if the "service nfs restart" command is executed.
Error 13 is:

/usr/include/asm-generic/errno-base.h:#define EACCES 13 /* Permission denied */

...so it sounds like something is probably strange with mountd or exports here.

I'm not familiar with SGLX (is that Serviceguard?) clustering. Is this reproducible without it, simply by rebooting the box?

Here's what I'd like to see first:

1) A packet trace, preferably showing a working statfs call and then the failed statfs after the machine is rebooted. That is: start a packet capture, run the "bdf" command, reboot the cluster node, and when it comes back up, run the bdf command again and get the error. This should show whether the client is sending something odd in the subsequent RPC calls after the reboot. Doubtful, but it would be good to know for sure.

2) The output from 'exportfs -v' and 'showmount -e' on the server, both before and after the reboot. Since access is generally controlled by mountd, we want to know what its idea of the export table is before and after the reboot.

The *best* thing would be a way to reproduce this that doesn't involve clustering software at all.
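As a quick cross-check of the errno lookup above, here is a minimal sketch in Python that maps a numeric error code to its symbolic name and message. The helper name describe_errno is hypothetical and only for illustration; the message text comes from the C library of whatever system runs the script, so it may differ between platforms.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME (message)' for a numeric errno value.

    describe_errno is a hypothetical helper, not part of any NFS tool.
    """
    # errno.errorcode maps numeric values to symbolic names such as 'EACCES'.
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the platform's message text for the code.
    return f"{name} ({os.strerror(code)})"


if __name__ == "__main__":
    # 13 is the value reported by "nfs_statfs: statfs error = 13".
    print(describe_errno(13))
```

Running this on the Linux client is one way to confirm that the kernel message and the "Permission Denied" seen by the user refer to the same error.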
*** Bug 302611 has been marked as a duplicate of this bug. ***
We pinpointed the error: it was caused by Serviceguard starting the mountd and nfsd daemons in the wrong order. The error no longer occurs.

(In reply to comment #1)
> Error 13 is:
>
> /usr/include/asm-generic/errno-base.h:#define EACCES 13 /*
> Permission denied */
>
> ...so it sounds like something is probably strange with mountd or exports here.
>
> I'm not familiar with SGLX (is that serviceguard?) clustering. Is this
> reproducible without it? If you reboot the box.
>
> Here's what I'd like to see first:
>
> 1) a packet trace, preferably showing a working statfs call, and then the failed
> statfs after the machine is rebooted. i.e. start a packet capture, do the "bdf"
> command, reboot the cluster node, and when it comes back up, do the bdf command
> again and get the error. This should show whether the client is sending
> something odd in the subsequent RPC calls after the reboot. Doubtful, but it
> would be good to know for sure.
>
> 2) the output from 'exportfs -v' and 'showmount -e' on the server both before
> and after the reboot. Since access is generally controlled by mountd, we want to
> know what its idea of the export table is before and after the reboot.
>
> The *best* thing would be a way to reproduce this that doesn't involve
> clustering software at all.
>