302671 – Client gets AUTH_ERROR (bad credentials) when switching the package to adoptive node under HA environment

Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	nfs-utils
Version:	4.4
Hardware:	ia64
OS:	Linux
Priority:	low
Severity:	medium
Assignee:	Jeff Layton

Reported:	2007-09-24 06:49 UTC by Rahul Prasad
Modified:	2009-09-23 12:32 UTC
CC List:	3 users
Doc Type:	Bug Fix
Last Closed:	2007-10-08 03:23:45 UTC

Description Rahul Prasad 2007-09-24 06:49:20 UTC

Description of problem:

The client (an HP-UX system) reports "RPC: Authentication error", as shown below:
# bdf -t nfs
NFS getattr failed for server XXX: RPC: Authentication error
NFS fsstat failed for server XXX: RPC: Authentication error
bdf: /nfs/linux: I/O error

When a Linux client is used, the error returned is "Permission Denied".
A "df" on the mounted partition returns:
XXX:/nfs            -         -         -   -  /nfs/linux

Running "dmesg | tail -1" returns:
nfs_statfs: statfs error = 13


Version-Release number of selected component (if applicable):

  nfs-utils-1.0.6-70.EL4-x86_64
  nfs-toolkit-A.01.04-0-i386
  serviceguard-A.11.16.07-0-x86_64


How reproducible:
Create a two-node SGLX cluster and install and configure the NFS toolkit on both
nodes.

Steps to Reproduce:
1. chkconfig nfs off
2. (Optional) Clear /var/lib/nfs/rmtab, xtab, and etab for a clean start.
3. Reboot the nodes.
4. Start the cluster (cmruncl).
5. Run the nfs package.
6. Mount the exported directory on the client.
7. Stop the cluster and the package, and reboot the node.
8. After the node boots, start the cluster and the package.
9. At this point, accessing the mount fails with the error above.

However, unmounting and remounting the filesystem lets the client access the
directory normally.
  
Actual results:
NFS getattr failed for server XXX: RPC: Authentication error
NFS fsstat failed for server XXX: RPC: Authentication error
bdf: /nfs/linux: I/O error

Expected results:
Disk information stats.

Additional info:
The problem does not occur (until the next reboot) if the "service nfs restart"
command is run.

Comment 1 Jeff Layton 2007-10-03 16:25:27 UTC

Error 13 is:

/usr/include/asm-generic/errno-base.h:#define   EACCES          13      /* Permission denied */

...so it sounds like something is probably strange with mountd or exports here. 
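The errno lookup above can be reproduced directly from the header. This is a minimal sketch, assuming the kernel headers are installed at the path quoted above; the function name `errno_lookup` and the `ERRNO_HEADER` override variable are hypothetical helpers, not part of nfs-utils.

```shell
#!/bin/sh
# Hypothetical helper: map a numeric errno (such as the 13 in
# "nfs_statfs: statfs error = 13") to its symbolic name and description
# by scanning the kernel's errno header. ERRNO_HEADER is an assumed
# override for systems where the header lives elsewhere.
errno_lookup() {
    awk -v n="$1" '
        $1 == "#define" && $3 == n {
            desc = $0
            sub(/^.*\/\*[ \t]*/, "", desc)   # strip everything up to the comment opener
            sub(/[ \t]*\*\/.*$/, "", desc)   # strip the comment closer
            print $2 ": " desc
            found = 1
            exit
        }
        END { if (!found) exit 1 }          # non-zero status if n is not defined
    ' "${ERRNO_HEADER:-/usr/include/asm-generic/errno-base.h}"
}
```

With the stock header, `errno_lookup 13` should print `EACCES: Permission denied`, matching the definition quoted in the comment.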

I'm not familiar with SGLX (is that serviceguard?) clustering. Is this
reproducible without it, e.g. if you just reboot the box?

Here's what I'd like to see first:

1) a packet trace, preferably showing a working statfs call, and then the failed
statfs after the machine is rebooted. I.e., start a packet capture, run the "bdf"
command, reboot the cluster node, and when it comes back up, run the bdf command
again and get the error. This should show whether the client is sending
something odd in the subsequent RPC calls after the reboot. I doubt it, but it
would be good to know for sure.

2) the output from 'exportfs -v' and 'showmount -e' on the server both before
and after the reboot. Since access is generally controlled by mountd, we want to
know what its idea of the export table is before and after the reboot.
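The two data-collection requests above can be scripted. This is a minimal sketch under stated assumptions: the capture runs as root on a Linux host with tcpdump that stays up across the reboot (the client, not the node being rebooted; the HP-UX client in this report would need its own capture tool), and the export snapshots run on the NFS server. The function names and the /tmp/nfs-debug output directory are hypothetical. The filter only covers the NFS port (2049), which carries the getattr/statfs calls but not mountd traffic, since mountd's port is usually assigned dynamically.

```shell
#!/bin/sh
# Hypothetical helpers for collecting the data requested above.
# All output goes to one directory (default: /tmp/nfs-debug).

# 1) Packet capture of NFS traffic on port 2049, written to a pcap file.
#    Runs in the background; stop it with stop_capture.
start_capture() {
    outdir=${1:-/tmp/nfs-debug}
    mkdir -p "$outdir" || return 1
    tcpdump -i any -s 0 -w "$outdir/nfs.pcap" port 2049 &
    echo $! > "$outdir/tcpdump.pid"
}

stop_capture() {
    outdir=${1:-/tmp/nfs-debug}
    kill "$(cat "$outdir/tcpdump.pid")"
}

# 2) Snapshot the server's export table as exportfs and mountd report it.
#    Call once with label "before" and once with "after" the reboot.
#    Errors (e.g. mountd not running) are saved in the output files rather
#    than aborting, since they are part of the evidence.
snapshot_exports() {
    label=$1
    outdir=${2:-/tmp/nfs-debug}
    mkdir -p "$outdir" || return 1
    exportfs -v > "$outdir/exportfs-$label.txt" 2>&1
    showmount -e localhost > "$outdir/showmount-$label.txt" 2>&1
    return 0
}
```

On the server, run `snapshot_exports before`, reboot, run `snapshot_exports after`, and compare the two with `diff /tmp/nfs-debug/exportfs-before.txt /tmp/nfs-debug/exportfs-after.txt` (and likewise for the showmount files).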

The *best* thing would be a way to reproduce this that doesn't involve
clustering software at all.

Comment 2 Jeff Layton 2007-10-03 20:00:09 UTC

*** Bug 302611 has been marked as a duplicate of this bug. ***

Comment 3 Rahul Prasad 2007-10-08 03:23:45 UTC

We pinpointed the error: Serviceguard was starting the mountd and nfsd daemons
in the wrong order. The error no longer shows up.

