Bug 201438 - vi (chown) doesn't work in an nfs4 mount
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: nfs-utils
Version: 4.4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Steve Dickson
QA Contact: Ben Levenson
Reported: 2006-08-05 07:27 EDT by Kostas Georgiou
Modified: 2007-11-16 20:14 EST

Fixed In Version: nfs-utils-1.0.6-76
Doc Type: Bug Fix
Last Closed: 2007-02-12 17:52:00 EST

Attachments
nfs4 network trace with host servername and port 2049 (35.67 KB, application/octet-stream)
2006-08-16 10:42 EDT, Kostas Georgiou

Description Kostas Georgiou 2006-08-05 07:27:29 EDT
Running vi in an nfs4 home directory gets stuck in a call to chown;
strace shows that the chown call never returns.
  chown("5159", 1111, 222) = ? ERESTARTSYS (To be restarted)
The ERESTARTSYS is because of a kill.
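A minimal way to capture such a trace (a sketch only; the NFS4-mounted path is a placeholder, and pgrep picks the newest matching vi):

  # trace only the chown-family syscalls while reproducing the hang
  strace -f -e trace=chown,fchown,lchown vi /mnt/nfs4home/testfile

  # or attach to an already-hung vi and watch the stalled call
  strace -p $(pgrep -n vi)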

The server is an x86_64 RHEL4 machine and both i386/x86_64 RHEL4 clients
have the problem with nfs4 mounts with sec=sys and sec=krb5. A mount from
a FC5 client to the same server works fine.
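For reference, the mounts in question were of this general form (a sketch; the server name and export path are placeholders):

  mount -t nfs4 -o sec=sys  server:/export/home /home
  mount -t nfs4 -o sec=krb5 server:/export/home /home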

The RHEL4 machines are fully patched (U3+fasttrack) and the latest available
kernel (2.6.9-42.ELsmp) doesn't help either.

Let me know if you want tcpdump logs etc.
Comment 1 Kostas Georgiou 2006-08-10 05:47:06 EDT
ping
Comment 2 Kostas Georgiou 2006-08-14 11:26:57 EDT
Still there in update4
Comment 3 Steve Dickson 2006-08-16 07:28:55 EDT
I'm not seeing this behavior at all... so could you please post a
bzip2'd binary tethereal trace of the problem? Something
similar to 'tethereal -w /tmp/data.pcap host <server> ; bzip2 /tmp/data.pcap'
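A quick way to sanity-check the capture before attaching it (assuming the same file name; decompress it first if it has already been bzip2'd) is to read it back with a display filter:

  tethereal -r /tmp/data.pcap -R nfs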

Comment 4 Kostas Georgiou 2006-08-16 09:04:48 EDT
Hmm, this is going to take a while; it seems that the machine is hit by something
similar to: http://linux-nfs.org/pipermail/nfsv4/2006-April/004132.html and
http://linux-nfs.org/pipermail/nfsv4/2006-April/004103.html
Since the new nfs-utils-lib includes librpcsecgss-0.10, I suspect that this is
the cause.

I'll roll back to the older version of nfs-utils* to get the network dump.

Do you want me to open a new bug for the "rpc.gssd: WARNING: can't create
rpc_clnt for server ...." error message?
Comment 5 Steve Dickson 2006-08-16 10:16:46 EDT
Yes.. please... thanks!!

Comment 6 Kostas Georgiou 2006-08-16 10:42:09 EDT
Created attachment 134314 [details]
nfs4 network trace with host servername and port 2049
Comment 7 Kostas Georgiou 2006-08-16 10:45:54 EDT
I had to roll back the server (also x86_64), since it was failing with:

Aug 16 15:33:23 icva rpc.svcgssd[3443]: WARNING: get_uid failed
Aug 16 15:33:23 icva rpc.svcgssd[3443]: WARNING: handle_nullreq: get_uid failed
Aug 16 15:33:48 icva rpc.svcgssd[3443]: WARNING: get_uid failed
Aug 16 15:33:48 icva rpc.svcgssd[3443]: WARNING: handle_nullreq: get_uid failed
Aug 16 15:34:13 icva rpc.svcgssd[3443]: WARNING: get_uid failed
Aug 16 15:34:13 icva rpc.svcgssd[3443]: WARNING: handle_nullreq: get_uid failed

Which makes me wonder how I managed to claim that the chown bug is still
there in update4. Probably I only tested with the new kernel...
Comment 8 Kostas Georgiou 2006-08-19 14:16:12 EDT
Opened bug #203239 for the regression.
Comment 9 Kostas Georgiou 2006-10-20 08:43:29 EDT
Any thoughts on what is causing the problem? Can you replicate the problem at all?
Comment 10 Steve Dickson 2006-10-20 12:23:06 EDT
This probably does not happen when you don't use sec=krb5, correct?
Comment 11 Kostas Georgiou 2006-10-20 12:41:05 EDT
Both sec=sys and sec=krb5 show the problem; the server is x86_64 but I tried
both i386 and x86_64 clients (with and without krb5). 

When does the server reply with NFS4ERR_DELAY? From what I can see the client
keeps getting the delay "error" forever :(
Comment 12 Steve Dickson 2006-10-30 11:30:42 EST
Well when the server returns NFS4ERR_DELAY, it generally means
the server has done an upcall to some user daemon and is waiting
for the response (i.e. a downcall)..... Now a NFS4ERR_DELAY on
the client means go away for a second or two and then retry...
So seeing the client continuously trying is normal....

Now looking at Comment #7, it appears the user-level
daemon the server is waiting for is rpc.svcgssd, which seems
to be having a problem getting the uid from Kerberos...
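One way to watch that upcall path on the server (a debugging sketch; the init-script name is from memory and may vary) is to stop the normal instance and run the daemon in the foreground with extra verbosity:

  service rpcsvcgssd stop
  rpc.svcgssd -f -vvv    # -f = stay in foreground, -v = more verbosity; watch for the get_uid warnings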

What version of nfs-utils and nfs-utils-lib are you using?
Comment 13 RHEL Product and Program Management 2006-10-30 11:45:57 EST
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 14 Kostas Georgiou 2006-10-30 13:19:53 EST
The testing was done with the version previous to 4.4, since the latest version
is broken under x86-64.
The error message in comment #7 is from nfs-utils-1.0.6-70.EL4 (bugzilla
#203239), which is a different issue.
Comment 15 Steve Dickson 2006-10-30 20:07:26 EST
Just curious... what are the contents of /etc/gssapi_mech.conf?
It should have an entry like:

    libgssapi_krb5.so.2     mechglue_internal_krb5_init
Comment 16 Kostas Georgiou 2006-10-30 20:33:54 EST
It's
    libgssapi_krb5.so     mechglue_internal_krb5_init
I think I removed the /usr/lib/ prefix by hand, since the install was before the fix
(was it in 4.4? I can't really remember)
Comment 18 Steve Dickson 2007-01-09 10:32:37 EST
The errors in Comment #7 should be fixed in nfs-utils-1.0.6-76
Comment 19 Kostas Georgiou 2007-02-12 18:14:07 EST
Is nfs-utils-1.0.6-76 available anywhere? I imagine it's part of 4.5, right?
Comment 20 Steve Dickson 2007-02-12 18:31:09 EST
Yes... please feel free to reopen this bug if you still have the same
problem.
