Bug 201438

Summary: vi (chown) doesn't work in an nfs4 mount
Product: Red Hat Enterprise Linux 4
Reporter: Kostas Georgiou <k.georgiou>
Component: nfs-utils
Assignee: Steve Dickson <steved>
Status: CLOSED CURRENTRELEASE
QA Contact: Ben Levenson <benl>
Severity: medium
Priority: medium
Version: 4.4
Hardware: All
OS: Linux
Fixed In Version: nfs-utils-1.0.6-76
Doc Type: Bug Fix
Last Closed: 2007-02-12 22:52:00 UTC
Attachments:
  nfs4 network trace with host servername and port 2049 (flags: none)

Description Kostas Georgiou 2006-08-05 11:27:29 UTC
Running vi in an nfs4 home directory gets stuck in a call to chown;
strace shows that the chown call never returns:
  chown("5159", 1111, 222) = ? ERESTARTSYS (To be restarted)
The ERESTARTSYS only appears because the process was killed.

The server is an x86_64 RHEL4 machine, and both i386 and x86_64 RHEL4 clients
show the problem on nfs4 mounts with sec=sys and sec=krb5. A mount from
an FC5 client to the same server works fine.

The RHEL4 machines are fully patched (U3+fasttrack) and the latest available
kernel (2.6.9-42.ELsmp) doesn't help either.

Let me know if you want tcpdump logs etc.
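To check whether a given client/mount combination is affected without going through vi, a small probe along these lines could be used. It is a minimal sketch, not something from this report: the helper name `chown_probe` and the default 10-second timeout are assumptions. It creates a scratch file in the target directory and chowns it to the caller's own uid/gid, roughly what the strace line above shows vi doing, and uses SIGALRM to give up if the call does not return. That only helps if the NFS wait is interruptible by signals; on a mount where it is not, the probe may hang just like vi.

```python
import os
import signal
import tempfile


class ChownTimeout(Exception):
    """Raised from the SIGALRM handler when chown() takes too long."""


def _on_alarm(signum, frame):
    raise ChownTimeout()


def chown_probe(directory, timeout=10):
    """Create a scratch file in `directory` and chown it to our own uid/gid.

    Returns True if chown() returns within `timeout` seconds, False if the
    alarm fires first. Must be called from the main thread (signal handlers
    can only be installed there).
    """
    fd, path = tempfile.mkstemp(prefix="chown-probe-", dir=directory)
    os.close(fd)
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(timeout)
    try:
        # Same owner/group we already have, so no privileges are needed.
        os.chown(path, os.geteuid(), os.getegid())
        return True
    except ChownTimeout:
        return False
    finally:
        signal.alarm(0)                       # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)
        os.unlink(path)                       # remove the scratch file
```

For example, `chown_probe(os.path.expanduser("~"))` run on an affected client with an nfs4-mounted home directory would be expected to return False, and True on a working client.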

Comment 1 Kostas Georgiou 2006-08-10 09:47:06 UTC
ping

Comment 2 Kostas Georgiou 2006-08-14 15:26:57 UTC
Still there in update4

Comment 3 Steve Dickson 2006-08-16 11:28:55 UTC
I'm not seeing this behavior at all... so could you please post a
bzip2-compressed binary tethereal trace of the problem? Something
similar to 'tethereal -w /tmp/data.pcap host <server> ; bzip2 /tmp/data.pcap'



Comment 4 Kostas Georgiou 2006-08-16 13:04:48 UTC
Hmm, this is going to take a while; it seems that the machine is hit by something
similar to: http://linux-nfs.org/pipermail/nfsv4/2006-April/004132.html and
http://linux-nfs.org/pipermail/nfsv4/2006-April/004103.html
Since the new nfs-utils-lib includes librpcsecgss-0.10, I suspect that this is
the cause.

I'll roll back to the older version of nfs_utils* to get the network dump. 

Do you want me to open a new bug for the "rpc.gssd: WARNING: can't create
rpc_clnt for server ...." error message?

Comment 5 Steve Dickson 2006-08-16 14:16:46 UTC
Yes.. please... thanks!!



Comment 6 Kostas Georgiou 2006-08-16 14:42:09 UTC
Created attachment 134314
nfs4 network trace with host servername and port 2049

Comment 7 Kostas Georgiou 2006-08-16 14:45:54 UTC
I had to roll back the server (also x86_64) as well, since it was failing with:

Aug 16 15:33:23 icva rpc.svcgssd[3443]: WARNING: get_uid failed
Aug 16 15:33:23 icva rpc.svcgssd[3443]: WARNING: handle_nullreq: get_uid failed
Aug 16 15:33:48 icva rpc.svcgssd[3443]: WARNING: get_uid failed
Aug 16 15:33:48 icva rpc.svcgssd[3443]: WARNING: handle_nullreq: get_uid failed
Aug 16 15:34:13 icva rpc.svcgssd[3443]: WARNING: get_uid failed
Aug 16 15:34:13 icva rpc.svcgssd[3443]: WARNING: handle_nullreq: get_uid failed

Which makes me wonder how I managed to claim that the chown bug is still
there in update4. Probably I only tested with the new kernel...

Comment 8 Kostas Georgiou 2006-08-19 18:16:12 UTC
Opened bug #203239 for the regression.

Comment 9 Kostas Georgiou 2006-10-20 12:43:29 UTC
Any thoughts on what is causing the problem? Can you replicate the problem at all?

Comment 10 Steve Dickson 2006-10-20 16:23:06 UTC
This probably does not happen when you don't use sec=krb5, correct?

Comment 11 Kostas Georgiou 2006-10-20 16:41:05 UTC
Both sec=sys and sec=krb5 show the problem. The server is x86_64, but I tried
both i386 and x86_64 clients (with and without krb5).

When does the server reply with NFS4ERR_DELAY? From what I can see, the client
keeps getting the delay "error" forever :(

Comment 12 Steve Dickson 2006-10-30 16:30:42 UTC
Well, when the server returns NFS4ERR_DELAY, it generally means
the server has done an upcall to some user-level daemon and is waiting
for the response (i.e. a downcall)... Now, NFS4ERR_DELAY tells
the client to go away for a second or two and then retry...
So seeing the client continuously retrying is normal...
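The retry behavior described in Comment 12 can be sketched as a loop that keeps resending while the server answers NFS4ERR_DELAY. This is an illustration under stated assumptions, not the Linux NFS client's code: `send_request` is a hypothetical callable doing one round trip, and the 1-second starting delay, doubling, and 15-second cap are illustrative choices. The status values are the ones defined in the NFSv4 specification. With no attempt limit, a server whose upcall never gets its downcall keeps the client looping, which matches the "delay forever" symptom in Comment 11.

```python
import time

# Status codes from the NFSv4 specification (RFC 3530, later RFC 7530).
NFS4_OK = 0
NFS4ERR_DELAY = 10008


def call_with_delay_retry(send_request, max_attempts=None, initial_delay=1.0,
                          max_delay=15.0, sleep=time.sleep):
    """Resend a request for as long as the server answers NFS4ERR_DELAY.

    `send_request` (hypothetical) performs one round trip and returns an
    NFSv4 status code. With max_attempts=None the loop never gives up.
    Returns (final_status, attempts_made).
    """
    delay = initial_delay
    attempts = 0
    while True:
        attempts += 1
        status = send_request()
        if status != NFS4ERR_DELAY:
            return status, attempts           # success or a real error
        if max_attempts is not None and attempts >= max_attempts:
            return status, attempts           # give up (sketch-only option)
        sleep(delay)                          # back off before resending
        delay = min(delay * 2, max_delay)     # exponential backoff, capped
```

Passing a fake `send_request` and a list's `append` as `sleep` exercises the loop without a network or real waiting.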

Now, looking at Comment #7, it appears the user-level
daemon the server is waiting for is rpc.svcgssd, which seems
to be having a problem getting the uid from Kerberos...

What versions of nfs-utils and nfs-utils-lib are you using?


Comment 13 RHEL Program Management 2006-10-30 16:45:57 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 14 Kostas Georgiou 2006-10-30 18:19:53 UTC
The testing was done with the version prior to 4.4, since the latest version
is broken under x86-64.
The error message in comment #7 is from nfs-utils-1.0.6-70.EL4 (bugzilla
#203239) which is a different issue.

Comment 15 Steve Dickson 2006-10-31 01:07:26 UTC
Just curious... what are the contents of /etc/gssapi_mech.conf?
It should have an entry like:

    libgssapi_krb5.so.2     mechglue_internal_krb5_init


Comment 16 Kostas Georgiou 2006-10-31 01:33:54 UTC
It's 
libgssapi_krb5.so     mechglue_internal_krb5_init
I think I removed the /usr/lib/ prefix by hand, since the install predated the fix
(was the fix in 4.4? I can't really remember).
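Comparing a gssapi_mech.conf against the entry suggested in Comment 15 can be done with a small checker. This is a sketch under assumptions: it assumes the file holds one whitespace-separated "library init_function" pair per line with '#' starting a comment, and the helper names `parse_mech_conf` and `has_krb5_entry` are hypothetical. The library is compared by basename, so an entry with a directory prefix still matches.

```python
import os

# Entry suggested in Comment 15.
EXPECTED_LIB = "libgssapi_krb5.so.2"
EXPECTED_INIT = "mechglue_internal_krb5_init"


def parse_mech_conf(text):
    """Return (library, init_function) pairs from gssapi_mech.conf-style text."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if len(fields) >= 2:
            entries.append((fields[0], fields[1]))
    return entries


def has_krb5_entry(entries, expected_lib=EXPECTED_LIB, expected_init=EXPECTED_INIT):
    """True if some entry names the expected library (by basename) and init function."""
    return any(os.path.basename(lib) == expected_lib and init == expected_init
               for lib, init in entries)
```

Run against the line quoted in Comment 16, the checker reports a mismatch, because `libgssapi_krb5.so` is not the `libgssapi_krb5.so.2` named in Comment 15.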

Comment 18 Steve Dickson 2007-01-09 15:32:37 UTC
The errors in Comment #7 should be fixed in nfs-utils-1.0.6-76

Comment 19 Kostas Georgiou 2007-02-12 23:14:07 UTC
Is nfs-utils-1.0.6-76 available anywhere? I imagine it's part of 4.5 right?

Comment 20 Steve Dickson 2007-02-12 23:31:09 UTC
Yes... Please feel free to reopen this bug if you still have the same
problem.