Bug 201438

Summary:

vi (chown) doesn't work in an nfs4 mount

Product:

Red Hat Enterprise Linux 4

Reporter:

Kostas Georgiou <k.georgiou>

Component:

nfs-utils

Assignee:

Steve Dickson <steved>

Status:

CLOSED CURRENTRELEASE

QA Contact:

Ben Levenson <benl>

Severity:

medium

Priority:

medium

Version:

4.4

Hardware:

All

OS:

Linux

Fixed In Version:

nfs-utils-1.0.6-76

Doc Type:

Bug Fix

Last Closed:

2007-02-12 22:52:00 UTC

Attachments:

Description	Flags
nfs4 network trace with host servername and port 2049	none

Description Kostas Georgiou 2006-08-05 11:27:29 UTC

Running vi in an nfs4 home directory gets stuck in a call to chown.
strace shows that the chown call never returns:
  chown("5159", 1111, 222) = ? ERESTARTSYS (To be restarted)
The ERESTARTSYS is the result of killing the process.

The server is an x86_64 RHEL4 machine, and both i386 and x86_64 RHEL4 clients
have the problem with nfs4 mounts, with sec=sys as well as sec=krb5. A mount
from an FC5 client to the same server works fine.

The RHEL4 machines are fully patched (U3+fasttrack) and the latest available
kernel (2.6.9-42.ELsmp) doesn't help either.

Let me know if you want tcpdump logs etc.
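
The hang described above can be checked without an interactive vi session. The following is a minimal sketch, not something posted by the reporter: it runs chown in a child process and reports whether the call finished within a time limit. The helper name `chown_with_timeout` and the 10-second default are illustrative assumptions.

```python
import subprocess
import sys


def chown_with_timeout(path, uid, gid, timeout=10.0):
    """Run chown(path, uid, gid) in a child process.

    Returns "ok" if the call succeeded, "error" if it failed quickly,
    and "hung" if it did not finish within `timeout` seconds.
    """
    # The child does nothing but the chown call, so a timeout points at chown.
    code = "import os, sys; os.chown(sys.argv[1], int(sys.argv[2]), int(sys.argv[3]))"
    try:
        result = subprocess.run(
            [sys.executable, "-c", code, path, str(uid), str(gid)],
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return "hung"
    return "ok" if result.returncode == 0 else "error"
```

Calling `chown_with_timeout("5159", 1111, 222)` from inside the nfs4 mount would mirror the call seen in the strace output; a result of "hung" would match the reported behavior.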

Comment 1 Kostas Georgiou 2006-08-10 09:47:06 UTC

ping

Comment 2 Kostas Georgiou 2006-08-14 15:26:57 UTC

Still there in update4

Comment 3 Steve Dickson 2006-08-16 11:28:55 UTC

I'm not seeing this behavior at all... so could you please post
a bzip2'd binary tethereal trace of the problem? Something
similar to 'tethereal -w /tmp/data.pcap host <server> ; bzip2 /tmp/data.pcap'
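
For anyone scripting the capture, here is a rough sketch of the same two steps. It only builds the tethereal argument list (the `host ... and port 2049` filter follows the description of the attached trace) and compresses the result with Python's `bz2` module. The function names are illustrative, and unlike the `bzip2` command this version leaves the uncompressed file in place.

```python
import bz2
import shutil


def capture_command(server, outfile="/tmp/data.pcap", port=2049):
    """Build the tethereal argv; the trailing words form the capture filter."""
    return ["tethereal", "-w", outfile, "host", server, "and", "port", str(port)]


def bzip2_file(path):
    """Write a bzip2-compressed copy to path + '.bz2' and return its name."""
    compressed = path + ".bz2"
    with open(path, "rb") as src, bz2.open(compressed, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return compressed
```

The list returned by `capture_command("servername")` can be passed to `subprocess.run` on a machine where tethereal is installed; capturing normally requires root.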

Comment 4 Kostas Georgiou 2006-08-16 13:04:48 UTC

Hmm, this is going to take a while; it seems that the machine is hit by something
similar to: http://linux-nfs.org/pipermail/nfsv4/2006-April/004132.html and
http://linux-nfs.org/pipermail/nfsv4/2006-April/004103.html
Since the new nfs-utils-lib includes librpcsecgss-0.10 I suspect that this is
the cause. 

I'll roll back to the older version of nfs_utils* to get the network dump. 

Do you want me to open a new bug for the "rpc.gssd: WARNING: can't create
rpc_clnt for server ...." error message?

Comment 5 Steve Dickson 2006-08-16 14:16:46 UTC

Yes.. please... thanks!!

Comment 6 Kostas Georgiou 2006-08-16 14:42:09 UTC

Created attachment 134314 [details]
nfs4 network trace with host servername and port 2049

Comment 7 Kostas Georgiou 2006-08-16 14:45:54 UTC

I had to roll back the server (x86_64 also) as well since it was failing with:

Aug 16 15:33:23 icva rpc.svcgssd[3443]: WARNING: get_uid failed
Aug 16 15:33:23 icva rpc.svcgssd[3443]: WARNING: handle_nullreq: get_uid failed
Aug 16 15:33:48 icva rpc.svcgssd[3443]: WARNING: get_uid failed
Aug 16 15:33:48 icva rpc.svcgssd[3443]: WARNING: handle_nullreq: get_uid failed
Aug 16 15:34:13 icva rpc.svcgssd[3443]: WARNING: get_uid failed
Aug 16 15:34:13 icva rpc.svcgssd[3443]: WARNING: handle_nullreq: get_uid failed

Which makes me wonder how I managed to claim that the chown bug is still
there in update4. Probably I only tested with the new kernel...

Comment 8 Kostas Georgiou 2006-08-19 18:16:12 UTC

#203239 for the regression.

Comment 9 Kostas Georgiou 2006-10-20 12:43:29 UTC

Any thoughts on what is causing the problem? Can you replicate the problem at all?

Comment 10 Steve Dickson 2006-10-20 16:23:06 UTC

This probably does not happen when you don't use sec=krb5, correct?

Comment 11 Kostas Georgiou 2006-10-20 16:41:05 UTC

Both sec=sys and sec=krb5 show the problem, the server is x86_64 but I tried
both i386 and x86_64 clients (with and without krb5). 

When does the server reply with NFS4ERR_DELAY? From what I can see the client
keeps getting the delay "error" forever :(

Comment 12 Steve Dickson 2006-10-30 16:30:42 UTC

Well when the server returns NFS4ERR_DELAY, it generally means
the server has done an upcall to some user daemon and is waiting
for the response (i.e. a downcall)..... Now a NFS4ERR_DELAY on
the client means go away for a second or two and then retry...
So seeing the client continuously trying is normal....
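
The retry behavior described above can be pictured with a small simulation. This is only a sketch of the idea, not the kernel client's code: `op` stands in for any NFSv4 operation and returns a status code, and the function name, the one-second delay and the optional attempt cap are assumptions made for illustration. The value 10008 is the NFS4ERR_DELAY status defined by the NFSv4 specification.

```python
import time

NFS4_OK = 0
NFS4ERR_DELAY = 10008  # status code defined by the NFSv4 specification


def call_with_delay_retry(op, delay=1.0, max_attempts=None, sleep=time.sleep):
    """Call op() until it returns something other than NFS4ERR_DELAY.

    Returns (status, attempts). With max_attempts=None the loop never
    gives up, so a server that always answers DELAY keeps the caller
    waiting indefinitely.
    """
    attempts = 0
    while True:
        attempts += 1
        status = op()
        if status != NFS4ERR_DELAY:
            return status, attempts
        if max_attempts is not None and attempts >= max_attempts:
            return status, attempts
        sleep(delay)  # back off before retrying
```

If the server-side upcall never completes, every reply is NFS4ERR_DELAY and an uncapped loop like this never returns, which is consistent with a chown call that appears to hang.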

Now looking at Comment #7, it appears the user level
daemon the server is waiting for is rpc.svcgssd, who seems
to be having a problem getting the uid from Kerberos...

What version of nfs-utils and nfs-utils-lib are you using?

Comment 13 RHEL Program Management 2006-10-30 16:45:57 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 14 Kostas Georgiou 2006-10-30 18:19:53 UTC

The testing was done with the version previous to 4.4, since the latest version
is broken under x86-64.
The error message in comment #7 is from nfs-utils-1.0.6-70.EL4 (bugzilla
#203239) which is a different issue.

Comment 15 Steve Dickson 2006-10-31 01:07:26 UTC

Just curious... what are the contents of /etc/gssapi_mech.conf?
It should have an entry like:

    libgssapi_krb5.so.2     mechglue_internal_krb5_init
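
A quick way to check for such an entry is sketched below. It assumes the file holds one mechanism per line as whitespace-separated "library init-function" fields and that `#` starts a comment; both are assumptions about the file format, and the helper names are made up for this example.

```python
def parse_mech_conf(text):
    """Return (library, init_function) pairs from gssapi_mech.conf-style text."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) >= 2:
            entries.append((fields[0], fields[1]))
    return entries


def has_mech(entries, library_prefix, init_function):
    """True if an entry's library basename starts with library_prefix
    and its init function matches exactly."""
    return any(
        lib.rsplit("/", 1)[-1].startswith(library_prefix) and init == init_function
        for lib, init in entries
    )
```

For example, `has_mech(parse_mech_conf(open("/etc/gssapi_mech.conf").read()), "libgssapi_krb5.so", "mechglue_internal_krb5_init")` accepts the library with or without a directory prefix or version suffix.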

Comment 16 Kostas Georgiou 2006-10-31 01:33:54 UTC

It's 
libgssapi_krb5.so     mechglue_internal_krb5_init
I think I did remove the /usr/lib/ by hand since the install was before the fix
(was it in 4.4? I can't remember really)

Comment 18 Steve Dickson 2007-01-09 15:32:37 UTC

The errors in Comment #7 should be fixed in nfs-utils-1.0.6-76

Comment 19 Kostas Georgiou 2007-02-12 23:14:07 UTC

Is nfs-utils-1.0.6-76 available anywhere? I imagine it's part of 4.5, right?

Comment 20 Steve Dickson 2007-02-12 23:31:09 UTC

yes... Please feel free to reopen this bug if you still have the same
problem..