Bug 29374

Summary: file locking over NFS not working
Product: [Retired] Red Hat Linux
Reporter: Thilo Mezger <thilo.mezger>
Component: kernel
Assignee: Pete Zaitcev <zaitcev>
Status: CLOSED RAWHIDE
QA Contact: David Lawrence <dkl>
Severity: high
Priority: high    
Version: 7.1   
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2001-03-08 21:57:46 UTC
Attachments (description, flags):
- excerpt from log file (flags: none)
- detailed scenario again (flags: none)
- strace output of running pine (flags: none)

Description Thilo Mezger 2001-02-25 12:07:49 UTC
My test environment looks like this:

The NFS server ("argos") is running Red Hat Linux 7.0 with all updates 
applied and it did a very good job serving RHL7.0 NFS clients.

My Wolverine test machine ("silicad") is mounting /home via NFS and 
automount from the server.  As soon as a process tries to lock a file 
(e.g. pine locking the mailbox or konqueror locking its bookmarks file) 
the system hangs and it looks as if the server were down, but it isn't.
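
A minimal sketch of a lock probe that exercises the same kind of request outside of pine or konqueror. It assumes the applications take POSIX fcntl-style locks; the function name try_lock, the timeout value, and the status strings are illustrative and not part of the original report. The lock is requested in a helper thread so the caller can report a request that never returns instead of hanging with it.

```python
import fcntl
import os
import threading


def try_lock(path, timeout=10.0):
    """Take and release an exclusive POSIX lock on `path`.

    Returns "granted", "blocked" (no answer within `timeout` seconds),
    or "error: ..." if the open or lock call failed outright.
    """
    result = {}

    def worker():
        try:
            fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
            try:
                fcntl.lockf(fd, fcntl.LOCK_EX)  # blocking lock request
                fcntl.lockf(fd, fcntl.LOCK_UN)  # release it again
            finally:
                os.close(fd)
            result["status"] = "granted"
        except OSError as exc:
            result["status"] = "error: %s" % exc

    # Daemon thread: a stuck request does not keep the interpreter
    # waiting on this thread at shutdown.
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    if t.is_alive():
        return "blocked"
    return result.get("status", "error: no result")
```

Calling try_lock on a scratch file inside the NFS-mounted directory (for example a hypothetical /home/joe/locktest) and comparing the result with the same call on a local file would show whether the lock request itself is what hangs. On a hard NFS mount the stuck thread may sit in an uninterruptible wait, so the process may still be difficult to terminate afterwards.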

Everything works fine when there are only 7.0 machines on the network.

The clients log via syslog to a loghost.  Logs are attached.

Comment 1 Thilo Mezger 2001-02-25 12:08:25 UTC
Created attachment 11020 [details]
excerpt from log file

Comment 2 Bob Matthews 2001-02-26 16:27:51 UTC
Can you strace the process which is doing the locking and attach the output? 
I'd like to know where it is getting stuck or spinning.

Comment 3 Glen Foster 2001-02-26 23:57:03 UTC
We (Red Hat) should really try to resolve this before next release.

Comment 4 Thilo Mezger 2001-02-27 18:47:40 UTC
Created attachment 11220 [details]
detailed scenario again

Comment 5 Thilo Mezger 2001-02-27 18:49:02 UTC
Created attachment 11221 [details]
strace output of running pine

Comment 6 Thilo Mezger 2001-02-27 18:50:22 UTC
Running "/usr/sbin/nhfsstone /home/joe" on a directory which is mounted via NFS 
turned out to be a good test case:

- it works with Guinness
- it fails with Wolverine

Comment 7 Jeff Johnson 2001-03-07 20:26:17 UTC
I failed to reproduce the problem using nhfsstone with
	client	kernel-2.4.0-0.43.12smp	nfs-utils-0.2.1-10
	server	kernel-2.2.16-7		nfs-utils-0.2-2
all using NFSv2 over UDP.

I also tried nfs-utils-0.3.1 on both sides, no problem, and tried
kernel-2.4.2-0.1.20smp on the client as well, still no problem.

I also tried using pine/mutt with /var/spool/mail mounted on the client, while
watching the protocol using tcpdump: no problem.

So, can you supply the exact packages for kernel/nfs-utils/pine installed on
both client and server?

Could you also try your nhfsstone failure on a manually mounted (no autofs)
directory?

Thanks.
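
For the manual-mount comparison requested above, it can help to confirm which mount actually backs the test directory. The following is a rough sketch, assuming the usual /proc/mounts line layout (device, mount point, filesystem type, options, ...); the function name mount_entry_for is illustrative and does not come from this report.

```python
import os


def mount_entry_for(path, mounts_text):
    """Return (mount_point, fs_type) for the mount that contains `path`.

    `mounts_text` is text in /proc/mounts format. The longest matching
    mount point wins; on a tie the later line wins, since a later mount
    on the same directory covers the earlier one.
    """
    path = os.path.abspath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix):
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best
```

Reading /proc/mounts on the client and passing its contents together with the test directory should return a type of "nfs" for a directory that is currently mounted over NFS, while a result of "autofs" would suggest that only the automounter's placeholder is present and the NFS mount has not been triggered. Mount points containing escaped characters (such as spaces) are not handled by this sketch.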




Comment 8 Thilo Mezger 2001-03-08 21:39:12 UTC
Server ("Guinness" with all official patches applied and nothing else):
kernel-2.2.17-14
nfs-utils-0.1.9.1-7
 
Client ("Wolverine" as is):
kernel-2.4.1-0.1.9
nfs-utils-0.2.1-10

OK, I have now tried both auto-mounted and manually mounted filesystems.  Both 
worked and I wasn't able to reproduce my bug anymore.

Unfortunately, in the meantime I changed my network card from an NE2000-compatible 
10 Mbit/s card to a 3Com 100 Mbit/s card, and now everything works for me.

This is really strange, as I was able to reproduce this bug about 3 times 
(i.e. across 3 re-installations).  And the old network card with RHL7.0 on 
both client and server had always worked perfectly, which made me think that it 
was Wolverine...

Still, I don't really understand what has happened here - I suppose it might 
have something to do with the ne2000 driver...?!  God knows...

Comment 9 Bob Matthews 2001-03-08 21:56:18 UTC
We have had one other report of a problem with the NE2000, although it was
generating a kernel hang.

Comment 10 Arjan van de Ven 2001-03-19 15:46:25 UTC
I will close this as fixed, since there is no way to reproduce it anymore
and it works now; I have a similar setup (also without an ne2000) that works just
fine.