My test environment looks like this: the NFS server ("argos") is running Red Hat Linux 7.0 with all updates applied, and it has done a very good job serving RHL 7.0 NFS clients. My Wolverine test machine ("silicad") mounts /home from the server via NFS and automount. As soon as a process tries to lock a file (e.g. pine locking the mailbox or Konqueror locking its bookmarks file), the system hangs and it looks as if the server has gone away, but it hasn't. Everything works fine when there are only 7.0 machines in the network. The clients send their syslog output to a loghost. Logs are attached.
Created attachment 11020: excerpt from log file
Can you strace the process which is doing the locking and attach the output? I'd like to know where it is getting stuck or spinning.
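To separate the locking step from everything else pine or Konqueror does, a small stand-alone probe can help. The following is only a sketch: it assumes the affected applications use POSIX fcntl-style locks (this report does not confirm that), and `probe_lock` is a hypothetical helper name, not part of any package mentioned here. If the call never returns on the NFS mount, the hang is in the lock request itself.

```python
import fcntl
import os
import time


def probe_lock(path):
    """Take and release an exclusive fcntl lock on path.

    Returns the number of seconds the lock/unlock round trip took.
    Assumption: the hanging applications use this style of locking.
    """
    # Create the file if it does not exist yet; it is left in place afterwards.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        start = time.monotonic()
        fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks until the lock is granted
        fcntl.lockf(fd, fcntl.LOCK_UN)  # release it again
        return time.monotonic() - start
    finally:
        os.close(fd)
```

Calling `probe_lock("/home/joe/locktest.tmp")` (the file name is made up) on the NFS-mounted directory and on a local directory gives a direct comparison, and running it under strace keeps the trace much shorter than a full pine session.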
We (Red Hat) should really try to resolve this before next release.
Created attachment 11220: detailed scenario again
Created attachment 11221: strace output of running pine
Running "/usr/sbin/nhfsstone /home/joe" on a directory mounted via NFS turned out to be a good test case:
- it works with Guinness
- it fails with Wolverine
I failed to reproduce the problem using nhfsstone with

client: kernel-2.4.0-0.43.12smp, nfs-utils-0.2.1-10
server: kernel-2.2.16-7, nfs-utils-0.2-2

all using NFSv2 over UDP. I also tried nfs-utils-0.3.1 on both sides, with no problem, and tried kernel-2.4.2-0.1.20smp on the client as well, still with no problem. I also tried pine/mutt with /var/spool/mail mounted on the client while watching the protocol with tcpdump: no problem.

So, can you supply the exact kernel, nfs-utils, and pine package versions installed on both client and server? Could you also try your nhfsstone failure on a manually mounted (no autofs) directory? Thanks.
Server ("Guinness" with all official patches applied and nothing else): kernel-2.2.17-14, nfs-utils-0.1.9.1-7
Client ("Wolverine" as is): kernel-2.4.1-0.1.9, nfs-utils-0.2.1-10

OK, I have now tried both auto-mounted and manually mounted filesystems. Both worked, and I could no longer reproduce the bug. Unfortunately, in the meantime I changed my network card from an NE2000-compatible 10 Mbit/s card to a 3Com 100 Mbit/s card, and now everything works for me. This is really strange, as I was able to reproduce the bug about three times (i.e. after re-installing three times). The old network card had always worked perfectly with RHL 7.0 on both client and server, which made me think the problem was in Wolverine. I still don't really understand what happened here. I suppose it might have something to do with the ne2000 driver...?! God knows...
We have had one other report of a problem with the NE2000, although it was generating a kernel hang.
I will close this as fixed, since it can no longer be reproduced and it works now; I have a similar setup (also without an NE2000) that works just fine.